This project was started in the Autumn Session of 2019 by a research group at the National Institute of Technology Rourkela
under the guidance of Prof. B.P. Nayak (the same group that runs purplepotion) of the Regenerative and Rehabilitative Medicine Laboratory, Department of
Biomedical Engineering. The motivation behind this research is to improve the performance of
state-of-the-art ML and DL models in detecting Adverse Drug Reaction Signals (ADRS) from
social media platforms such as Twitter, using the concept of Data Programming.
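As a rough illustration of what data programming means here: instead of hand-labelling tweets, one writes small heuristic labelling functions that each vote ADR, NOT_ADR or abstain, and then combines their noisy votes into a training label. The sketch below is a toy under our own assumptions: the keyword lists, the function names and the majority-vote combiner are hypothetical and are not SADRAT's actual code. Data programming frameworks such as Snorkel replace the majority vote with a learned label model that estimates how accurate each labelling function is.

```python
from collections import Counter

# Label values. ABSTAIN means "this labelling function has no opinion".
ABSTAIN, NOT_ADR, ADR = -1, 0, 1

# Hypothetical keyword lists, chosen only for illustration.
SYMPTOM_WORDS = {"headache", "headaches", "nausea", "dizzy", "rash", "insomnia"}
STOP_DRUG_PHRASES = ("get off", "stop taking", "quit taking")
BENEFIT_WORDS = {"helped", "works", "relief", "better"}


def _words(text: str) -> set:
    """Split on whitespace, strip leading/trailing !?., and lower-case each token."""
    return {w.strip("!?.,").lower() for w in text.split()}


def lf_mentions_symptom(text: str) -> int:
    """Vote ADR when the tweet mentions a known symptom word."""
    return ADR if _words(text) & SYMPTOM_WORDS else ABSTAIN


def lf_wants_to_stop_drug(text: str) -> int:
    """Vote ADR when the author says they want to stop the medication."""
    lowered = text.lower()
    return ADR if any(p in lowered for p in STOP_DRUG_PHRASES) else ABSTAIN


def lf_reports_benefit(text: str) -> int:
    """Vote NOT_ADR when the tweet reports the drug helping."""
    return NOT_ADR if _words(text) & BENEFIT_WORDS else ABSTAIN


LABELLING_FUNCTIONS = [lf_mentions_symptom, lf_wants_to_stop_drug, lf_reports_benefit]


def weak_label(text: str) -> int:
    """Combine the votes by simple majority, ignoring abstentions.

    Returns ABSTAIN when no function fires or when the top labels tie.
    """
    votes = [lf(text) for lf in LABELLING_FUNCTIONS]
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


if __name__ == "__main__":
    tweets = [
        "Having Headaches since morning! I need to get off of crocin!",
        "Crocin helped, fever is gone",
        "Crocin helped but gave me a rash",
        "Picked up some crocin today",
    ]
    for tweet in tweets:
        print(weak_label(tweet), tweet)
```

Majority vote is the simplest possible combiner; its weakness is that it trusts a sloppy heuristic as much as a precise one, which is exactly the kind of labelling noise a learned label model tries to reduce.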
What it might look like:
Sample Tweet:
Tweet: "Having Headaches since morning! I need to get off of crocin!"
Sample Result:
ADR Probability: 88%
Detected Drug: Crocin
Detected ADR: Headache
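In the simplest case, the Detected Drug and Detected ADR fields could come from a dictionary (lexicon) lookup. The sketch below assumes small hypothetical lexicons that map surface forms such as "headaches" to a canonical term. It does not compute the ADR probability, and real tweets would need proper named-entity recognition and normalisation to a medical vocabulary rather than exact word matches.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Hypothetical lexicons mapping surface forms to canonical names.
DRUG_LEXICON = {"crocin": "Crocin", "paracetamol": "Paracetamol", "ibuprofen": "Ibuprofen"}
ADR_LEXICON = {
    "headache": "Headache",
    "headaches": "Headache",
    "nausea": "Nausea",
    "dizzy": "Dizziness",
    "dizziness": "Dizziness",
    "rash": "Rash",
}


@dataclass
class Detection:
    drugs: List[str] = field(default_factory=list)
    adrs: List[str] = field(default_factory=list)


def detect(text: str) -> Detection:
    """Return canonical drug and ADR names found by exact word lookup, in order of appearance."""
    result = Detection()
    for token in re.findall(r"[a-z]+", text.lower()):
        drug = DRUG_LEXICON.get(token)
        if drug and drug not in result.drugs:
            result.drugs.append(drug)
        adr = ADR_LEXICON.get(token)
        if adr and adr not in result.adrs:
            result.adrs.append(adr)
    return result


if __name__ == "__main__":
    print(detect("Having Headaches since morning! I need to get off of crocin!"))
```

On the sample tweet this prints Detection(drugs=['Crocin'], adrs=['Headache']), matching the fields of the sample result above.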
This one is easy to guess. In real scenarios, however, sentence structure can be far more complicated.
One of the major improvements, and the novelty we want to bring to the pipeline for such tasks,
is removing human involvement from preparing a labelled dataset for supervised learning.
Instead, we want to generate the labels programmatically using weakly supervised strategies, while keeping the noise introduced
during the process to a minimum. The above classification might seem a simple enough task for a
modern classifier, but the bigger challenges we are attempting to solve here are the following:
Everyone reading this is a potential contributor. We have contribution opportunities for everyone, irrespective of knowledge and experience, so you can contribute too.
What else will I get?
Friends! We are people from diverse backgrounds and interests. Some are even working or incoming FTEs at well-known software firms. This would be a great way to get to know each other and contribute to a single cause!
NOTE: Every contributor is valuable to us. Hence every contributor, irrespective of the "type" of contribution, will be mentioned in the "contributors" section of the page.
All communication regarding FOSSHACK 2020 will take place in the #fosshack2020 Slack channel: LINK (expires in 30 days, i.e. on 10th October 2020)
Author : Shaswat Lenka
Top Contributors: Debabrata Panigrahi, Ankit Samota, Vedant Raghuwanshi, Abhijeet Sahoo, Roshan Kumar Shaw
There is substantial research evidence that social media can be an important source of signals indicating Adverse Drug Reactions and of data for analyzing disease trends in a population. Although these signals are weak, many algorithms have been developed to extract the ones that indicate a valid ADR. However, predicting future disease trends from the social media data of a population under study is a challenging task, and no breakthroughs have yet been made in this direction. Another epidemiological challenge that remains to be solved is predicting the possible reason(s) behind the appearance of such trends, for further verification and validation of the Early Warning System.
SADRAT would therefore tackle the above-mentioned issues by providing pharmaceutical companies with the necessary parameters and predictive outcomes (i.e. disease trends, probable reasons for disease, etc., obtained using predictive analytics) in a dashboard that would support their decision-making (related to marketing strategies, venturing into new markets and introducing new and upgraded drugs) while targeting a particular population based on pivots such as season, age, gender, race, etc.
To install spaCy correctly and avoid runtime exceptions, run these commands in sequence
in your virtual environment:
pip install spacy
python -m spacy download en
pip install textblob
python -m textblob.download_corpora
Then, in your virtual environment, install Jupyter and add a Jupyter kernel for that environment:
pip install jupyter
ipython kernel install --user --name=yourvirtualenvname
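After running these commands, a quick way to confirm that the packages landed in the active environment is to ask Python whether it can find them, without actually importing them. This is a minimal sketch under our own assumptions: the helper names are hypothetical, ipykernel is checked because it is the package that registers Jupyter kernels (pip install jupyter pulls it in), and the virtual-environment test relies on sys.base_prefix, which holds for venv and virtualenv 20+ but not for conda environments.

```python
import importlib.util
import sys
from typing import Dict, Iterable

# Packages installed by the commands above (assumed list; extend as needed).
REQUIRED_PACKAGES = ("spacy", "textblob", "ipykernel")


def in_virtualenv() -> bool:
    """Return True when running inside a venv/virtualenv (prefix differs from base)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def check_packages(names: Iterable[str] = REQUIRED_PACKAGES) -> Dict[str, bool]:
    """Map each top-level package name to whether it can be found on the import path."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    print("Inside a virtual environment:", in_virtualenv())
    for name, found in check_packages().items():
        print(f"{name:10} {'OK' if found else 'MISSING'}")
```

Checking with find_spec rather than import keeps the script fast and avoids triggering spaCy's own start-up errors; it only confirms that a package is present, not that its models or corpora were downloaded.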
To build the project on your local computer:
Clone the project from your forked repository.
cd to the project root directory and install the dependencies: pip install -r requirements.txt
If you are using an IDE, make sure its hidden cache files are listed in .gitignore so that they stay untracked.
We strongly recommend following a git-flow-like workflow.
If that seems too complicated, just follow good practices and naming conventions for your branches.
Once you are done with your code in your branch, you can send a PR.
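Naming conventions are easiest to follow when they can be checked. The sketch below assumes a git-flow-style scheme in which branch names take the form <type>/<description> (git-flow itself uses prefixes such as feature/, release/ and hotfix/); the bugfix/ prefix, the pattern and the helper name is_conventional_branch are hypothetical choices, not a rule enforced by this repository.

```python
import re

# Assumed convention: <type>/<lowercase-description>, e.g. feature/twitter-scraper.
BRANCH_PATTERN = re.compile(r"(feature|bugfix|hotfix|release)/[a-z0-9][a-z0-9._-]*")


def is_conventional_branch(name: str) -> bool:
    """Return True if the whole branch name matches the assumed convention."""
    return BRANCH_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for branch in ["feature/twitter-scraper", "release/1.0.0", "Feature/Twitter Scraper", "master"]:
        print(branch, is_conventional_branch(branch))
```

One possible use is to call such a check from a local pre-push Git hook, so a badly named branch is caught before it reaches your fork.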
Steps:
(Assuming you are using a Linux/Unix system and already have git installed. For Windows users, things might be a little different, but the steps are mostly the same.)
1. Fork this repository.
2. Clone your forked repository with this command: git clone <your_fork_ssh_url>
   You can copy the SSH URL from your fork's page on GitHub; it looks like git@github.com:purplepotion/sadrat.git, but with your own GitHub username in place of purplepotion. Before cloning over SSH, you must have set up your SSH keys with GitHub.
3. Set up origin and upstream in your cloned repository. Origin refers to where your code will be pushed, i.e. your forked repository on GitHub. Upstream is this repository, which you use to keep your clone up to date: git remote add upstream git@github.com:purplepotion/sadrat.git
   Origin is set to your forked repository automatically when you clone it.
4. Create a branch for your work: git checkout -b <your_branch_name>
5. After you make changes, commit them to the branch. When you are done, push the branch to origin as follows:
   git pull upstream master (always do this before you commit, so as to stay up to date with the upstream repository)
   git push origin <your_branch_name> (this pushes your branch to origin, i.e. your forked repository)