Open agitter opened 4 years ago
This is incredible! @hufengling and @SiminaB might also be interested conceptually, and I know @nilswellhausen is interested in helping with visualizations.
Sounds like a good idea. Just had a quick look. I think interesting points to illustrate would be the number of trials in any given phase, the number of trials for any given study type, and maybe a list of the top 10 drugs with the most trials. We could then emphasize the point that resources are not allocated very well (like why do we need 30+ trials for drug x). Although it would be interesting to see which country has the largest number of trials, I am somewhat hesitant about bringing demographics into this because it could give the impression that country x is better than countries y, z, etc.
This project's figures give a good preview of what the data look like. @nilswellhausen to your point, we can see there are over 250 trials about hydroxychloroquine.
I'll work on ingesting the data, creating a pandas dataframe with it, and making a proof of concept table or figure. Once we have the data accessible we can try out more visualizations.
Looks great! They already include clinicaltrials.gov, which seems to provide most of the data, as well as some other registries. One thing to keep in mind is that a "trial" that's registered in, say, clinicaltrials.gov, is not always what we think of as a "randomized trial" - phase I and II trials are not usually randomized, and some of the studies in fact look at symptoms and don't have an intervention (e.g. "Analysis of clinical characteristics of severe novel coronavirus pneumonia (COVID-19)"). They do have an "intervention" column with some details, so that's something we should definitely consider using.
After some help in https://github.com/ebmdatalab/covid_trials_tracker-covid/issues/18, I now have access to a dataframe with the COVID-19 TrialsTracker data. Their figure generation code is also available so I can confirm I'm using the data frame correctly.
What should I include in the first proof of concept figure? I'd like to test it with something easy, like trials by phase, study type, or study category.
This dataset also contains a DOI URL when the results have been published or pre-printed. We can think about whether we'd like to extract those to reference the manuscripts in our review. Maybe as another automatically-generated appendix? Currently 88 of 3733 trials cross-reference a manuscript.
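A rough sketch of how the DOIs could be pulled out of those URLs, in case it helps. The function name `extract_doi` and the example URL are made up for illustration, and the regex only covers the common `10.<registrant>/<suffix>` DOI shape, so treat it as a starting point rather than what the TrialsTracker code actually does. It uses only the standard library.

```python
import re

# Common DOI shape: "10.", a 4-9 digit registrant code, "/", then a suffix.
# This is an assumption about the URLs in the dataset, not a full DOI grammar.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def extract_doi(url):
    """Return the DOI embedded in a URL, or None if no DOI is found."""
    if not url:
        return None
    match = DOI_PATTERN.search(url)
    if match is None:
        return None
    # Drop punctuation that sometimes trails a pasted link.
    return match.group(0).rstrip(".,;)")


# Hypothetical values standing in for the results-URL column.
urls = ["https://doi.org/10.1234/example.5678", "", None]
dois = [extract_doi(u) for u in urls]
print(dois)

# Share of trials that cross-reference a manuscript, using the counts above.
share = 100 * 88 / 3733
print(f"{share:.1f}% of trials link to a manuscript")
```

The extracted DOIs could then be turned into citekeys for an appendix, if we decide to go that way.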
Trials by study_type, phase, and recruitment status should be good, I think. Recruitment status tells you if they're recruiting, about to recruit, have results etc: https://clinicaltrials.gov/ct2/help/glossary/recruitment-status. So it's a snapshot of where the trials are at, plus you can get an idea of how many studies are terminated or withdrawn.
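For reference, here is a minimal sketch of the counts behind such a figure. The rows are made up for illustration (not real TrialsTracker data), and the column names `study_type`, `phase`, and `recruitment_status` are assumptions about how the CSV is laid out. It uses the standard library instead of pandas so it runs anywhere; with a dataframe, `value_counts()` on each column would give the same numbers.

```python
import csv
import io
from collections import Counter

# Made-up rows for illustration only; column names are assumed.
SAMPLE_CSV = """trial_id,study_type,phase,recruitment_status
T1,Interventional,Phase 2,Recruiting
T2,Observational,Not Applicable,Recruiting
T3,Interventional,Phase 3,Not Recruiting
T4,Interventional,Phase 2,Withdrawn
"""


def count_by(rows, column):
    """Count how many trials fall into each value of one column."""
    return Counter(row[column].strip() for row in rows)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for column in ("study_type", "phase", "recruitment_status"):
    # most_common() sorts the categories from largest to smallest count.
    print(column, count_by(rows, column).most_common())
```

Each `Counter` maps directly onto one bar chart, with one bar per category.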
This sounds great. It could also be interesting to see which clinical trials are cited in our paper and/or reviewed by Mt. Sinai. If that's something that we decide to do I could help out with getting the two data sources synced up.
@SiminaB I was able to add those in the figure in #465. How does it look?
@rdvelazquez I extracted the Manubot citekeys and stored them as a list in the JSON file. As a first pass, we could cite them all in the manuscript and see how many appear elsewhere in the AppVeyor preview. Then, we could see whether we want to do something more sophisticated.
@agitter We actually have all the citation info from our review paper in .tsv format... here's a quick example of getting that info to use with cross referencing against the clinical trials data if you want to go that route: https://github.com/rdvelazquez/covid19-review/blob/external-resources/ebmdatalab/Clinical%20Trials%20Cited%20in%20Paper.ipynb
Evaluating our coverage of the available clinical trials might be beyond the scope of what we want to get into right now but I just wanted to share in case this was something that people were interested in.
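If we do go that route, the cross-referencing could be as simple as the sketch below. It assumes the cited trials can be recognized by a ClinicalTrials.gov identifier (`NCT` followed by eight digits) somewhere in the citation string. The citation strings, the registry IDs, and the helper name `trial_ids_in` are all hypothetical, and this is not the approach used in the linked notebook.

```python
import re

# ClinicalTrials.gov identifiers are "NCT" followed by eight digits.
NCT_PATTERN = re.compile(r"NCT\d{8}")


def trial_ids_in(texts):
    """Collect every NCT identifier mentioned in a collection of strings."""
    found = set()
    for text in texts:
        # Upper-case first so that "nct..." written in lower case still matches.
        found.update(NCT_PATTERN.findall(text.upper()))
    return found


# Hypothetical citation strings from the manuscript.
cited = [
    "clinicaltrials:NCT00000001",
    "doi:10.1234/example",
    "url:https://clinicaltrials.gov/ct2/show/NCT00000002",
]
# Hypothetical trial IDs from the tracker dataset.
registry_ids = {"NCT00000001", "NCT00000003"}

cited_ids = trial_ids_in(cited)
print("cited and in the dataset:", sorted(cited_ids & registry_ids))
print("cited but not in the dataset:", sorted(cited_ids - registry_ids))
print("in the dataset but not cited:", sorted(registry_ids - cited_ids))
```

Trials registered elsewhere would need their own ID patterns, so this would only measure coverage of the clinicaltrials.gov subset.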
That's great @rdvelazquez. Let's leave this issue open so we can continue discussing clinical trials figures and what to do with the references after #465 is merged.
That `extract_citekey` function may also be helpful for some of our other citation cross-referencing.
As mentioned in https://github.com/greenelab/covid19-review/pull/465 it would be great to get the trial location. I see a "countries" tab, so that would probably be enough. This could allow us to see if most of the Traditional Medicine trials are in China or China and Iran etc.
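One way the per-country breakdown could work, as a sketch only: it assumes each trial record has a `category` field and a `countries` field holding a comma-separated list, which may not match the real layout. The records and country names below are placeholders.

```python
from collections import Counter


def trials_per_country(trials, category, separator=","):
    """Count trials of one category per country.

    A multi-country trial is counted once for each country it lists.
    """
    counts = Counter()
    for trial in trials:
        if trial["category"] != category:
            continue
        # A set avoids double-counting a country repeated within one record.
        countries = {c.strip() for c in trial["countries"].split(separator) if c.strip()}
        counts.update(countries)
    return counts


# Placeholder records; field names and values are assumptions.
trials = [
    {"category": "Traditional Medicine", "countries": "Country A"},
    {"category": "Traditional Medicine", "countries": "Country A, Country B"},
    {"category": "Drug", "countries": "Country B"},
    {"category": "Traditional Medicine", "countries": ""},
]

print(trials_per_country(trials, "Traditional Medicine").most_common())
```

Because multi-country trials are counted once per country, the totals can exceed the number of trials in the category.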
@dziakj1 brought up this graphic in the chat
I saw an interesting graphic today on clinical trials so far, but it wasn't very pleasant: https://www.statnews.com/2020/07/06/data-show-panic-and-disorganization-dominate-the-study-of-covid-19-drugs/
We also have access to enrollment data and could include that in our figures.
The latest automatic data update (97b6d6dc405ee5e60c7329bf7a6c6195532f1cb5) added new trial phases and a "Not Specified" entry. I'll need to investigate those values.
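A small check like the following could flag such values automatically on each update. The set of expected phase labels is an assumption for illustration, not the dataset's actual vocabulary, and the sample values are made up.

```python
from collections import Counter

# Assumed set of phase labels the figure code already knows how to handle.
EXPECTED_PHASES = {"Phase 1", "Phase 2", "Phase 3", "Phase 4", "Not Applicable"}


def unexpected_values(values, expected):
    """Return {value: count} for every value that is not in the expected set.

    Missing entries (None or empty) are reported under the empty string.
    """
    counts = Counter(v.strip() if v else "" for v in values)
    return {value: n for value, n in counts.items() if value not in expected}


# Made-up sample of a phase column after a data update.
phases = ["Phase 2", "Phase 1/Phase 2", "Not Specified", "Phase 3", "Not Specified", None]
print(unexpected_values(phases, EXPECTED_PHASES))
```

Running this in the update job and failing or warning on a non-empty result would surface new labels before they reach the figure.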
I think this issue can be closed -- I just wanted to confirm with @agitter that there isn't more in progress!
There are several projects and databases that aggregate and clean COVID-19 clinical trials data. We could add an auto-updating statistic, figure, and/or table reporting information about clinical trials.
Of those I've reviewed, http://covid19.trialstracker.net/ is appealing. The dataset is available in a clean CSV file, the code is open source at https://github.com/ebmdatalab/covid_trials_tracker-covid, and they are very open to data reuse (see http://covid19.trialstracker.net/about/).
Does this seem like a good addition?