ncats / stitcher

A stitching platform for InXight
https://stitcher.ncats.io/api/stitches/latest

Include clinicaltrials.gov mappings from FDA #155

Closed southalln closed 3 years ago

southalln commented 3 years ago

This has been an ongoing process for probably the last 5+ years. I am not sure exactly when Larry started.

The process is that every month, or more recently every two weeks, GSRS downloads the interventions from ct.gov with no date or intervention-type constraints. One intervention = 1 line of text in the study.

I compare the text of the interventions to the previous download and then try to match any interventions that were not in the previous download.

I also keep a list of all interventions that we have matched or omitted (discarded as not relevant); this way I can periodically go back and get a list of things we haven’t yet matched and try to match them again.
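The comparison and bookkeeping described above could be sketched roughly as follows. This is only an illustration, not the actual GSRS tooling: the function names, the use of plain strings for intervention lines, and the exact-text comparison are all assumptions.

```python
# Hypothetical sketch: interventions are plain text lines, compared by exact text.

def new_interventions(current, previous):
    """Return lines in the current download that were not in the previous one.

    Order is preserved and repeated lines are reported once.
    """
    seen = set(previous)
    return [line for line in dict.fromkeys(current) if line not in seen]


def unresolved(all_interventions, matched, omitted):
    """Return lines that have been neither matched nor discarded as not relevant."""
    handled = set(matched) | set(omitted)
    return [line for line in dict.fromkeys(all_interventions) if line not in handled]


previous = ["Drug: aspirin", "Drug: placebo"]
current = [
    "Drug: aspirin",
    "Drug: hydroxychloroquine",
    "Drug: placebo",
    "Drug: hydroxychloroquine",
    "Device: pulse oximeter",
]

# Only lines absent from the previous download need a matching attempt.
to_match = new_interventions(current, previous)

# Placeholder bookkeeping: matched lines map to made-up substance IDs.
matched = {"Drug: aspirin": ["SUBSTANCE-ID-1"]}
omitted = {"Drug: placebo", "Device: pulse oximeter"}

# Periodic re-check: everything still waiting for a match.
still_open = unresolved(current, matched, omitted)
```

Keeping the matched and omitted lists separate from the downloads is what makes the periodic re-check cheap: it is a set difference rather than a full re-match.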

Also, in practice, the other people who help with this and I do a good deal of looking at trials visually/manually when we can’t match interventions automatically.

I tend to only consider a match “done” when all substances in an intervention have been matched … so partial matches might not quickly find their way into the database as a match. On the other hand, people can continue to match data in the GSRS and add substances there and the extra matches would be in the dataset that gets to you.
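The “done” rule above can be written as a small check. This is a sketch under assumptions: it presumes the intervention text has already been split into substance names (that step is not described here), and `is_fully_matched` and the placeholder IDs are hypothetical.

```python
# Hypothetical sketch of the "all substances matched" rule.

def is_fully_matched(substances, mapping):
    """True only if the intervention has substances and every one has a mapping."""
    return bool(substances) and all(name in mapping for name in substances)


# Placeholder mapping from substance name to a made-up substance ID.
mapping = {"hydroxychloroquine": "SUBSTANCE-ID-2"}
intervention = ["hydroxychloroquine", "azithromycin"]

# One of two substances is mapped, so this is a partial match, not "done".
partial = is_fully_matched(intervention, mapping)

# Once the remaining substance is added, the same intervention counts as done.
mapping["azithromycin"] = "SUBSTANCE-ID-3"
complete = is_fully_matched(intervention, mapping)
```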

One weakness in the data is that CT.gov is continually changing the data and drugs ARE dropped from studies routinely, so there can be mappings that were not finally implemented in the study. This was evident recently when everyone seemed to drop hydroxychloroquine from their COVID studies. Also, I focus most on interventions, so substances that are noted only in the title or text might not be there. We are working to improve that.

Finally, the “public unii” data filter is from December 2020 so that would eliminate some more recent mappings. There will be a new public unii data release soon.
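To illustrate why an older public release drops newer mappings, here is a minimal sketch. It assumes the filter keeps an (intervention, UNII) pair only when the UNII appears in the public release; how the real filter is applied is not stated here, and the names and IDs below are placeholders.

```python
# Hypothetical sketch: keep only mappings whose UNII is in the public release.

def keep_public(pairs, public_uniis):
    """Return the (intervention text, UNII) pairs whose UNII is public."""
    return [(text, unii) for text, unii in pairs if unii in public_uniis]


# Placeholder IDs standing in for a December 2020 public release.
public_release = {"UNII-OLD-1", "UNII-OLD-2"}

pairs = [
    ("Drug: aspirin", "UNII-OLD-1"),
    ("Drug: newly registered substance", "UNII-NEW-9"),
]

# The second pair is dropped because its UNII postdates the public release.
visible = keep_public(pairs, public_release)
```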

southalln commented 3 years ago

Update to include new mappings from FDA https://github.com/ncats/stitcher/pull/157/commits/18f52fe54b37d6c2c01e99ec7d35ef0d828599e5