hms-dbmi / dseqr

single-cell and bulk RNA-seq analyses from counts → pathways → drug candidates.
https://docs.dseqr.com

expand ancillary info #4

Closed: alexvpickering closed this issue 5 years ago

alexvpickering commented 5 years ago

Started with Broad Repurposing Hub data (thanks samuelfinlayson!)

Near term:

Longer term:

alexvpickering commented 5 years ago

Relevant data that the Broad Repurposing Hub (BRH) data includes:

I'm considering pulling PubChem records.

For some records, PubChem has virtually everything you might want to know, e.g. metformin, which includes:

For L1000 I have ~20,000 unique PubChem CIDs. BRH data is available for 1,673 of these (8%).

For CMAP02 I have 1,289 unique PubChem CIDs. BRH data is available for 880 of these (68%).
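The coverage percentages follow directly from those counts. A quick check (the L1000 total is the approximate ~20,000 figure, so its percentage is approximate too):

```python
# Counts quoted in this thread: unique PubChem CIDs per signature library,
# and how many of those CIDs also have Broad Repurposing Hub (BRH) annotations.
COUNTS = {
    "L1000": (20_000, 1_673),  # ~20,000 is an approximate total
    "CMAP02": (1_289, 880),
}


def brh_coverage(total_cids: int, with_brh: int) -> float:
    """Percentage of a library's unique CIDs that have BRH annotations."""
    return 100.0 * with_brh / total_cids


if __name__ == "__main__":
    for library, (total, with_brh) in COUNTS.items():
        print(f"{library}: {brh_coverage(total, with_brh):.0f}% ({with_brh}/{total})")
```

The gap between the two libraries (8% vs 68%) is what makes a second source such as PubChem attractive for L1000 in particular.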

What do you think, samuelfinlayson? It would obviously be more involved, but it would probably allow us to construct a more complete dataset for immediate viewing.

One alternative could be to show the BRH data and just provide a link out to PubChem.
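A minimal sketch of the link-out option, with the full-record pull kept as a separate step. It assumes records are keyed by PubChem CID; the compound page URL and the PUG View JSON endpoint are PubChem's public URL patterns, while `ancillary_info` and the `brh` mapping are hypothetical names for illustration:

```python
import json
import urllib.request

PUBCHEM_BASE = "https://pubchem.ncbi.nlm.nih.gov"


def pubchem_page_url(cid: int) -> str:
    """Human-facing PubChem compound page, suitable for a link-out."""
    return f"{PUBCHEM_BASE}/compound/{cid}"


def pug_view_url(cid: int) -> str:
    """PUG View endpoint that returns the full annotated record as JSON."""
    return f"{PUBCHEM_BASE}/rest/pug_view/data/compound/{cid}/JSON"


def fetch_pubchem_record(cid: int, timeout: float = 30.0) -> dict:
    """Download one full PubChem record (requires network access)."""
    with urllib.request.urlopen(pug_view_url(cid), timeout=timeout) as resp:
        return json.load(resp)


def ancillary_info(cid: int, brh: dict) -> dict:
    """Show BRH annotations when present, always with a PubChem link-out.

    `brh` is a hypothetical mapping of CID -> BRH annotation dict.
    """
    return {"cid": cid, "brh": brh.get(cid), "pubchem": pubchem_page_url(cid)}
```

For example, `ancillary_info(4091, brh)` (metformin) yields the BRH annotations, if any, plus a link to `https://pubchem.ncbi.nlm.nih.gov/compound/4091`. Only `fetch_pubchem_record` touches the network, so the link-out path works even if pulling full records is deferred.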

Either way, it will probably be a bit of a chore to select the most-likely-to-be-safe compounds. Unless there is an easy rule like "GRAS compound => go for it"?

alexvpickering commented 5 years ago

➤ Samuel Finlayson commented:

Thanks for writing this. I've played around with this a bit before as well, and this rabbit hole is extremely deep, so this should definitely be subdivided into data sources that are themselves triaged into now, near-term, and long-term priorities.

That said, I think it would be nice to be able to have:

Longer term features:

[1] 370 sounds a little low, but not crazy low, to me for the length of the GRAS list. I thought I read ~500 on Wikipedia recently.

alexvpickering commented 5 years ago

samuelfinlayson Yeah, the exact numbers are a bit fuzzy. Over 370 are GRAS substances that the FDA has evaluated ( https://www.fda.gov/food/generally-recognized-safe-gras/gras-substances-scogs-database ) and for which it has drawn conclusions about the scientific evidence for their safety.

~870 compounds have been submitted ( https://www.accessdata.fda.gov/scripts/fdcc/?set=GRASNotices ), and the FDA generally responds with "We have no questions about this submission for the intended use," but you still have to make sure things are safe.

alexvpickering commented 5 years ago

➤ Samuel Finlayson commented:

alexvpickering Gotcha, yeah, those numbers are right in line with what I was expecting. I think for now we should treat both of those as GRAS for our purposes. Basically, what I want is a litmus test a clinician can use for: "Is this a totally random compound that could do anything, or do we have a reasonable prior that it's safe?" By the same token, anything submitted to or approved on the GRAS list has a decent shot at being available for order, etc., so there is a practical benefit there as well.
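The proposed litmus test reduces to a membership check against both FDA lists. A minimal sketch, assuming the SCOGS-evaluated substances and the GRAS notices have already been mapped to PubChem CIDs (that mapping, and the names `GrasSources`, `is_gras` and `gras_label`, are hypothetical):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GrasSources:
    """CIDs drawn from the two FDA lists discussed above (hypothetical inputs).

    scogs:   substances the FDA has evaluated (SCOGS database, ~370+).
    notices: substances submitted via GRAS Notices (~870).
    """
    scogs: frozenset
    notices: frozenset


def is_gras(cid: int, sources: GrasSources) -> bool:
    """For now, both lists count as GRAS for our purposes."""
    return cid in sources.scogs or cid in sources.notices


def gras_label(cid: int, sources: GrasSources) -> str:
    """Clinician-facing label; keeps the weaker 'notice only' case distinguishable."""
    if cid in sources.scogs:
        return "GRAS (FDA SCOGS evaluation)"
    if cid in sources.notices:
        return "GRAS (GRAS notice submitted)"
    return "no GRAS record"
```

Keeping the two sources separate in the label, while `is_gras` treats them the same, means the policy can be tightened later without re-importing the lists.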

alexvpickering commented 5 years ago

samuelfinlayson I do happen to have RDKit Morgan fingerprints for all the LINCS/CMAP compounds with SMILES, and some Python scripts for chemfp Tanimoto/Tversky similarity searches, which would make going in that direction relatively straightforward.
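For reference, both similarity measures can be written directly over the sets of "on" bits of two fingerprints (e.g. the bit indices an RDKit Morgan fingerprint sets). This pure-Python sketch is not chemfp's API, which does the same computation far faster over packed fingerprint arenas. The function names and the 0.7 default threshold are illustrative assumptions:

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # indices of the bits that are set


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """|A ∩ B| / |A ∪ B|; symmetric in A and B."""
    common = len(a & b)
    union = len(a) + len(b) - common
    return common / union if union else 0.0


def tversky(a: Fingerprint, b: Fingerprint, alpha: float = 1.0, beta: float = 1.0) -> float:
    """|A ∩ B| / (|A ∩ B| + alpha*|A - B| + beta*|B - A|).

    alpha = beta = 1 reduces to Tanimoto; alpha != beta makes it asymmetric,
    e.g. alpha=1, beta=0 scores how much of query A is contained in B.
    """
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    return common / denom if denom else 0.0


def similarity_search(
    query: Fingerprint,
    library: Dict[str, Fingerprint],
    threshold: float = 0.7,
    metric: Callable[[Fingerprint, Fingerprint], float] = tanimoto,
) -> List[Tuple[str, float]]:
    """Return (compound, score) pairs at or above threshold, best first."""
    hits = [(name, metric(query, fp)) for name, fp in library.items()]
    return sorted((h for h in hits if h[1] >= threshold), key=lambda h: h[1], reverse=True)
```

To search with Tversky weights, pass e.g. `functools.partial(tversky, alpha=0.9, beta=0.1)` as `metric`. A search like this could, for instance, surface GRAS compounds that are structurally close to a LINCS/CMAP hit.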

alexvpickering commented 5 years ago

➤ Samuel Finlayson commented:

alexvpickering Cool. I'd still consider it on the lower side priority-wise, but I think it's worth thinking about, and I'm glad to know it wouldn't be too bad.

alexvpickering commented 5 years ago

➤ Samuel Finlayson commented:

Is there a specific "specification" that is needed, or was this just moved because we're still collectively figuring it out?

alexvpickering commented 5 years ago

Yep exactly. I've added the BRH data (which includes clinical status) so just adding this back here for when we decide to expand things.