alexvpickering closed this issue 5 years ago
Relevant data included in the Broad Repurposing Hub (BRH) dataset:
I'm considering pulling PubChem records.
For some records, PubChem has virtually everything you might want to know, e.g. metformin, whose record includes:
For L1000, I have ~20,000 unique PubChem CIDs. BRH data is available for 1,673 of these (8%).
For CMAP02, I have 1,289 unique PubChem CIDs. BRH data is available for 880 of these (68%).
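The coverage figures above are a set intersection followed by a percentage. A minimal sketch, assuming each dataset's CIDs and the BRH CIDs are available as Python sets of integers (`brh_coverage` is a hypothetical helper, and the loop only re-checks the quoted counts; it does not load real data):

```python
def brh_coverage(dataset_cids: set[int], brh_cids: set[int]) -> tuple[int, float]:
    """Return (number of dataset CIDs that have BRH data, percent coverage).

    Hypothetical helper: both inputs are assumed to be sets of PubChem CIDs.
    """
    if not dataset_cids:
        return 0, 0.0
    covered = dataset_cids & brh_cids  # CIDs present in both sources
    return len(covered), 100.0 * len(covered) / len(dataset_cids)


# Re-check the arithmetic for the counts quoted in the thread
# (the L1000 total is approximate, "~20,000").
for name, n_total, n_brh in [("L1000", 20_000, 1_673), ("CMAP02", 1_289, 880)]:
    print(f"{name}: {n_brh}/{n_total} = {100 * n_brh / n_total:.0f}%")
```

With the real CID sets, `brh_coverage(l1000_cids, brh_cids)` would return the count and percentage in one call.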
What do you think, samuelfinlayson? It would obviously be more involved, but it would probably allow us to construct a more complete dataset for immediate viewing.
One alternative would be to show the BRH data and just provide a link out to PubChem.
Either way, it will probably be a chore to select the most-likely-to-be-safe compounds, unless there is something easy, like: GRAS (generally recognized as safe) compound => go for it?
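Both options above come down to building a URL from a PubChem CID. A sketch that only constructs URLs and fetches nothing: the compound page is the link-out target, and the PUG View JSON endpoint is one way the full records could be pulled. The helper names and the placeholder CID are hypothetical.

```python
PUBCHEM_BASE = "https://pubchem.ncbi.nlm.nih.gov"


def pubchem_link_out(cid: int) -> str:
    """Human-readable PubChem compound page (the link-out option)."""
    return f"{PUBCHEM_BASE}/compound/{cid}"


def pubchem_record_json(cid: int) -> str:
    """Full PubChem record as JSON via PUG View (the pull-records option)."""
    return f"{PUBCHEM_BASE}/rest/pug_view/data/compound/{cid}/JSON"


cid = 12345  # placeholder CID for illustration only
print(pubchem_link_out(cid))
print(pubchem_record_json(cid))
```

The link-out needs no storage or parsing on our side; pulling records means downloading and parsing one JSON document per CID, which is where the extra work lies.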
➤ Samuel Finlayson commented:
Thanks for writing this up. I've played around with this a bit before as well, and this rabbit hole is extremely deep, so we should definitely subdivide it into data sources that are themselves triaged into now, near-term, and long-term priorities.
That said, I think it would be nice to be able to have:
Longer term features:
[1] 370 sounds a little low, but not crazy low, to me for the length of the GRAS list. I thought I read ~500 on Wikipedia recently.
samuelfinlayson Yeah, the exact numbers are a bit fuzzy. Over 370 are GRAS compounds that the FDA has evaluated (https://www.fda.gov/food/generally-recognized-safe-gras/gras-substances-scogs-database) and reached conclusions about based on the scientific evidence for their safety.
~870 compounds have been submitted as GRAS notices (https://www.accessdata.fda.gov/scripts/fdcc/?set=GRASNotices), and the FDA generally responds with "We have no questions about this submission for the intended use," but you still have to make sure things are safe.
➤ Samuel Finlayson commented:
alexvpickering Gotcha, yeah, those numbers are right in line with what I was expecting. I think for now we should treat both of those as GRAS for our purposes. Basically, what I want is a litmus test a clinician can use to answer: "Is this a totally random compound that could do anything, or do we have a reasonable prior that it's safe?" By the same token, anything submitted to or approved on the GRAS list has a decent shot at being available for order, etc., so there is a practical benefit there as well.
samuelfinlayson I do happen to have RDKit Morgan fingerprints for all the LINCS/CMAP compounds with SMILES, plus some Python scripts for chemfp Tanimoto/Tversky similarity searches, which would make going in that direction relatively straightforward.
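The litmus test described above (treat both SCOGS-evaluated compounds and GRAS-notice submissions as GRAS) can be sketched as a simple lookup. This assumes both FDA lists have already been loaded as sets of lowercased compound names; `SafetyPrior`, `safety_prior`, and the example names are hypothetical.

```python
from enum import Enum


class SafetyPrior(Enum):
    GRAS = "GRAS (SCOGS-evaluated or submitted as a GRAS notice)"
    UNKNOWN = "no reasonable safety prior"


def safety_prior(name: str, scogs: set[str], gras_notices: set[str]) -> SafetyPrior:
    """Treat both the SCOGS list and the GRAS notices as GRAS for our purposes.

    Assumes both sets hold lowercased compound names (hypothetical inputs).
    """
    key = name.strip().lower()
    if key in scogs or key in gras_notices:
        return SafetyPrior.GRAS
    return SafetyPrior.UNKNOWN


# Placeholder names, not real entries from either FDA list.
scogs = {"compound_a"}
notices = {"compound_b"}
print(safety_prior("Compound_B", scogs, notices).name)
```

Matching on names is fragile (synonyms, salts); mapping both lists to PubChem CIDs first would make the lookup more robust, at the cost of an extra mapping step.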
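For reference, Tanimoto and Tversky similarity on fingerprints reduce to set arithmetic over on-bits. A pure-Python sketch (a stand-in for illustration, not chemfp's API), assuming each fingerprint is represented as a set of on-bit indices, e.g. taken from an RDKit Morgan fingerprint; `search` and the toy library are hypothetical.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity: |A & B| / |A | B|."""
    common = len(a & b)
    union = len(a) + len(b) - common
    return common / union if union else 0.0


def tversky(a: set[int], b: set[int], alpha: float = 0.5, beta: float = 0.5) -> float:
    """Tversky similarity; alpha = beta = 1 gives Tanimoto, 0.5/0.5 gives Dice."""
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    return common / denom if denom else 0.0


def search(query: set[int], library: dict[str, set[int]], threshold: float = 0.7):
    """Return (name, Tanimoto score) pairs at or above threshold, best first."""
    hits = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted((h for h in hits if h[1] >= threshold), key=lambda h: -h[1])


# Toy fingerprints (on-bit indices), not real compounds.
library = {"x": {1, 2, 3, 4}, "y": {1, 2, 3, 5}, "z": {7, 8}}
print(search({1, 2, 3, 4}, library, threshold=0.5))  # [('x', 1.0), ('y', 0.6)]
```

Tversky with unequal weights (e.g. alpha=0.9, beta=0.1) is useful for asymmetric, substructure-like queries, which is one reason to keep both measures around.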
➤ Samuel Finlayson commented:
alexvpickering Cool. I'd still consider it lower priority, but I think it's worth thinking about, and I'm glad to know it wouldn't be too bad.
➤ Samuel Finlayson commented:
Is there a specific “specification” that is needed, or was this just moved because we’re still collectively figuring it out?
Yep, exactly. I've added the BRH data (which includes clinical status), so I'm just adding this back here for when we decide to expand things.
Started with the Broad Repurposing Hub data (thanks, samuelfinlayson!)
Near term:
Longer term: