krassowski / multi-omics-state-of-the-field

Analyses for "State of the field in multi-omics research: from computational needs to data mining and sharing"
https://doi.org/10.3389/fgene.2020.610798
MIT License

Explore application of BioNLP text mining API from NCBI #1

Open krassowski opened 4 years ago

krassowski commented 4 years ago

There is an API that makes it easy to retrieve pre-calculated tags for PubMed papers, with extracted bioconcepts including genes, chemicals, diseases, mutations and species:

[Screenshot from 2020-06-15 12-49-00]

This can be very useful (along with MeSH headings) for answering questions like:

Link: https://www.ncbi.nlm.nih.gov/research/bionlp/APIs/
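A minimal sketch of how such tags could be aggregated once downloaded, assuming the annotations are exported in a BioC-style JSON layout (passages, each with a list of annotations carrying `infons` with a `type` and an `identifier`). The sample record below is hand-written for illustration and is not real API output; the function names are hypothetical.

```python
from collections import Counter

# Hypothetical, hand-written record imitating a BioC-style JSON layout
# (passages -> annotations -> infons). This is NOT real API output.
sample_doc = {
    "id": "0000000",
    "passages": [
        {
            "text": "TP53 mutations in breast cancer",
            "annotations": [
                {"text": "TP53", "infons": {"type": "Gene", "identifier": "7157"}},
                {"text": "breast cancer",
                 "infons": {"type": "Disease", "identifier": "MESH:D001943"}},
            ],
        },
        {
            "text": "Validated in human and mouse samples",
            "annotations": [
                {"text": "human", "infons": {"type": "Species", "identifier": "9606"}},
                {"text": "mouse", "infons": {"type": "Species", "identifier": "10090"}},
            ],
        },
    ],
}


def concepts_by_type(doc):
    """Group the unique concept identifiers found in one document by concept type."""
    grouped = {}
    for passage in doc.get("passages", []):
        for annotation in passage.get("annotations", []):
            infons = annotation.get("infons", {})
            concept_type = infons.get("type")
            identifier = infons.get("identifier")
            if concept_type and identifier:
                grouped.setdefault(concept_type, set()).add(identifier)
    return grouped


def count_papers_per_concept(docs, concept_type):
    """Count in how many documents each concept of the given type appears."""
    counts = Counter()
    for doc in docs:
        # Each document contributes at most once per identifier.
        counts.update(concepts_by_type(doc).get(concept_type, set()))
    return counts


species_counts = count_papers_per_concept([sample_doc], "Species")
print(species_counts.most_common())
```

Counting each identifier once per paper (rather than once per mention) keeps a single paper that repeats a species or gene name many times from dominating the totals.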


biswapriyamisra commented 4 years ago

Mike: Looks like a great tool and exercise! So we are getting into BEL/text mining, I guess?

Fine, as long as we answer the first two questions or similar ones:

[1] Do we have a problem with tools being developed on cancer data only? Or: other diseases vs plants vs bacteria vs environmental?

[2] Are we developing methods using human data only? Comparing against mice, nematodes, bacteria, E. coli and rats looks fine too!

[3] I am not sure about this third question: "are any genes over-represented in multi-omics research (like master regulators, microRNAs/TP53)". Given the handful of "multi-omics" studies in total, we might accumulate a lot of noisy results, simply because cancer is over-represented in these studies. Another problem: are they over-represented just because they are mentioned in the text, because they were actually found as a top hit by FC/p-value, or because they were merely cited as evidence/background knowledge? It is going to be painful to curate that. We (essentially your precious time and skills!) may not want to invest that much time in generating yet another figure that comes with such challenges.

[4] Can we add a question that needs to be answered (as far as I can think of now)? Question: how many papers used each of the following multi-omics combinations?

- genomics + transcriptomics + proteomics + metabolomics + microbiome + imaging + SNP + epigenetics + variations + etc.
- genomics + transcriptomics + proteomics + metabolomics + microbiome + imaging + SNP + epigenetics
- genomics + transcriptomics + proteomics + metabolomics + microbiome + imaging + SNP
- genomics + transcriptomics + proteomics + metabolomics + microbiome + imaging
- genomics + transcriptomics + proteomics + metabolomics + microbiome
- genomics + transcriptomics + proteomics + metabolomics
- genomics + transcriptomics + proteomics
- genomics + transcriptomics (does not qualify as 'dual omics').
- AND COMBINATIONS of all the above, such as genomics + metabolomics + microbiome etc. (let me know if it makes sense!)

[5] Will think of more questions and come back to you for more "figures"!

Note: Keep generating the figures; the ones that remain unused here we can use for our Paper 2 (the mini review that Vivek got the invitation for!)

Thanks a lot for this exercise!

Best, Biswa

krassowski commented 4 years ago

Great point [4] on looking for combinations of different omics/data types!
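A rough sketch of how the combinations from [4] could be counted, assuming we already have a set of omics types per paper (for example from keyword matching in titles/abstracts). The paper IDs and assignments below are made up, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: omics types detected per paper; all values are made up.
paper_omics = {
    "paper_a": {"genomics", "transcriptomics"},
    "paper_b": {"genomics", "transcriptomics", "proteomics"},
    "paper_c": {"metabolomics", "microbiome"},
    "paper_d": {"genomics", "transcriptomics"},
}


def count_exact_combinations(paper_omics):
    """Count papers by the exact set of omics types they combine."""
    return Counter(frozenset(omics) for omics in paper_omics.values())


def count_sub_combinations(paper_omics, size=2):
    """Count papers that include each combination of `size` omics types,
    possibly alongside other omics types."""
    counts = Counter()
    for omics in paper_omics.values():
        counts.update(frozenset(combo) for combo in combinations(sorted(omics), size))
    return counts


for combo, n_papers in count_exact_combinations(paper_omics).most_common():
    print(" + ".join(sorted(combo)), n_papers)
```

Using `frozenset` keys makes the order of omics irrelevant, so "genomics + transcriptomics" and "transcriptomics + genomics" land in the same bucket. The exact counts answer "which sets are used together", while the sub-combination counts also credit papers that use a pair as part of a larger set.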

Re [3]: yes, I am aware that it will not be trivial to interpret due to the over-representation of microRNA/cancer studies in the first place, but it might be fun either way. There was a study showing that the majority of studies focus on a minority of genes, and that this behaves a lot like a fashion/yearly trend. I am just curious whether we can show something similar in the multi-omics field.
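One simple way to check for that kind of concentration, sketched under the assumption that we have a per-gene count of papers mentioning it (the counts and placeholder gene names below are made up). It only measures mentions, so the caveat above about mentions vs actual top hits still applies.

```python
from collections import Counter


def share_of_top_genes(gene_counts, top_fraction=0.1):
    """Fraction of all gene mentions that go to the most-mentioned
    `top_fraction` of genes (at least one gene is always included)."""
    if not gene_counts:
        return 0.0
    ranked = sorted(gene_counts.values(), reverse=True)
    n_top = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:n_top]) / sum(ranked)


# Made-up counts: number of papers mentioning each gene.
# GENE_A..GENE_H are placeholder names, not real gene symbols.
gene_counts = Counter({
    "TP53": 50, "EGFR": 20,
    "GENE_A": 5, "GENE_B": 5, "GENE_C": 5, "GENE_D": 5,
    "GENE_E": 4, "GENE_F": 3, "GENE_G": 2, "GENE_H": 1,
})

print(share_of_top_genes(gene_counts, top_fraction=0.1))
print(share_of_top_genes(gene_counts, top_fraction=0.2))
```

Computing this per publication year would be one way to look for the "yearly trend" aspect.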