krassowski / multi-omics-state-of-the-field

Analyses for "State of the field in multi-omics research: from computational needs to data mining and sharing"
https://doi.org/10.3389/fgene.2020.610798
MIT License

Compare the disease term frequency against "background" #11

Closed krassowski closed 3 years ago

krassowski commented 4 years ago

Based on the disease-term extraction, we can safely confirm that the vast majority of multi-omics research is done on cancer. But cancer research receives a lot of attention in general.

Is research using cancer data over-represented in multi-omics relative to that background?

Proposed method: collect articles from the same time period and the same journals; run a permutation test that sub-samples the background to match the number of articles per journal; compare the frequency of disease terms.
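
A minimal sketch of the proposed sub-sampling test, not the analysis code from this repository. It assumes each article is reduced to a hypothetical `(journal, mentions_cancer)` pair, draws background articles so that the per-journal counts match the multi-omics set, and reports an empirical one-sided p-value. The function name, the data layout, and the choice to cap the draw when a journal has too few background articles are all illustrative assumptions.

```python
import random
from collections import Counter, defaultdict


def matched_subsample_test(multiomics, background, n_iter=1000, seed=0):
    """Empirical test of cancer over-representation (illustrative sketch).

    multiomics, background: lists of (journal, mentions_cancer) tuples.
    Returns (observed cancer fraction, empirical one-sided p-value).
    """
    rng = random.Random(seed)

    # Number of multi-omics articles per journal: the target to match.
    per_journal = Counter(journal for journal, _ in multiomics)

    # Background cancer flags grouped by journal.
    pool = defaultdict(list)
    for journal, mentions_cancer in background:
        pool[journal].append(mentions_cancer)

    if not any(pool[journal] for journal in per_journal):
        raise ValueError("no background articles in the matching journals")

    observed = sum(flag for _, flag in multiomics) / len(multiomics)

    hits = 0
    for _ in range(n_iter):
        drawn = []
        for journal, n in per_journal.items():
            candidates = pool[journal]
            # Sample without replacement; cap at what the journal offers.
            drawn.extend(rng.sample(candidates, min(n, len(candidates))))
        # Count sub-samples at least as cancer-heavy as the observed set.
        if sum(drawn) / len(drawn) >= observed:
            hits += 1

    # Add-one correction so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_iter + 1)
```

The time matching is assumed to happen upstream, by restricting `background` to the same publication years before calling the function.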

biswapriyamisra commented 4 years ago

Thanks, Mike, my understanding is as follows:

  1. If the truth is "cancer", we can of course write it out or show it in whatever graph you are making, without advertising it much.
  2. Yes, it is a no-brainer that cancer data is indeed over-represented (OR) in multi-omics.
  3. "Proposed method: collect articles from the same time period from the same journals; use a permutation test sub-sampling to match the number of articles per journal. Compare the frequency of disease terms." Ideally yes, that is possibly how it should be normalized, but finding a journal that is not cancer-biased is tough in the first place! : ) I do not think we have time for these finer analyses; I would rather invest the little time left in writing up the manuscript and cleaning up the existing data/figures/exercise, just in the best interest of your time! ; )


vd4mmind commented 4 years ago

Yes, this is a much broader topic, @krassowski. I guess this is something we should keep for the next stand-alone manuscript. I have mentioned in one of the issues what you could present to us on the weekend call. A 3-4 slide summary would help us get more clarity on what can be fed directly into the current MS; the rest can help us structure the stand-alone MS around a deeper and finer meta-analysis. Here is the query.

krassowski commented 3 years ago

I used a time-matched (by year) and journal-matched (journals weighted by the proportion of multi-omics works published in each journal) subset to compare the cancer term frequency, and went with a simpler Fisher exact test. See the Diseases_and_datasets.ipynb notebook.

Other disease terms could be considered in the future, but the primary goal of checking the cancer term is done here.
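
For readers without the notebook at hand, here is a rough sketch of what a journal-weighted comparison with a Fisher exact test could look like. It is not the code from Diseases_and_datasets.ipynb: the helper names, the data layout, sampling with replacement, and the one-sided alternative are all assumptions made for illustration. The test is written with the standard library only; in practice `scipy.stats.fisher_exact` would be the usual choice.

```python
import random
from collections import Counter
from math import comb


def weighted_background(multiomics_journals, background, size, seed=0):
    """Draw `size` background cancer flags (hypothetical helper).

    Journals are picked in proportion to their share of multi-omics
    articles; `background` maps journal -> list of mentions_cancer flags.
    Sampling is with replacement, which is an assumption of this sketch.
    """
    rng = random.Random(seed)
    # Keep only journals that have at least one background article.
    shares = Counter(j for j in multiomics_journals if background.get(j))
    journals = list(shares)
    weights = [shares[j] for j in journals]
    picks = rng.choices(journals, weights=weights, k=size)
    return [rng.choice(background[j]) for j in picks]


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null, i.e. the chance of
    seeing at least `a` cancer articles in the first row by chance alone.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(
        comb(col1, k) * comb(total - col1, row1 - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(total, row1)


def cancer_enrichment(multiomics_flags, background_flags):
    """Build the 2x2 table (cancer / not cancer) and test it."""
    a = sum(multiomics_flags)
    b = len(multiomics_flags) - a
    c = sum(background_flags)
    d = len(background_flags) - c
    return fisher_exact_greater(a, b, c, d)
```

Rows of the table are the multi-omics set and the weighted background; columns are articles that do and do not mention cancer.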