Closed by krassowski 4 years ago
Hi Mike, if I understand correctly from sifting through the notebooks, the timeline is 2005-2020. Is that correct? In that case, the plot seems fine. Can you point me to the notebook where you have the latest plot? I want to see whether the normalization is by n=15 or by the median distribution of hits per journal per year.
Yes, we have papers from 2005-2020. It is not normalized yet - I opened the issue to address this later on (maybe tonight).
The normalization does not change much. I only retrieved the total number of indexed articles in PubMed for journals in which at least 3 multi-omics articles were published (covering 75% of the articles), but it still took 2 hours to download.
Count based:
Adjusted counts ("pseudo-frequency"; pseudo because the denominator, for practical reasons, does not encompass all journals)
The uptick in 2020 is partially because of the one-week delay (I have not re-run the search, given it's not a priority now).
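For reference, a minimal sketch of how such adjusted counts could be computed. This is not the notebook code: the function name, the input shapes, and the toy numbers are all made up for illustration. It also assumes the numerator keeps every match while the denominator only covers journals above the threshold, which is one reading of why the frequency is "pseudo".

```python
from collections import Counter

MIN_ARTICLES_PER_JOURNAL = 3  # threshold mentioned in the comment above


def adjusted_counts(matches, indexed_totals, min_articles=MIN_ARTICLES_PER_JOURNAL):
    """Return (adjusted counts per year, share of matches covered).

    matches: iterable of (journal, year) pairs, one per multi-omics article.
    indexed_totals: dict mapping (journal, year) to the total number of
        articles indexed for that journal in that year.
    """
    matches = list(matches)
    per_journal = Counter(journal for journal, _ in matches)
    # Journals with enough matches to be worth downloading totals for.
    kept = {j for j, n in per_journal.items() if n >= min_articles}
    coverage = sum(per_journal[j] for j in kept) / len(matches) if matches else 0.0

    # Assumption: numerator counts all matches, not only those in kept journals.
    numerator = Counter(year for _, year in matches)

    # Denominator only covers the kept journals, hence "pseudo-frequency".
    denominator = Counter()
    for (journal, year), total in indexed_totals.items():
        if journal in kept:
            denominator[year] += total

    adjusted = {
        year: numerator[year] / denominator[year]
        for year in sorted(numerator)
        if denominator[year]
    }
    return adjusted, coverage


# Toy data (hypothetical journals "A", "B", "C"; not real PubMed numbers).
matches = [
    ("A", 2019), ("A", 2019), ("A", 2020),
    ("B", 2020), ("B", 2020), ("B", 2020),
    ("C", 2019), ("C", 2020),
]
indexed_totals = {
    ("A", 2019): 100, ("A", 2020): 100,
    ("B", 2019): 50, ("B", 2020): 100,
    ("C", 2019): 999, ("C", 2020): 999,  # ignored: "C" has fewer than 3 matches
}
adjusted, coverage = adjusted_counts(matches, indexed_totals)
```

If the numerator should be restricted to the same journals as the denominator, filtering `matches` by `kept` before counting would be the only change.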
In this case, using the absolute counts might be ok - what do you think?
@krassowski
The absolute count here looks fine to me. We can just describe it as the overall distribution, over the past 15 years and across the various journals indexed in PubMed, of the different terms used to represent multi-omics. I don’t think we have to normalize here. I like that 2020 isn’t complete yet, so there is a dip. If a reviewer asks, we can always add the adjusted counts.
No point in re-running the search and the download. We can just mention that our monthly window runs until 2020, for clarity.
I also tried using a yearly fraction instead to represent changes in trends, but the "noise" (single articles before the year 2010) gets promoted to what seem to be major shifts, so I think it confuses the reader more than it helps:
But I think we can ignore 2002 (1 match) and 2004 (2 matches) and start in 2005 (10 matches) - the recent changes are certainly more interesting.
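To make the noise problem concrete, here is a small sketch, assuming "yearly fraction" means each term's share of that year's matches. The function name, the two term labels, and the 2019 numbers are invented for illustration; only the 2002 (1 match) and 2004 (2 matches) counts come from the comment above.

```python
from collections import Counter, defaultdict


def yearly_fractions(records, start_year=None):
    """Return {year: {term: share of that year's matches}}.

    records: iterable of (year, term) pairs, one per matched article.
    start_year: if given, years before it are dropped.
    """
    by_year = defaultdict(Counter)
    for year, term in records:
        if start_year is None or year >= start_year:
            by_year[year][term] += 1
    return {
        year: {term: n / sum(counts.values()) for term, n in counts.items()}
        for year, counts in sorted(by_year.items())
    }


# Toy records: a single article in 2002 and two in 2004 each look like a
# complete switch between terms, while 2019 shows a more believable split.
toy = (
    [(2002, "multi-omics")]
    + [(2004, "integrated omics")] * 2
    + [(2019, "multi-omics")] * 60
    + [(2019, "integrated omics")] * 40
)
fractions = yearly_fractions(toy)
trimmed = yearly_fractions(toy, start_year=2005)
```

With this toy input the 2002 and 2004 rows both come out as a single term at a share of 1.0, which is the kind of apparent major shift that trimming the window to 2005 onwards avoids.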
Yeah. This is indeed noisy and also not very clear. For me, the first one without adjustment is still good to go.
Ah yes. Let’s keep the window as 2005-2020. That is better.
Just to quickly show what I am optimizing for (which is space):
And this is before inclusion of the flow diagram and a fourth panel (disease)
This looks pretty good & very impressive, @krassowski. It’s getting there. I really like it. 😃
Mike:
Great job indeed:
[1] I would need an "overview short caption" and a "semi-descriptive caption" for the entire Figure. If you want to have 2 or 3 Figures (each figure with multiple, typically 4, panels), that is fine too.
[2] Please also send me the final 4-5 sentence "methods" description for your analysis and a link to the Git page that will eventually be shared online with the publication.
Let me know if there are any more questions and we will be done with these.
Thanks a lot, Biswa
Done in ef1465d4a6c1f6d1dc054200d6d858c22c7feb35
To avoid the impression that the field is growing faster than it actually is ;)