Feedback on the omics - GitHub issues

krassowski commented 4 years ago

Please comment on the inclusion of the following terms in the omics comparison:

*omic(s) matches

term	count
genomics	538
proteomics	538
transcriptomics	423
metabolomics	412
metagenomics	88
epigenomics	76
lipidomics	52
proteogenomics	35
phosphoproteomics	27
metaproteomics	26
pharmacogenomics	25
metatranscriptomics	23
phenomics	19
glycomics	14
~~post-genomics~~	14
radiomics	13
meta-omics	12
fluxomics	11
metabonomics	11
peptidomics	10
~~postgenomics~~	10
methylomics	9
~~agronomics~~	7
microbiomics	6

*ome matches

term	count
genome	437
transcriptome	264
proteome	195
microbiome	169
metabolome	145
whole-genome	47
exome	42
epigenome	26
methylome	26
whole-exome	22
interactome	17
metagenome	15
phosphoproteome	15
secretome	12
lipidome	10
translatome	7
whole-transcriptome	7
metatranscriptome	6
~~peroxisome~~	6
phenome	6
~~proteasome~~	6

For details, see Omics.ipynb.

Note: for now I am using abstracts only; we may do another iteration with full-text search.
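The counts above come from matching -omic(s) and -ome words in abstracts. The notebook's actual regular expressions are not shown in this thread, so the following is only a minimal sketch of the idea, assuming hypothetical patterns and a hypothetical stop-list (`NOT_OMES`). Each term is counted at most once per text, so the sketch produces paper counts rather than mention counts.

```python
import re
from collections import Counter

# Hypothetical patterns (not the notebook's): optional hyphenated prefixes
# such as "multi-" or "whole-", then a word ending in "omic"/"omics" or "ome".
OMICS_PATTERN = re.compile(r"\b((?:[a-z]+-)*[a-z]*omics?)\b", re.IGNORECASE)
OME_PATTERN = re.compile(r"\b((?:[a-z]+-)*[a-z]+ome)\b", re.IGNORECASE)

# "-ome" also matches everyday words and organelles, so a curated stop-list
# is needed; this short list is only illustrative.
NOT_OMES = {
    "some", "come", "become", "home", "outcome", "income",
    "chromosome", "ribosome", "peroxisome", "proteasome",
}


def extract_terms(text: str) -> set:
    """Distinct lower-cased -omic(s) and -ome terms found in one text."""
    omics = {match.lower() for match in OMICS_PATTERN.findall(text)}
    omes = {match.lower() for match in OME_PATTERN.findall(text)} - NOT_OMES
    return omics | omes


def document_frequency(texts: list) -> Counter:
    """Number of texts mentioning each term; a paper counts once per term."""
    counts = Counter()
    for text in texts:
        counts.update(extract_terms(text))
    return counts
```

The stop-list matters: without it, words such as "some" or "outcome" would swamp the -ome counts, the same kind of false positive as the struck-through organelles above.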

krassowski commented 4 years ago

Also, I'm thinking of grouping some of the terms together for some plots:

ome_groups = {
    'DNA': {
        'genome',
        'whote-genome',
        'exome',
        'whole-exome'
    },
    'RNA': {
        'transcriptome',
        'whole-transcriptome'
    },
    'protein': {
        'proteome',
        # ?
        'phosphoproteome'
    },
    'epi': {
        'epigenome',
        'methylome'
    },
    'microbial': {
        'metagenome',
        'metatranscriptome'
    }
}
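One way to use such groups in plots is to count papers per group rather than summing the per-term counts, so that a paper mentioning both "genome" and "exome" is counted once for DNA. A minimal sketch, assuming each paper is represented by the set of terms extracted from it; `papers_per_group` is a hypothetical helper, not taken from the notebook, and the grouping below is the proposed one with the 'whote-genome' typo corrected.

```python
from collections import Counter

# The proposed grouping, with 'whote-genome' corrected to 'whole-genome'.
ome_groups = {
    "DNA": {"genome", "whole-genome", "exome", "whole-exome"},
    "RNA": {"transcriptome", "whole-transcriptome"},
    "protein": {"proteome", "phosphoproteome"},
    "epi": {"epigenome", "methylome"},
    "microbial": {"metagenome", "metatranscriptome"},
}


def papers_per_group(paper_terms: list, groups: dict) -> Counter:
    """Count the papers mentioning at least one term of each group.

    A paper mentioning both "genome" and "exome" adds 1 to "DNA", not 2
    (summing the per-term counts from the tables would add 2).
    """
    term_to_group = {term: group for group, terms in groups.items() for term in terms}
    counts = Counter()
    for terms in paper_terms:
        # Terms outside every group (e.g. "peroxisome") are ignored.
        counts.update({term_to_group[term] for term in terms if term in term_to_group})
    return counts
```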

biswapriyamisra commented 4 years ago

Mike,

Good job really!!!

From the "*omic(s) matches" list, please remove the unrelated term "agronomics"; also, "transomics" is the same as multiomics/polyomics/integrated omics, and "postgenomics" is a timeline and not an omics branch as such.

In the *ome term list, remove "peroxisome".

In the code for "ome_groups" there is a typo: 'whote-genome' -> 'whole-genome'; and try to be inclusive of more of the terms in the lists above!

Thanks,

(Sent by email in reply to Michał Krassowski's comment of Thu, Jul 23, 2020, 2:44 AM.)

krassowski commented 4 years ago

All good catches! I removed the organelles, transomics and postgenomics.

I also improved my regular expressions, which now match 50% more terms (see the notebook). Terms which now make the cut under the "present in at least five papers" criterion:

term	count
exposome	6
miRNome	6
host-microbiome	5
metaproteome	5

and:

term	count
nutrigenomics	8
glycoproteomics	5
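Removing the flagged terms and applying the "at least in five papers" cut-off could look like the following sketch; `select_terms` is a hypothetical helper, and the exclusion set simply collects the terms dropped in this thread so far.

```python
from collections import Counter

# Terms dropped in this thread: an unrelated word, timelines rather than
# omics branches, a synonym of multi-omics, and organelles.
EXCLUDED = {
    "agronomics", "post-genomics", "postgenomics", "transomics",
    "peroxisome", "proteasome",
}


def select_terms(counts: Counter, min_papers: int = 5, excluded=EXCLUDED) -> dict:
    """Keep terms found in at least `min_papers` papers, most frequent first."""
    return {
        term: count
        for term, count in counts.most_common()
        if count >= min_papers and term not in excluded
    }
```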

biswapriyamisra commented 4 years ago

Good! But change "host-microbiome" to just "microbiome", as the latter is the field, while the former is uncommon, or more of an interaction-type term.

(Sent by email in reply to Michał Krassowski's comment of Thu, Jul 23, 2020, 8:04 AM.)

vd4mmind commented 4 years ago

@krassowski I agree with Biswa here: it should be only "microbiome". If you fix the typos and remove the terms Biswa pointed out, I like the strategy in this code. Are there any other *omes under epi that we can think of? I expect histones will still be covered.

So my question, if you can kindly address it, is:

Doing the above exercise, what are we going to address? I know it is a naive question, but I guess I am a bit lost here. You have done a tremendous amount of work.
Now let us try to think about which questions this fits for the current review manuscript, and what we can put entirely into a second one, a systematic meta-analysis that can be a continuation of the review. I see this as part of the systematic meta-analysis you are performing, but can you outline the questions you want to address?
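Merging "host-microbiome" into "microbiome", as agreed here, is easiest to do per paper before counting, so that a paper using both spellings is still counted once. A minimal sketch with a hypothetical `SYNONYMS` mapping that holds only the case raised in this thread:

```python
# Hypothetical synonym table; only the case discussed in this thread so far.
SYNONYMS = {"host-microbiome": "microbiome"}


def normalise(terms: set, synonyms: dict = SYNONYMS) -> set:
    """Replace synonyms with their preferred term; the set keeps each term once."""
    return {synonyms.get(term, term) for term in terms}
```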

biswapriyamisra commented 4 years ago

Please keep in mind that the manuscript is already overflowing with surplus information/content!

You are all very welcome to generate more figures, but remember: each figure/panel will take 2-3 lines to explain/describe, plus the "methods" have to be captured in a few lines in the MS. Just for information, we are already at 6000-7000 words, which is 1000-2000 words over the typical length.

But if you want a "stand-alone manuscript" only on "Trends in MultiOmics" with data mining like Mike is doing, then saving the surplus figures and analyses for that exercise is never a waste; but let's wrap up the "MS writing" and everyone's clean-ups first, as we finalize these figures. : )

(Sent by email in reply to ivivek87's comment of Thu, Jul 23, 2020, 10:35 PM.)

vd4mmind commented 4 years ago

@biswapriyamisra, hence my question. I agree, as we need to wrap up the current MS, including the issues still open in the comments of our Google Doc.

This is actually a lot of amazing work, enough for another systematic meta-analysis manuscript. Let @krassowski finish his work. I would suggest that he try to make 3-4 slides for Saturday covering:

- the tasks he carried out (query + hypothesis);
- the results he obtained, in the form of plots;
- what interpretations the first two points provide that can be used directly now, with the rest kept for the stand-alone MS.

I believe these three points can, to an extent, help us understand whether this amazing amount of work can contribute a table/box/figure (only one, possibly multi-panel) to the current review MS, with the rest used for the next MS, a stand-alone one as Biswa is suggesting.

krassowski commented 4 years ago

To clarify my aims for this omics notebook:

- make a plot of multi-omics term × number of omics mentions (is multi-omics usually more than 3 omics? how many on average? is omics integration 2 omics?), basically validating our understanding of the terms with data;
- extract the terms as features for my own exercises on:
  - splitting the literature into "methods", "applications" and "reviews";
  - clustering similar papers together (I have some early results, but they are not worth publishing yet);
  - neither of those needs to go into our manuscript;
- provide the statistics on omics usage that @biswapriyamisra asked for in another thread.

I see that we are all goal-oriented, and I understand that our priorities and timelines may be slightly different. While the manuscript that we started working on is the immediate goal, I write the scripts here with future work in mind too.
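The first aim listed above (multi-omics term × number of omics mentions) can be sketched as follows, assuming each paper is a set of extracted terms, a hypothetical `layer_of` mapping from terms to omics layers (e.g. both "genome" and "genomics" to genomics), and a hypothetical set of integration terms such as "multi-omics". The summary answers "how many on average?" and "more than 3?" for one term; none of these names come from the notebook.

```python
from statistics import mean


def layers_by_integration_term(papers: list, layer_of: dict, integration_terms: set) -> dict:
    """For each integration term, the number of distinct omics layers of each paper using it."""
    result = {}
    for terms in papers:
        # "genome" and "genomics" map to the same layer, so they count once.
        n_layers = len({layer_of[term] for term in terms if term in layer_of})
        for term in sorted(terms & integration_terms):
            result.setdefault(term, []).append(n_layers)
    return result


def summarise(layer_counts: list) -> dict:
    """Average number of layers and the share of papers with more than three."""
    return {
        "mean": mean(layer_counts),
        "share_more_than_3": sum(n > 3 for n in layer_counts) / len(layer_counts),
    }
```

Each list in the result is one column of the planned plot (one box or histogram per integration term).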

vd4mmind commented 4 years ago

@krassowski “make a plot of multi-omics term × number of omics mentions (is multi-omics usually more than 3 omics? how many on average? is omics integration 2 omics) - so basically validating our understanding of the terms with data.”

If you already have a plot on this, it can be mentioned in the manuscript and be a sub-plot in one of the multi-panel figures. The rest of the points and the meta-analysis are, as you said, done with future work in mind. This already clarifies for me what we can use directly in the current MS and what goes into future ones, as @biswapriyamisra hinted. You have done an amazing job. Let’s now structure it and extract the parts that can feed directly into the current review MS. We can keep working on the rest depending on your time, everyone’s availability and the next paper, so that we can get on to it as soon as this one is under review. Really very impressive work.

krassowski commented 4 years ago

@vd4mmind I just updated the Omics.ipynb notebook; you may want to check the plots in there. These are NOT publication-quality, as this was only the first pass. If I am able to convince myself that we can reliably detect the omics used in the papers, then I will polish the figures. Whether we include those figures in the manuscript or not is a different question.

krassowski commented 4 years ago

Two major points:

- These exercises and figures would only be worth something if I can separate "applications of methods to a new experiment", "methods", and "reviews" papers, as those have very different characteristics.
- What you see in the notebook at the moment uses abstracts only; a better analysis would use the full text (which means looking at a subset of papers only). I will do so soon.

You could say that up until now this notebook was just a "test trial"; the most important result so far is that I was able to extract an omic term from 68% of the papers. This means it is worth trying to make it into a publication-quality analysis.
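Because methods, applications and reviews behave differently, the 68% coverage figure may be worth reporting per paper type as well. A sketch, assuming the paper type is already known for each paper (the classification itself is not shown here, and `coverage_by_type` is a hypothetical helper):

```python
from collections import Counter


def coverage_by_type(papers: list) -> dict:
    """Share of papers with at least one extracted omic term, overall and per type.

    `papers` is a list of (paper_type, set_of_terms) pairs.
    """
    totals, hits = Counter(), Counter()
    for paper_type, terms in papers:
        for key in ("all", paper_type):
            totals[key] += 1
            hits[key] += bool(terms)  # True (1) if any term was found
    return {key: hits[key] / totals[key] for key in totals}
```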

vd4mmind commented 4 years ago

Hi @krassowski, in Omics.ipynb I see the term "metabonomics". It appears in output Out[41] under the section "Merge -ome and -omic terms", and also in the outputs of the code cells downstream of it, including the tables and the UpSet plots. The string comes up quite a few times. Is it a typo in the code, or a typo present in the publications?

krassowski commented 4 years ago

It is not a typo; it is a term with a wider meaning than metabolomics, from the "Nicholson school" (see Wikipedia). I may be biased towards this term due to my exposure at Imperial, but as far as I am concerned it has a well-established usage.

vd4mmind commented 4 years ago

Case closed then. Good to know. I wasn’t aware of it. Definitely a good find then.
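The "Merge -ome and -omic terms" step is not reproduced in this thread. One simple way to do it is to map "-ome", "-omic" and "-omics" spellings onto a single "-omics" key, which keeps "metabonomics" separate from "metabolomics", consistent with the distinction above. A hypothetical sketch:

```python
def merged_key(term: str) -> str:
    """Map -ome, -omic and -omics spellings of a term onto one -omics key."""
    term = term.lower()
    if term.endswith("omics"):
        return term
    if term.endswith("omic"):
        return term + "s"  # "genomic" -> "genomics"
    if term.endswith("ome"):
        return term[: -len("ome")] + "omics"  # "proteome" -> "proteomics"
    return term
```

Note that this also folds the adjective "genomic" into "genomics", which may or may not be desirable.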

krassowski commented 4 years ago

I updated the omics term extraction in Omics.ipynb to use abstract + title + subjects ("keywords") + full text (if available).

This yields quite a few more -ome and -omics terms (and typos of those); for your convenience, below are the visual diffs:

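[Screenshots of the visual diffs, 2020-07-24:
https://user-images.githubusercontent.com/5832902/88410506-62a29280-cdce-11ea-968b-d2473f20fa2f.png
https://user-images.githubusercontent.com/5832902/88410510-63d3bf80-cdce-11ea-911d-f966bdeba5c8.png
https://user-images.githubusercontent.com/5832902/88410497-5fa7a200-cdce-11ea-8a8c-f1f8d4d9124a.png
https://user-images.githubusercontent.com/5832902/88410490-5d454800-cdce-11ea-812a-4a18235fc1f0.png]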

For details on my analytic decisions (what is typo/omic and what is not) please see the notebook.
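Building the searched text from several fields could look like this minimal sketch; the `Paper` record and its field names are assumptions, not the notebook's data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Paper:
    """Hypothetical record; the field names are assumptions."""
    title: str
    abstract: str
    keywords: List[str] = field(default_factory=list)
    full_text: Optional[str] = None


def searchable_text(paper: Paper) -> str:
    """Join title, abstract, keywords and (when available) the full text."""
    parts = [paper.title, paper.abstract, "; ".join(paper.keywords)]
    if paper.full_text:
        parts.append(paper.full_text)
    # Skip empty fields, e.g. a paper without keywords.
    return "\n".join(part for part in parts if part)
```

Since full text is available for only some papers, term counts from papers with and without it are not directly comparable; analysing the full-text subset separately is one way to handle that.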

biswapriyamisra commented 4 years ago

Great job indeed, Mike!!! Looks like a big task and exercise! ; )

Now please think of the best ways to visualize/present this data as a figure, and then we are done!!

Next on to the manuscript for us!!

Excited indeed,

Thanks a lot, Biswa

(Sent by email in reply to Michał Krassowski's comment of Fri, 24 Jul 2020, 21:27.)

vd4mmind commented 4 years ago

This is wonderful, @krassowski. If I understood the code flow correctly, your systematic meta-analysis with similarity matches is already performed. Now we have to think of a multi-panel figure from this analysis that can serve as a comprehensive figure for the MS. Very impressive.

As Biswa suggested, try to present this to us on Saturday in ~4-5 slides, and let's finalize it after the discussion. For the manuscript, my suggestion is a figure with the following panels:

- Flow diagram of the systematic meta-analysis.
- UpSet plots showing the final outcomes for the categories that the analysis tells us are multi-omics, i.e. having > 2 omics layers.

Others, feel free to add whatever you think would make the figure richer.

This should help us to include it in the MS and also accordingly restructure the text section.

Kind regards, Vivek
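The numbers behind an UpSet plot are the sizes of exclusive intersections: how many papers mention exactly a given combination of omics layers. A minimal sketch, assuming each paper is already reduced to a set of layers; `exclusive_intersections` is a hypothetical helper, and `min_layers=3` would correspond to the "> 2 omics layers" criterion suggested here. The resulting counts could then be handed to a plotting package such as UpSetR or upsetplot.

```python
from collections import Counter


def exclusive_intersections(paper_layers: list, min_layers: int = 2) -> Counter:
    """Count papers per exact combination of omics layers (UpSet-style).

    frozenset makes the combination order-independent and hashable, so
    {"genomics", "proteomics"} and {"proteomics", "genomics"} share a bar.
    """
    return Counter(
        frozenset(layers) for layers in paper_layers if len(layers) >= min_layers
    )
```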

biswapriyamisra commented 4 years ago

Hi Mike,

Indeed, I agree with Vivek! On Saturday we can make a final call on the figure(s)/panel(s) from this output. The more options we have to choose from, the better!

A few thoughts: though UpSet plots are informative, for this kind of breakdown of numbers/flow, Sankey diagrams and/or Voronoi diagrams are nicer, and especially appealing and informative for readers.

Also, the 'Flow diagram of the systematic meta-analysis' would surely qualify as Supplementary Figure 1 (flow diagrams are not appealing enough for the main figures). Remember, ours will be a review anyway (not a pure meta-analysis, except for mining -omics terms), so the background work is better left in the background!

Thanks, Biswa

(Sent by email in reply to ivivek87's comment of Fri, Jul 24, 2020, 11:30 PM.)

vd4mmind commented 4 years ago

Thanks, Biswa.

I partly agree. @krassowski, let's see if you can make both Sankey and UpSet versions of the last two plots in that notebook. I would favour UpSet, as it is an independent, freely available tool that needs support from the community, and therefore more citations. Sankey is great, but the overlays and color codes will often be challenging given that we have so many categories. If Mike can produce both, we can evaluate them. Keep in mind that the underlying data structure might not suit a Sankey diagram, as it is multi-category, multi-term, with multiple overlaps/interactions. Nonetheless, if Mike can generate one by Saturday, it is worth discussing at our end. Color coding will still be challenging. The good thing about UpSet is that it uses a single color (black), which saves us a lot of hassle. I am speaking from my own experience of generating Sankey diagrams. I have no experience generating Voronoi diagrams with code (I have only used them for imaging analysis), so I cannot say much about them.

Kind regards, Vivek

biswapriyamisra commented 4 years ago

Thanks. If it is free/open-source/code-amenable, then great; if not, tools like SankeyMATIC (http://sankeymatic.com/) and Voronoi (http://alexbeutel.com/webgl/voronoi.html) will still do the job for a beginner like me! The point is also to keep it colorful and attractive, with an informative part at the front. I am not opposed to UpSetR (having used it myself!), just that black and white is not appealing! Colors may not be a problem with a good cutoff, say > 20 or > 50 etc.; it doesn't have to go as low as 1 omics term, as an example. Details matter, but figures must capture the overview well!

But if it's quick/doable and not much hard work, then DO it, or else ignore my comments, Mike, Vivek! Thoughts or solutions, Sangram?

Keep the ball rolling ! : )

(Sent by email in reply to ivivek87's comment of Sat, Jul 25, 2020, 12:40 AM.)

krassowski commented 4 years ago

I agree on Sankey and on improving and coloring the UpSet plots; everything I am showing so far is just work in progress, do not worry! It just might not be ready for Saturday: I am doing all I can to extract as much insight as possible rather than to polish the visualisations ;)
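If SankeyMATIC is used, its input is plain text with one flow per line in the form `Source [Amount] Target`. The sketch below generates such lines from term counts and a term-to-group mapping (both assumed inputs, e.g. the tables and `ome_groups` earlier in this thread), pooling terms at or below a cutoff (such as the 20 or 50 suggested above) into one "other" node per group to keep the number of colours manageable. `sankeymatic_lines` is a hypothetical helper.

```python
from collections import Counter


def sankeymatic_lines(term_counts: dict, term_to_group: dict, cutoff: int = 20) -> list:
    """Flows in SankeyMATIC's 'Source [Amount] Target' syntax, one per line.

    Terms with a count at or below `cutoff` are pooled into "other <group>".
    Pooled values are sums of per-term counts, so a paper mentioning two
    pooled terms is counted twice.
    """
    flows = Counter()
    for term, count in term_counts.items():
        group = term_to_group.get(term)
        if group is None:  # ungrouped terms are left out of the diagram
            continue
        target = term if count > cutoff else f"other {group}"
        flows[(group, target)] += count
    return [f"{group} [{amount}] {target}" for (group, target), amount in sorted(flows.items())]
```

For instance, with the DNA counts from the *ome table and a cutoff of 50, exome (42) and whole-exome (22) would be pooled into a single "other DNA" flow of 64.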

krassowski / multi-omics-state-of-the-field

Feedback on the omics #10
