sr320 / LabDocs

Roberts Lab Documents
http://sr320.github.io/LabDocs/

efficient way to sort through GOterms? #660

Closed yaaminiv closed 6 years ago

yaaminiv commented 6 years ago

I have a list of proteins, p-values and GOterms. The proteins were those I used for my SRM targets and p-values are related to level of differential expression between site and eelgrass conditions. I want to make a REVIGO plot for the biological processes these proteins are involved in.

Each protein has many GOterms associated with it. Is there an easy way to sort through these GOterms and isolate those related to biological processes? I tried just inputting the first GOterm listed for each protein in REVIGO and found that only two were related to biological processes and the rest were for molecular function.

[Figure: REVIGO plot, biological processes]

[Figure: REVIGO plot, molecular function]
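
One way to isolate the biological-process terms is to look up each GO ID's namespace in the Gene Ontology OBO file, where every `[Term]` stanza carries an `id:` line and a `namespace:` line. A minimal sketch, assuming the protein-to-GO annotations are already in a Python dict. The inline OBO snippet, the protein names, and the helper names `parse_namespaces` and `keep_namespace` are hypothetical stand-ins for illustration; a real run would read the full ontology file instead.

```python
# Tiny stand-in for the Gene Ontology OBO file (illustrative only).
OBO_TEXT = """\
[Term]
id: GO:0006412
name: translation
namespace: biological_process

[Term]
id: GO:0005524
name: ATP binding
namespace: molecular_function

[Term]
id: GO:0005737
name: cytoplasm
namespace: cellular_component
"""


def parse_namespaces(obo_text):
    """Map each GO ID to its namespace by scanning [Term] stanzas."""
    namespaces = {}
    in_term = False
    current_id = None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are of interest.
            in_term = line == "[Term]"
            current_id = None
        elif in_term and line.startswith("id: "):
            current_id = line[len("id: "):]
        elif in_term and current_id and line.startswith("namespace: "):
            namespaces[current_id] = line[len("namespace: "):]
    return namespaces


def keep_namespace(protein_to_go, namespaces, wanted="biological_process"):
    """Keep only the GO IDs whose namespace matches `wanted`."""
    return {
        protein: [go for go in go_ids if namespaces.get(go) == wanted]
        for protein, go_ids in protein_to_go.items()
    }


# Hypothetical annotations: protein name -> all GO IDs listed for it.
protein_to_go = {
    "protein_A": ["GO:0005524", "GO:0006412"],
    "protein_B": ["GO:0005737", "GO:0005524"],
}

namespaces = parse_namespaces(OBO_TEXT)
bp_only = keep_namespace(protein_to_go, namespaces)
print(bp_only)
```

Filtering on every GO ID, rather than on the first one listed per protein, avoids depending on the order in which the terms happen to appear.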

My method for getting GOterms and making these plots can be found at the bottom of this notebook. Thanks!

sr320 commented 6 years ago

Note your p-values are not associated with GO terms but rather proteins...

To best address this: what is the overall goal?

Maybe to show which GO terms are represented in your target protein list?

If so, I would have one column with the GO# and a second column with the number of occurrences in your list.
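
The two-column table described here can be sketched with a counter. This assumes the annotations are held in a dict of protein name to GO IDs; the protein names and the variable names are hypothetical. Each GO ID is counted at most once per protein, which is one possible reading of "number of occurrences".

```python
from collections import Counter

# Hypothetical annotations: protein name -> all GO IDs listed for it.
protein_to_go = {
    "protein_A": ["GO:0006412", "GO:0005524"],
    "protein_B": ["GO:0005524", "GO:0005737"],
    "protein_C": ["GO:0005524", "GO:0005524"],  # duplicate within one protein
}

# Count the number of proteins annotated with each GO ID.
go_counts = Counter()
for go_ids in protein_to_go.values():
    go_counts.update(set(go_ids))  # set() drops duplicates within a protein

# Print as two tab-separated columns, most frequent first.
for go_id, n in sorted(go_counts.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{go_id}\t{n}")
```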

yaaminiv commented 6 years ago

My overall goal is to have some form of REVIGO visualization that relates the biological processes of the proteins I used for my SRM assay with their differential expression. Having one column with GOterms and one column with no. occurrences could also be a good visualization, but maybe not in REVIGO, as it splits GOterms between biological processes and molecular functions. Ideally, I just want one visualization with all information.

sr320 commented 6 years ago

So there will not be a single differential expression value for each GO term? Then REVIGO is probably not the best approach.

The closest thing would be to have an average differential expression value per GO term.
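
A per-term average of this kind could be computed roughly as follows. This is only a sketch: it assumes each protein has a single numeric expression value such as a log2 fold change (not a p-value), and the protein names, values, and variable names are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical input: (protein, log2 fold change, GO IDs for that protein).
proteins = [
    ("protein_A", 1.0, ["GO:0006412", "GO:0005524"]),
    ("protein_B", -0.5, ["GO:0005524"]),
    ("protein_C", 2.0, ["GO:0006412"]),
]

# Collect the expression value of every protein annotated with each GO ID.
values_by_go = defaultdict(list)
for _name, log2fc, go_ids in proteins:
    for go_id in set(go_ids):  # count a protein once per GO ID
        values_by_go[go_id].append(log2fc)

# Average the collected values for each GO ID.
mean_by_go = {go_id: mean(vals) for go_id, vals in values_by_go.items()}

for go_id in sorted(mean_by_go):
    print(f"{go_id}\t{mean_by_go[go_id]:.2f}\t(n={len(values_by_go[go_id])})")
```

Reporting the number of proteins behind each mean makes it clear when an average rests on a single protein.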

sr320 commented 6 years ago

Also note, it would not be appropriate to just pick the first GO # for each protein (this would introduce bias).

yaaminiv commented 6 years ago

I just picked the first GO term to test if all of the GOterms I had were for biological processes or for something else.

If GOterms are repeated between proteins (which I'm sure they are), then there won't be a single p-value for each GOterm. I could average the p-values for each occurrence and do something with that inside or outside REVIGO?

sr320 commented 6 years ago

The p-value has no direct relation to the GOterm, so you should not average.

What is the bigger reason you are trying to generate this particular figure?

yaaminiv commented 6 years ago

I just wanted to visualize the proteins I selected for SRM. I have a heatmap but thought I could try something else.

sr320 commented 6 years ago

spawning plan :)