z0on / GO_MWU

Rank-based Gene Ontology analysis of gene expression data

Why are some GO IDs (and associated genes) not included in the "main data table"? #12

Open laurahspencer opened 2 years ago

laurahspencer commented 2 years ago

I'm interested in particular GO terms in my input files, but they do not get included in the "main data table" (GO division)_(input filename). I understand that all original GO terms aren't actually analyzed because they are represented by either a) a more specific term, or b) a highly similar term. I would like to see which redundant/similar GO term absorbed the GO terms of interest, but can't find my GO term of interest in any of the GO_MWU output. Are all original GO IDs then supposed to be accounted for in the main data table?

Any insight would be very helpful. Attached is some GO_MWU code and results showing that the GO term of interest (and its associated genes) is missing from the GO_MWU output. You'll see that I have relaxed the filtering settings to not remove any GO categories that contain a large fraction of genes or only a few genes, in an effort to not throw out GO terms.

testing_gomwu.zip

z0on commented 2 years ago

Hi Laura, can you check whether that term is present in your GO annotations file to begin with?

There are four ways an existing (i.e., actually among the annotations) GO term might disappear:

Misha


laurahspencer commented 2 years ago

Hi Misha,

Thanks for the info!

I've double checked that the GO term is indeed in the GO annotations file (go.obo) and is assigned to the namespace "biological_process" (see below), and that the GO term is in the background list of GO terms that I input into GO_MWU.

[Term]
id: GO:0006313
name: transposition, DNA-mediated
namespace: biological_process
alt_id: GO:0006317
alt_id: GO:0006318
def: "Any process involved in a type of transpositional recombination which occurs via a DNA intermediate." [GOC:jp, ISBN:0198506732, ISBN:1555812090]
synonym: "Class II transposition" EXACT []
synonym: "DNA transposition" EXACT [GOC:dph]
synonym: "P-element excision" NARROW []
synonym: "P-element transposition" NARROW []
synonym: "Tc1/mariner transposition" NARROW []
synonym: "Tc3 transposition" NARROW []
is_a: GO:0006310 ! DNA recombination
is_a: GO:0032196 ! transposition

I have also experimented with relaxing the smallest and largest settings (smallest=1, largest=.99), and set clusterCutHeight=0, but my GO terms are still missing. Regarding your fourth bullet: I found one offspring of my GO term in the results, but it isn't associated with any of the genes that map to my GO term of interest (and the genes it does map to aren't significant). Further, the genes associated with my GO term of interest are also missing from the output. Are all significant genes supposed to be re-assigned to other GO terms, or are they supposed to be removed?

Thanks for the help!

z0on commented 2 years ago

Sorry, what is the “background list of GO terms”? GO_MWU does not have that…

The question is: does your favorite GO term ever appear among your genes’ annotations, I mean the genes for which you have expression measured? If yes, how many times?

(go.obo is just the universal database of all possible GO terms, so it is no wonder the term is there.)
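One way to answer the "how many times" question is to count the term directly in the gene-to-GO annotations table rather than in go.obo. The sketch below is not part of GO_MWU. It assumes the annotations table is tab-delimited, with a gene ID in the first column and semicolon-separated GO IDs in the second; the function name `count_genes_with_term` and the sample rows are made up for illustration.

```python
import io


def count_genes_with_term(lines, go_id):
    """Count distinct genes whose annotation column lists go_id.

    Assumed layout (not confirmed in this thread): one gene per line,
    gene ID and GO IDs separated by a tab, GO IDs separated by ';'.
    """
    genes = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        parts = line.split("\t")
        if len(parts) < 2:
            continue  # skip rows without an annotation column
        gene, terms = parts[0], parts[1]
        # Exact match on whole IDs, so GO:0006313 never matches GO:00063130
        term_set = {t.strip() for t in terms.split(";")}
        if go_id in term_set:
            genes.add(gene)
    return len(genes)


# Hypothetical rows standing in for a real annotations table
sample = io.StringIO(
    "gene1\tGO:0006313;GO:0006310\n"
    "gene2\tGO:0032196\n"
    "gene3\tGO:0006313\n"
)
n_hits = count_genes_with_term(sample, "GO:0006313")
print(n_hits)
```

With a real table, pass an open file handle instead of the `io.StringIO` sample.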


-- cheers Misha matzlab.weebly.com

laurahspencer commented 2 years ago

By "background" I mean the goAnnotations list, and yes, that list is lousy with my GO term of interest (6,259 of 29,127 genes are associated with my GO term).

My significant gene set contains 732 genes linked to my GO term of interest (it actually accounts for ~28% of all "significant" genes).

For context, I'm analyzing WGCNA results, so my input includes all genes measured, plus module membership scores for the genes assigned to the focal module. The GO term relates to transposons, of which there are many in my focal species' genome.

I guess the big question now, and what I should have started this issue with, is why do so many of my genes get discarded despite me relaxing the settings that filter/merge GO terms?

z0on commented 2 years ago

Could this term be filtered out because it is too broad (associated with more than 10% of all genes)? If so, try relaxing the “largest” option of gomwuStats from 0.1 (the default) to, say, 0.3.
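As a rough check of that idea, the counts quoted earlier in this thread (6,259 of 29,127 genes) put the term at about 21% of all genes, which is above the 0.1 default and below 0.3. The sketch below only reproduces that arithmetic. It is not GO_MWU code: the helper name `exceeds_largest` is hypothetical, and whether the real filter uses a strict or inclusive comparison is an assumption.

```python
def exceeds_largest(n_genes_in_term, n_genes_total, largest=0.1):
    """Return (fraction, too_broad) for one GO term.

    'largest' mirrors the option discussed above; the strict '>' test
    is an assumption about how the cutoff is applied.
    """
    fraction = n_genes_in_term / n_genes_total
    return fraction, fraction > largest


# Counts quoted earlier in the thread
fraction, too_broad_default = exceeds_largest(6259, 29127)
_, too_broad_relaxed = exceeds_largest(6259, 29127, largest=0.3)

print(f"fraction of genes in term: {fraction:.3f}")
print("filtered at largest=0.1:", too_broad_default)
print("filtered at largest=0.3:", too_broad_relaxed)
```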


laurahspencer commented 2 years ago

Yes I have played with that setting quite a bit and tried various levels up to 0.99 (see code I attached in my first comment). Does that setting have a hard-coded ceiling? (E.g. nothing above 50% is analyzed)

z0on commented 2 years ago

Hmm, that’s surely possible… let me check


laurahspencer commented 2 years ago

Thanks for checking! I've tried playing with the Perl code but haven't had any breakthroughs.

z0on commented 2 years ago

So what about this, though: does it print out something like the following, and if so, does the first number change when you change the “largest” option?

Run parameters:
largest GO category as fraction of all genes (largest) : 0.1
smallest GO category as # of genes (smallest) : 5
clustering threshold (clusterCutHeight) : 0.25


laurahspencer commented 2 years ago

Yes, the output changes. For example, here's the output when I used the following settings: largest=0.99, smallest=1, cutHeight=0 (genes of interest still get discarded).

go.obo WGCNA-genes_for-GOMWU.tab WGCNA-module_lightgreen.csv BP largest=0.99 smallest=1 cutHeight=0

Run parameters:

largest GO category as fraction of all genes (largest)  : 0.99
         smallest GO category as # of genes (smallest)  : 1
                clustering threshold (clusterCutHeight) : 0

-----------------
retrieving GO hierarchy, reformatting data...

-------------
go_reformat:
Genes with GO annotations, but not listed in measure table: 1

Terms without defined level (old ontology?..): 0
-------------
-------------
go_nrify:
1174 categories, 2585 genes; size range 1-2559.15
    1 too broad
    0 too small
    1173 remaining

removing redundancy:

calculating GO term similarities based on shared genes...
598 non-redundant GO categories of good size
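The counts in this log can be cross-checked with a few lines of arithmetic. The sketch below is not part of GO_MWU; it only parses the log text pasted above. Reading the upper end of the size range as `largest` times the gene count is an inference from the numbers (2559.15 / 2585 is about 0.99), not something confirmed in this thread.

```python
import re

# Log lines copied from the output above
log = """\
1174 categories, 2585 genes; size range 1-2559.15
    1 too broad
    0 too small
    1173 remaining
598 non-redundant GO categories of good size
"""

m = re.search(r"(\d+) categories, (\d+) genes; size range ([\d.]+)-([\d.]+)", log)
categories, genes = int(m.group(1)), int(m.group(2))
upper = float(m.group(4))

too_broad = int(re.search(r"(\d+) too broad", log).group(1))
too_small = int(re.search(r"(\d+) too small", log).group(1))
remaining = int(re.search(r"(\d+) remaining", log).group(1))
non_redundant = int(re.search(r"(\d+) non-redundant", log).group(1))

# Size filter bookkeeping: categories minus the filtered ones
size_filter_consistent = categories - too_broad - too_small == remaining

# Categories that passed the size filter but not the redundancy step
merged = remaining - non_redundant

# Inferred: upper size bound divided by gene count gives the 'largest' setting
implied_largest = upper / genes

print(size_filter_consistent, merged, round(implied_largest, 2))
```

Note that this log counts 2585 genes, whereas 29,127 genes are mentioned earlier in the thread; the thread does not say why the two differ.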

z0on commented 2 years ago

Hmm. Just making sure: the option is clusterCutHeight (not cutHeight, as your last email says). Is this how you ran it?


laurahspencer commented 2 years ago

Yes, sorry, I definitely used the clusterCutHeight option.

z0on commented 2 years ago

Ah, I see! Here is the modified gomwu.functions.R file; plop it into your GO_MWU directory (replacing the old file) and give it a shot.

