legumeinfo / mine-issues

Report ALL issues on LIS mines here! Regardless of which mine you found it on!

Gene Ontology Enrichment widget working for molecular_function only #167

Closed StevenCannon-USDA closed 1 month ago

StevenCannon-USDA commented 1 month ago

The Gene Ontology Enrichment widget at https://mines.legumeinfo.org/glycinemine seems to be working for molecular_function but not for cellular_component or biological_process.

To test, I enter this list, which should show enrichment, since all of these genes share the GO terms GO:0005515 and GO:0003676 (Stabilizer of iron transporter SufD / Polynucleotidyl transferase):

glyma.Wm82.gnm4.ann1.Glyma.02G002500
glyma.Wm82.gnm4.ann1.Glyma.04G150900
glyma.Wm82.gnm4.ann1.Glyma.05G011300
glyma.Wm82.gnm4.ann1.Glyma.06G213300
glyma.Wm82.gnm4.ann1.Glyma.09G167100
glyma.Wm82.gnm4.ann1.Glyma.10G002700
glyma.Wm82.gnm4.ann1.Glyma.10G242800
glyma.Wm82.gnm4.ann1.Glyma.16G217300
glyma.Wm82.gnm4.ann1.Glyma.17G119400
glyma.Wm82.gnm4.ann1.Glyma.20G151400

Then I save this list of ten genes and check the widget at the bottom of the report page. For cellular_component and biological_process, the result is "No enrichment found"; whereas for molecular_function, "nucleic acid binding" and "protein binding" are reported as being enriched. I see the same result if I generate a list of all 52K genes in this annotation set and set that as the background population -- so I think the "Default" is probably correct.
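For context, enrichment widgets of this kind typically score each term with a hypergeometric (one-sided) test against the background population. The sketch below is a minimal standard-library illustration of that test. It is not the widget's actual implementation, the function name is hypothetical, and the term counts in the usage example are invented (only the list size of ten and the roughly 52K background come from the description above). It also applies no multiple-testing correction, which a real widget normally would.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """Return P(X >= k) for X ~ Hypergeometric(N, K, n).

    k: genes in the test list annotated with the term
    n: size of the test list
    K: genes in the background annotated with the term
    N: size of the background population
    """
    if not (0 <= k <= n <= N and 0 <= K <= N and k <= K):
        raise ValueError("inconsistent counts")
    upper = min(n, K)  # cannot draw more annotated genes than exist
    total = comb(N, n)  # all ways to draw a list of n genes
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical usage: 10 of 10 listed genes carry a term that
# (by assumption, for illustration only) 2000 of 52000 background genes carry.
p_value = hypergeom_enrichment_p(k=10, n=10, K=2000, N=52000)
print(f"{p_value:.3e}")
```

Note that a term shared by every gene in the list can only be tested if it belongs to the namespace currently selected in the widget.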

If I check the report for a single gene, https://mines.legumeinfo.org/glycinemine/gene:glyma.Wm82.gnm4.ann1.Glyma.02G002500, I see the expected GO terms (GO:0005515, GO:0003676).

Checking one of these GO terms, https://mines.legumeinfo.org/glycinemine/goterm:GO:0005515, I see that the Namespace is molecular_function.

So, I wonder if it is possible to associate additional namespaces with the GO terms, or whether there is some other way to make the enrichment widget aware of cellular_component and biological_process for GO terms?

Not urgent, since this is a somewhat specialized function; however, GO enrichment is a useful kind of analysis, and it seems not to be working correctly.

adf-ncgr commented 1 month ago

@StevenCannon-USDA not 100% sure, but I think your test list may only be enriched for molecular_function (which is how you selected them). I just did a keyword search for "membrane" and the resulting gene list seemed to have enrichment in all three "namespaces". Let us know if you think I'm misunderstanding/misinterpreting.

StevenCannon-USDA commented 1 month ago

@adf-ncgr Ah - thanks. I'll do some more testing and report.

StevenCannon-USDA commented 1 month ago

@adf-ncgr - Thanks for the quick feedback and helpful hint. You are right. My test set just happened to be specific for molecular_function.
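A small sketch of why this happens, assuming enrichment is evaluated separately within each namespace: a list whose only shared terms are molecular_function terms leaves nothing to test in the other two namespaces. The annotations below are a toy reconstruction from the two terms named in this thread (both molecular_function); the data structures and function name are hypothetical, and the real genes may carry other, unshared annotations.

```python
from collections import Counter, defaultdict

# Namespace lookup for the two terms mentioned in this thread.
GO_NAMESPACE = {
    "GO:0005515": "molecular_function",  # protein binding
    "GO:0003676": "molecular_function",  # nucleic acid binding
}

# Toy annotations: three genes from the original test list, each assumed
# to carry only the two shared terms.
GENE_TERMS = {
    "glyma.Wm82.gnm4.ann1.Glyma.02G002500": ["GO:0005515", "GO:0003676"],
    "glyma.Wm82.gnm4.ann1.Glyma.04G150900": ["GO:0005515", "GO:0003676"],
    "glyma.Wm82.gnm4.ann1.Glyma.05G011300": ["GO:0005515", "GO:0003676"],
}


def term_counts_by_namespace(gene_terms, namespace_of):
    """Count, per namespace, how many genes carry each GO term."""
    counts = defaultdict(Counter)
    for terms in gene_terms.values():
        for term in set(terms):  # count each term once per gene
            counts[namespace_of[term]][term] += 1
    return counts


counts = term_counts_by_namespace(GENE_TERMS, GO_NAMESPACE)
for namespace in ("biological_process", "cellular_component", "molecular_function"):
    # .get avoids creating empty entries for namespaces with no terms
    print(namespace, dict(counts.get(namespace, {})))
```

Only molecular_function has any term counts here, so it is the only namespace in which a test could report enrichment.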

Here are three other test sets that work across all three GO aspects. (I am working on this in the process of writing a page to introduce this kind of analysis. It might become a blog post.)

Aspect: Molecular Function
  GO:0005215  "transporter activity"
    glyma.Wm82.gnm4.ann1.Glyma.01G022700
    glyma.Wm82.gnm4.ann1.Glyma.01G035000
    glyma.Wm82.gnm4.ann1.Glyma.01G041400
    glyma.Wm82.gnm4.ann1.Glyma.01G041450
    glyma.Wm82.gnm4.ann1.Glyma.01G042100
    glyma.Wm82.gnm4.ann1.Glyma.01G081600
    glyma.Wm82.gnm4.ann1.Glyma.01G081700
    glyma.Wm82.gnm4.ann1.Glyma.01G105000
    glyma.Wm82.gnm4.ann1.Glyma.01G112500
    glyma.Wm82.gnm4.ann1.Glyma.01G113400

    biological_process: transport [GO:0006810]  3.880204e-12    10
    cellular_component: membrane [GO:0016020]   1.894009e-4 10
    molecular_function: transporter activity [GO:0005215]   1.600475e-15    10

Aspect: Cellular Component
  GO:0005856  "cytoskeleton"
    glyma.Wm82.gnm4.ann1.Glyma.01G128700
    glyma.Wm82.gnm4.ann1.Glyma.01G155300
    glyma.Wm82.gnm4.ann1.Glyma.03G041600
    glyma.Wm82.gnm4.ann1.Glyma.03G146200
    glyma.Wm82.gnm4.ann1.Glyma.05G158300
    glyma.Wm82.gnm4.ann1.Glyma.08G116000
    glyma.Wm82.gnm4.ann1.Glyma.09G194900
    glyma.Wm82.gnm4.ann1.Glyma.09G279900
    glyma.Wm82.gnm4.ann1.Glyma.11G089400
    glyma.Wm82.gnm4.ann1.Glyma.13G162500

    biological_process: actin cytoskeleton organization [GO:0030036]    4.626113e-10    5
    cellular_component: cytoskeleton [GO:0005856]   2.640981e-21    10
    molecular_function: microtubule motor activity [GO:0003777] 0.002433    3
                        microtubule binding [GO:0008017]    0.004066    3
                        actin binding [GO:0003779]  0.041614    2

Aspect: Biological Process
  GO:0007165  "signal transduction"
    glyma.Wm82.gnm4.ann1.Glyma.01G032400
    glyma.Wm82.gnm4.ann1.Glyma.01G032900
    glyma.Wm82.gnm4.ann1.Glyma.01G033200
    glyma.Wm82.gnm4.ann1.Glyma.01G033300
    glyma.Wm82.gnm4.ann1.Glyma.01G039000
    glyma.Wm82.gnm4.ann1.Glyma.01G046900
    glyma.Wm82.gnm4.ann1.Glyma.01G060300
    glyma.Wm82.gnm4.ann1.Glyma.01G112200
    glyma.Wm82.gnm4.ann1.Glyma.01G112300
    glyma.Wm82.gnm4.ann1.Glyma.01G125300

    biological_process: defense response [GO:0006952]   3.762962e-13    9
                        signal transduction [GO:0007165]    1.688834e-12    9
    cellular_component: No enrichment
    molecular_function: ADP binding [GO:0043531]    5.387080e-15    9
                        protein binding [GO:0005515]    5.267104e-4 9

adf-ncgr commented 1 month ago

+1 to writing up some user guidance about the existence of this function. As I recall, it came up as a use case of interest when we had our discussion with Anna Locke.