pombase / canto

The PomBase community curation tool
https://curation.pombase.org

PHI-base GO MF config file #2182

Closed ValWood closed 4 years ago

ValWood commented 4 years ago

GO:0038023 signaling receptor activity needs has_input

(Note to self: fix PMID:30610168)

ValWood commented 4 years ago

Also, for the term PHIPO:0001106 pathogen host protein-protein interaction present, we need to be able to specify two assayed_using extensions (i.e. both interacting partners).

jseager7 commented 4 years ago

@ValWood do you have an example of the configuration for the has_substrate extension? Based on the FYPO extension ontology, I can only see an example of assayed_substrate:

 [Typedef]
 id: assayed_substrate
 name: assayed_substrate
 def: "Relation between a catalytic activity phenotype and a substrate, such as a gene product, with which the phenotype was assayed." [PomBase:mah]
 comment: normal or abnormal protein kinase activity assayed_substrate PomBase:SPBC11B10.09
 property_value: local_domain FYPO:0000654 ! catalytic activity phenotype
 is_a: assayed_using
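A Typedef like the one above is an OBO stanza: a bracketed header followed by `tag: value` lines, where a trailing `! label` is a human-readable comment. The sketch below reads one stanza into a dictionary. It is a simplification of the OBO flat-file format (it ignores escapes, qualifiers and `!` after quoted definitions), and `parse_obo_stanza` is a hypothetical helper, not part of Canto.

```python
def parse_obo_stanza(text: str) -> dict[str, list[str]]:
    """Parse a single OBO stanza into {tag: [values]}; the header goes under "_stanza"."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    header = lines[0] if lines else ""
    if not (header.startswith("[") and header.endswith("]")):
        raise ValueError("a stanza must start with a header such as [Typedef]")
    tags: dict[str, list[str]] = {"_stanza": [header[1:-1]]}
    for line in lines[1:]:
        tag, _, value = line.partition(":")  # split on the first colon only
        value = value.strip()
        # Simplification: drop a trailing "! label" comment unless the value is a quoted def.
        if not value.startswith('"') and " ! " in value:
            value = value.split(" ! ", 1)[0].rstrip()
        tags.setdefault(tag.strip(), []).append(value)
    return tags


ASSAYED_SUBSTRATE = """
[Typedef]
id: assayed_substrate
name: assayed_substrate
def: "Relation between a catalytic activity phenotype and a substrate, such as a gene product, with which the phenotype was assayed." [PomBase:mah]
property_value: local_domain FYPO:0000654 ! catalytic activity phenotype
is_a: assayed_using
"""

typedef = parse_obo_stanza(ASSAYED_SUBSTRATE)
print(typedef["is_a"])            # ['assayed_using']
print(typedef["property_value"])  # ['local_domain FYPO:0000654']
```

The `is_a: assayed_using` line is what makes assayed_substrate a more specific kind of assayed_using.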

There seems to be an example of the has_substrate relation in the test configuration for Canto, but this hasn't been updated in years, so I'm not sure if it's accurate:

 domain ID   subset relation  extension relation  range ID  Canto display text  Help text  cardinality  role
 GO:0016023  is_a             has_substrate       GENE      kinase substrate               0,1          user
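Each config row has the eight columns named in the header above. As a sketch, assuming the file is tab-separated with one row per line (the scraped page shows spaces, so the separator is an assumption), a row can be split into named fields. `ExtensionConfigRow` and `parse_config_line` are hypothetical names, not Canto's own code.

```python
from dataclasses import dataclass, fields


@dataclass
class ExtensionConfigRow:
    """One extension config row, in the column order of the header above."""
    domain_id: str
    subset_relation: str
    extension_relation: str
    range_id: str
    display_text: str
    help_text: str
    cardinality: str
    role: str


def parse_config_line(line: str) -> ExtensionConfigRow:
    """Split a tab-separated config line (assumed format) into named fields."""
    values = line.rstrip("\n").split("\t")
    expected = len(fields(ExtensionConfigRow))
    if len(values) != expected:
        raise ValueError(f"expected {expected} tab-separated fields, got {len(values)}")
    return ExtensionConfigRow(*values)


row = parse_config_line("GO:0016023\tis_a\thas_substrate\tGENE\tkinase substrate\t\t0,1\tuser")
print(row.extension_relation, repr(row.help_text), row.cardinality)  # has_substrate '' 0,1
```

Keeping the empty help-text cell as an empty field is what keeps the later columns (cardinality, role) in the right place.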
ValWood commented 4 years ago

Hi, the first one is GO. That should be has_input, I think (has substrate is our biologist-friendly conversion for the gene pages).

The second one is PHIPO. You are correct; this should be 'assayed_using'. At present, I can add one 'assayed_using', but protein binding can have two interacting partners, so the user should be prompted for two gene products. (I worked around this by adding them as free text.)

I will try to track down the syntax for this in the FYPO config....

ValWood commented 4 years ago

It's

FYPO:0000702 is_a assayed_using ProteinID affected proteins (add TWO, i.e. both binding partners) 2 user
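The `2` near the end of that line is the cardinality: the curator is asked for exactly two values, whereas `0,1` elsewhere allows none or one. A minimal sketch of that check, assuming a cardinality cell is a comma-separated list of allowed counts (Canto may support other forms; the helper names are hypothetical):

```python
def allowed_counts(cardinality: str) -> set[int]:
    """Read a cardinality cell such as "2" or "0,1" as a set of allowed value counts.

    Assumption: the cell is a comma-separated list of integers.
    """
    return {int(part) for part in cardinality.split(",") if part.strip()}


def extension_count_ok(cardinality: str, values: list[str]) -> bool:
    """True if the number of values a curator entered is one of the allowed counts."""
    return len(values) in allowed_counts(cardinality)


# Hypothetical protein IDs standing in for the two binding partners.
print(extension_count_ok("2", ["protein_A", "protein_B"]))  # True
print(extension_count_ok("2", ["protein_A"]))               # False
print(extension_count_ok("0,1", []))                        # True
```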

jseager7 commented 4 years ago

I already have examples of assayed_using, so I can work on that now, but I can't find any examples of the has_input relation.

We currently have assayed_using with two partners on PHIPO:0000132 (protein-protein interaction phenotype) and all its children (see below), so the fact it's not being applied to PHIPO:0001106 seems to be an oversight, resulting from the fact we always have to remember to apply extensions to both the single-species and pathogen-host branches.

With that in mind, would you want this new assayed_using relation pushed up in the hierarchy to 'pathogen host protein-protein interaction phenotype' (PHIPO:0000164), so it also encompasses the terms under 'pathogen host protein-protein interaction absent' (PHIPO:0001107)?

 domain ID      subset relation  extension relation  range ID  Canto display text                                       Help text  cardinality  role
 PHIPO:0000132  is_a             assayed_using       GeneID    affected proteins (add TWO, i.e. both binding partners)             2            user
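Pushing the extension up to PHIPO:0000164 works because a config row applies to its domain term and every is_a descendant. The sketch below shows that rule on a tiny hand-written hierarchy; the `PARENTS` map is illustrative (it only encodes what the comment implies) and the function names are hypothetical.

```python
def ancestors(term: str, parents: dict[str, set[str]]) -> set[str]:
    """All terms reachable from `term` by following is_a links upward."""
    found: set[str] = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), set()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def config_applies(domain_id: str, term: str, parents: dict[str, set[str]]) -> bool:
    """A config row applies to its domain term and to every is_a descendant of it."""
    return term == domain_id or domain_id in ancestors(term, parents)


# Illustrative fragment only: both terms placed under PHIPO:0000164, as the comment implies.
PARENTS = {
    "PHIPO:0001106": {"PHIPO:0000164"},
    "PHIPO:0001107": {"PHIPO:0000164"},
}

print(config_applies("PHIPO:0001106", "PHIPO:0001107", PARENTS))  # False: sibling branch
print(config_applies("PHIPO:0000164", "PHIPO:0001107", PARENTS))  # True: covered by the parent
```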
jseager7 commented 4 years ago

@ValWood decided to update this in the PomBase config instead (see here), then PHI-base can synchronise with these config files if needed.

ValWood commented 4 years ago

Hi @mah11, could you add this to the pombe config file for GO MF: GO:0038023 signaling receptor activity needs has_input

mah11 commented 4 years ago

GO:0038023 signaling receptor activity needs has_input

Hmmm. I had a quick look to check what the range should be, and I wonder if it's ideal to use GO:0038023 in the config. Is this meant to prompt for a gene (as proxy for its product)? It has a lot of descendants representing receptors for ligands that aren't gene products (simple compounds like GABA or glutamate; light; oddballs like "salty taste" ...).

Will it be a problem if the extension prompt appears for those, as it will if we just put GO:0038023 in the config?

ValWood commented 4 years ago

Hmm, specifically here we are annotating

GO:0038187 pattern recognition receptor activity, so we could put it on this term (which would only apply to things recognising pathogens, so would reduce the scope hugely).

However, it would have the same issue, in that not all PRRs bind to proteins. These are host proteins that recognise pathogens, and they bind to some host proteins:

"In addition to the well-characterized PAMPs flagellin, EF-Tu, and chitin, many pathogen-secreted metabolites or virulence proteins can also activate the plant immune system; these include lipopolysaccharides, peptidoglycan volatiles, glycoproteins, cell wall degradation enzymes (CWDE), and other pathogen-secreted proteins (Liu et al., 2012; Ranf et al., 2015)."

So, in an ideal world, we would be able to select either a protein (which will be one of the gene products we are annotating) OR a CHEBI molecule.

I think we might already be able to do this? I recollect that we had a similar x or y option for a gene or a SO term for some extensions? Is this possible?

mah11 commented 4 years ago

Yes, the syntax does support "x or y" - e.g. the lines with "TranscriptID|SO:0000673" in the FYPO extension config.

Are there any descendants of GO:0038187 where a CHEBI ID would be redundant with the descendant's term name & def (there are oodles for the parent GO:0038023)? Not that I want to get into the weeds too much... just trying to stay away from the other extreme of "oops, we didn't think that through".
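The "x or y" range syntax mentioned above separates alternatives with `|`, as in `TranscriptID|SO:0000673`. A small sketch of splitting such a cell; treating anything containing a colon as an ontology term ID (and everything else as an entity type such as `TranscriptID`) is an assumed convention for illustration, and the helper names are hypothetical.

```python
def range_alternatives(range_id: str) -> list[str]:
    """Split an "x or y" range cell such as "TranscriptID|SO:0000673" into its alternatives."""
    return [alt.strip() for alt in range_id.split("|") if alt.strip()]


def alternative_kind(alt: str) -> str:
    """Assumed convention: a prefixed ID with a colon is an ontology term, otherwise an entity type."""
    return "ontology term" if ":" in alt else "entity type"


for alt in range_alternatives("TranscriptID|SO:0000673"):
    print(alt, "->", alternative_kind(alt))
# TranscriptID -> entity type
# SO:0000673 -> ontology term
```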

ValWood commented 4 years ago

Quite possibly, but I suspect there will be some GO changes here (I will recommend that specific substrates are captured with CHEBI).

PomBase is unlikely to use this term ever, but in PHI-base we will use it a lot. So I can monitor and refine once the GO churn is over. It should become obvious quite quickly if something is bonkers. I think it will be OK, just not as refined as it could be.

mah11 commented 4 years ago

OK, I've put GO:0038187 in the PomBase MF extension config. For ease of copying over to PHI-etc, this is the line:

GO:0038187  is_a    has_input   ProteinID|CHEBI:24431   has ligand  (protein or chemical substance)     0,1 user
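With the range `ProteinID|CHEBI:24431`, a ligand value should be either one of the session's proteins or CHEBI:24431 itself or one of its descendants. The sketch below illustrates that check; the protein names and `CHEBI:example_ligand` are hypothetical placeholders, and how Canto actually validates values is not shown in this thread.

```python
def ancestors(term: str, parents: dict[str, set[str]]) -> set[str]:
    """All terms reachable from `term` by following is_a links upward."""
    found: set[str] = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), set()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def value_fits_range(value: str, range_id: str, session_proteins: set[str],
                     parents: dict[str, set[str]]) -> bool:
    """True if `value` satisfies at least one "|"-separated alternative of `range_id`."""
    for alt in range_id.split("|"):
        if alt == "ProteinID":
            if value in session_proteins:
                return True
        elif ":" in alt and (value == alt or alt in ancestors(value, parents)):
            return True
    return False


# Hypothetical identifiers, for illustration only.
SESSION_PROTEINS = {"example_receptor", "example_effector"}
PARENTS = {"CHEBI:example_ligand": {"CHEBI:24431"}}
RANGE = "ProteinID|CHEBI:24431"

print(value_fits_range("example_effector", RANGE, SESSION_PROTEINS, PARENTS))      # True
print(value_fits_range("CHEBI:example_ligand", RANGE, SESSION_PROTEINS, PARENTS))  # True
print(value_fits_range("GO:0038187", RANGE, SESSION_PROTEINS, PARENTS))            # False
```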
ValWood commented 4 years ago

@jseager7 is going to make the PHI ones point to ours for GO, because they should be the same. James, is there a ticket for this?

jseager7 commented 4 years ago

James is there a ticket for this?

Not yet, but I'll open one on our config repository because it belongs there. I'll link here when it's done.

jseager7 commented 4 years ago

I've got a question about the subset relations (listed below) in the GO extension config files: where do they come from? I'm presuming they need definitions in a corresponding ontology, so do they already exist in GO? Will we need to load any extra ontologies into PHI-Canto to enable these?

We're currently loading has_qualifier_range.obo and fypo_extension.obo from the PomBase file server, plus the standards like GO-basic, PSI-MOD, and RO (and all of the PHI-base-related ontologies that we need).

GO subset relations

ValWood commented 4 years ago

This is a very good question. This is the part that is 'in flux'. GO are trying to consolidate the relations and reduce the set used. Then they will be migrated to ~BFO~ RO.

We don't really need definitions for them in Canto, as the user is protected from needing to select the appropriate relation; we do it for them in the config.

ValWood commented 4 years ago

Obviously good to have once the dust settles, but we don't absolutely need them to proceed.

kimrutherford commented 4 years ago

Will we need to load any extra ontologies into PHI-Canto to enable these?

These relations don't need an ontology. Canto just needs them to be in the config file.

jseager7 commented 4 years ago

These relations don't need an ontology. Canto just needs them to be in the config file.

But is the extra information about them (definitions, comments, etc.) supplied from a companion ontology?

jseager7 commented 4 years ago

@ValWood I've added assayed_using on PHIPO:0000164 (pathogen host protein-protein interaction phenotype) and I've updated our GO extension config to match what's on pombase/pombase-config. The GO config isn't being synced automatically yet, but I'm planning to do that.

The extensions should update once the server reloads the ontologies overnight.

If that's all you need, feel free to close this issue.
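Until the GO config is synced automatically, one way to see whether a local copy has drifted from pombase/pombase-config is a simple row-level comparison. This is a sketch that treats each non-blank line as one row; the two strings below reuse rows quoted later in this thread, and the file contents and function names are hypothetical.

```python
def config_rows(text: str) -> set[str]:
    """Non-blank lines of a config file, each treated as one row."""
    return {line for line in text.splitlines() if line.strip()}


def config_drift(upstream: str, local: str) -> tuple[set[str], set[str]]:
    """Return (rows missing from the local copy, rows only present locally)."""
    up, loc = config_rows(upstream), config_rows(local)
    return up - loc, loc - up


TRANSPORTS = "GO:0006886\tis_a\thas_input\tProteinID\ttransports\t\t0,1\tuser"
LOCALIZES = "GO:0034613-is_a(GO:0006886)\tis_a\thas_input\tProteinID\tlocalizes\t\t0,1\tuser"

# Hypothetical contents: the local copy lacks the "localizes" row.
UPSTREAM = TRANSPORTS + "\n" + LOCALIZES + "\n"
LOCAL = TRANSPORTS + "\n"

missing, extra = config_drift(UPSTREAM, LOCAL)
print(len(missing), len(extra))  # 1 0
```

Comparing whole rows is deliberately strict: a changed cardinality or display text shows up as one missing row plus one extra row.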

ValWood commented 4 years ago

Actually, I will close this. There is a note in the session to add the substrate to the GO receptor (this change is not through yet).

The PHIPO interactions seem to allow assayed using. I did not see the bit about 2 interactors, but we use this quite a lot so it will soon be obvious if it isn't working.

kimrutherford commented 4 years ago

But is the extra information about them (definitions, comments, etc.) supplied from a companion ontology?

We don't use the definitions and comments anywhere in Canto so they don't need to be loaded.

ValWood commented 4 years ago

@jseager7 eventually all of the relations used by GO extensions will be in ~BFO~ RO. At present GO use an internal ontology. Once they decide which relations are "acceptable" in GO extensions, missing relations will be added to ~BFO~ RO.

More info is here, but this is also out of date: http://wiki.geneontology.org/index.php/Annotation_Extensions

jseager7 commented 4 years ago

The PHIPO interactions seem to allow assayed using. I did not see the bit about 2 interactors, but we use this quite a lot so it will soon be obvious if it isn't working.

@ValWood The extension might not have been working correctly because I forgot to restart Canto after updating the config. Should be fine now, although note that I think we have overlapping configurations on some terms, which means you'll see both the cardinality 2 protein extension and the cardinality 1 extension:

[screenshot]

I've opened an issue about this on the config repository.
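The overlap described above happens when two config rows with the same extension relation both cover a term, one directly and one through an ancestor. A sketch that lists such clashes for a given term; the rows, the hierarchy and the placeholder `PHIPO:example_parent` are hypothetical and only mimic the "cardinality 2 and cardinality 1" situation.

```python
from collections import defaultdict


def ancestors(term: str, parents: dict[str, set[str]]) -> set[str]:
    """All terms reachable from `term` by following is_a links upward."""
    found: set[str] = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), set()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def overlapping_rows(term, rows, parents):
    """Group config rows that apply to `term` by relation; keep relations with more than one row."""
    lineage = {term} | ancestors(term, parents)
    by_relation = defaultdict(list)
    for domain_id, relation, range_id, cardinality in rows:
        if domain_id in lineage:
            by_relation[relation].append((domain_id, range_id, cardinality))
    return {rel: hits for rel, hits in by_relation.items() if len(hits) > 1}


# Hypothetical rows and hierarchy; "PHIPO:example_parent" is a placeholder, not a real term.
ROWS = [
    ("PHIPO:0000164", "assayed_using", "ProteinID", "2"),
    ("PHIPO:example_parent", "assayed_using", "GeneID", "0,1"),
]
PARENTS = {
    "PHIPO:0001106": {"PHIPO:0000164"},
    "PHIPO:0000164": {"PHIPO:example_parent"},
}
clashes = overlapping_rows("PHIPO:0001106", ROWS, PARENTS)
print(clashes)  # both assayed_using rows are reported for PHIPO:0001106
```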

mah11 commented 4 years ago

eventually all of the relations used by GO extensions will be in BFO.

corrected in comment above - relations are going into the Relations Ontology (RO)

ValWood commented 4 years ago

Should be fine now, although note that I think we have overlapping configurations on some terms, which means you'll see both the cardinality 2 protein extension and the cardinality 1 extension:

@mah is there a way to fix this in config?

kimrutherford commented 4 years ago

although note that I think we have overlapping configurations on some terms, which means you'll see both the cardinality 2 protein extension and the cardinality 1 extension:

I think the way to fix that is to change the domain to something like GO:0034613-is_a(GO:0006886) and then have a separate configuration with the excluded subset (e.g. "GO:0006886") as the domain.

For example, we have this in the GO config:

 GO:0006886      is_a    has_input       ProteinID       transports              0,1     user
 GO:0034613-is_a(GO:0006886)     is_a    has_input       ProteinID       localizes               0,1     user

I hope that helps. I'm happy to chat on Skype about this.
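A domain such as `GO:0034613-is_a(GO:0006886)` reads as "GO:0034613 and its descendants, minus GO:0006886 and its descendants", so the two rows above never apply to the same term. The sketch below implements that reading; whether Canto also excludes the named term itself, and whether several `-is_a(...)` clauses may be chained, are assumptions here, and `GO:example_sibling` is a hypothetical placeholder.

```python
import re

# Assumed syntax: BASE followed by zero or more "-is_a(EXCLUDED)" clauses.
EXCLUSION = re.compile(r"-is_a\(([^)]+)\)")


def ancestors(term: str, parents: dict[str, set[str]]) -> set[str]:
    """All terms reachable from `term` by following is_a links upward."""
    found: set[str] = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), set()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def domain_matches(expr: str, term: str, parents: dict[str, set[str]]) -> bool:
    """True if `term` is under the base term but not under any excluded term."""
    base = expr.split("-is_a(", 1)[0]
    excluded = EXCLUSION.findall(expr)
    lineage = {term} | ancestors(term, parents)
    return base in lineage and not any(ex in lineage for ex in excluded)


# Illustrative fragment; "GO:example_sibling" is a hypothetical placeholder term.
PARENTS = {
    "GO:0006886": {"GO:0034613"},
    "GO:example_sibling": {"GO:0034613"},
}
DOMAINS = (("transports", "GO:0006886"), ("localizes", "GO:0034613-is_a(GO:0006886)"))

for term in ("GO:0006886", "GO:example_sibling"):
    print(term, [label for label, expr in DOMAINS if domain_matches(expr, term, PARENTS)])
# GO:0006886 ['transports']
# GO:example_sibling ['localizes']
```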

jseager7 commented 4 years ago

Thanks @kimrutherford, but fortunately it wasn't necessary to add excluded subsets, because I was able to fix the problem by removing redundant domain IDs.
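Removing redundant domain IDs amounts to dropping a row whose domain already sits under another domain configured with the same relation. The sketch below finds candidates; it compares only domain and relation, so a child row with a different range or cardinality may be intentional and still needs a human check. The rows and hierarchy are hypothetical.

```python
from collections import defaultdict


def ancestors(term: str, parents: dict[str, set[str]]) -> set[str]:
    """All terms reachable from `term` by following is_a links upward."""
    found: set[str] = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), set()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def redundant_domains(rows: list[tuple[str, str]],
                      parents: dict[str, set[str]]) -> set[tuple[str, str]]:
    """(domain_id, relation) pairs already covered by an ancestor's row for the same relation."""
    domains_by_relation: dict[str, set[str]] = defaultdict(set)
    for domain_id, relation in rows:
        domains_by_relation[relation].add(domain_id)
    redundant = set()
    for relation, domains in domains_by_relation.items():
        for domain_id in domains:
            if ancestors(domain_id, parents) & domains:
                redundant.add((domain_id, relation))
    return redundant


# Hypothetical rows: the parent row already covers both children.
ROWS = [
    ("PHIPO:0000164", "assayed_using"),
    ("PHIPO:0001106", "assayed_using"),
    ("PHIPO:0001107", "assayed_using"),
]
PARENTS = {"PHIPO:0001106": {"PHIPO:0000164"}, "PHIPO:0001107": {"PHIPO:0000164"}}
print(sorted(redundant_domains(ROWS, PARENTS)))
# [('PHIPO:0001106', 'assayed_using'), ('PHIPO:0001107', 'assayed_using')]
```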