pombase / canto

The PomBase community curation tool
https://curation.pombase.org

Add curation option for peptide motif (SO) #2507

Closed ValWood closed 2 years ago

ValWood commented 2 years ago

Follow on from https://github.com/pombase/canto/issues/20

Curation type: Protein sequence feature or motif

Description/prompt text: A protein localization signal is a peptide sub-sequence used to target the protein to a specific organelle.

Start typing in the search box (type at least 2 characters). If you do not find the term you are looking for with your initial search, begin with a broad term (peptide_localization_signal). Note that Sequence Ontology term names include underscores instead of spaces. More specific terms will be suggested, allowing you to refine your search iteratively before making your final selection.

Branch of SO: We only need this SO subset to be available: http://sequenceontology.org/browser/current_svn/term/SO:0001527

Evidence selection: EXP or ISS

Extension: “region/residue” (Mandatory). This will be an amino acid range, e.g. R97-K101
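As a rough illustration of the range format above, here is a minimal Python sketch of how a value such as R97-K101 could be checked. The function name `parse_residue_range` and the rules it applies (one-letter codes for the 20 standard amino acids, start position not after end position) are assumptions made for this example; this is not how Canto validates extensions.

```python
import re
from typing import NamedTuple

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Matches e.g. "R97-K101": residue letter, position, hyphen, residue letter, position.
RANGE_RE = re.compile(rf"^([{AMINO_ACIDS}])(\d+)-([{AMINO_ACIDS}])(\d+)$")


class ResidueRange(NamedTuple):
    start_residue: str
    start_pos: int
    end_residue: str
    end_pos: int


def parse_residue_range(text: str) -> ResidueRange:
    """Parse a hypothetical region/residue extension value such as 'R97-K101'."""
    match = RANGE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a residue range: {text!r}")
    start_res, start_pos, end_res, end_pos = match.groups()
    rng = ResidueRange(start_res, int(start_pos), end_res, int(end_pos))
    # Positions are 1-based and the range must run from N- to C-terminus.
    if rng.start_pos < 1 or rng.start_pos > rng.end_pos:
        raise ValueError(f"invalid positions in range: {text!r}")
    return rng
```

A check like this does not confirm that the named residues actually occur at those positions in the protein; that would need the sequence itself.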

I wanted to avoid a generic "protein feature" option as people might try to capture domains and families, or modification sites, which are also available in this branch of SO, but which we capture in other ways.

[Future: Once this works we will add an option for protein stability elements (the stability elements we use are currently missing this parent in SO). I don’t think there are any other branches that we will use.]

kimrutherford commented 2 years ago

> “region/residue” (Mandatory)

We don't have a way to make extensions mandatory at the moment.

ValWood commented 2 years ago

> We don't have a way to make extensions mandatory at the moment.

That's OK. I think most people would add it automatically when describing a motif, and we will check.

ValWood commented 2 years ago

I hope it will be a bit like the gene expression terms. Everyone used the extensions for those.

Can you put a time estimate on this? Is it mainly configuration?

kimrutherford commented 2 years ago

Unless I'm misunderstanding things, it should be less than an hour to change the config and test the changes. We'll probably want to deploy it in the test Canto before going live.

We'll need to update the documentation too.

kimrutherford commented 2 years ago

I had to fix some bugs that cropped up because of the new configuration. But I've now had a first go at this and it's deployed in the test Canto. Let me know any changes you'd like.

On this page: https://curation.pombase.org/test/curs/f7d275c886036eab/feature/gene/annotate/2/start/protein_sequence_feature_or_motif

Do we need any extra help text when you click "more..." ?

ValWood commented 2 years ago

I just checked. That works really well.

~I don't think any extra help is needed; it's all really obvious, especially with the pull-down menu showing the available options. (If anyone wants an additional one they can ask.)~

ValWood commented 2 years ago

> especially with the pull down menu showing the available options

This is probably temporary because, on closer inspection, it doesn't include all the terms we use. We will need to browse the ontology subtree.

ValWood commented 2 years ago

Right, I put just "protein localization signal", but we also annotate degron signals like KEN boxes.

ValWood commented 2 years ago

> Do we need any extra help text when you click "more..." ?

kimrutherford commented 2 years ago

> make the example a range K23-K30

That's done for the next time I re-release the test Canto.

> We need to go up the hierarchy

If it helps we should be able to configure Canto to include multiple sub-ontologies and also, if needed, to exclude sub-ontologies.
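To make the include/exclude idea concrete, here is a minimal Python sketch of selecting terms from several sub-ontologies while dropping others. The `CHILDREN` map is a simplified toy hierarchy written for this example, not the real SO structure, and the function names are hypothetical. It does not reflect Canto's actual configuration format.

```python
from collections import deque

# Toy parent -> children map (an assumption for illustration, not copied from SO).
CHILDREN = {
    "polypeptide_region": ["peptide_localization_signal", "polypeptide_domain"],
    "peptide_localization_signal": ["nuclear_localization_signal"],
}


def subtree(root, children):
    """Return the root term plus all of its descendants."""
    seen = set()
    queue = deque([root])
    while queue:
        term = queue.popleft()
        if term in seen:
            continue
        seen.add(term)
        queue.extend(children.get(term, []))
    return seen


def allowed_terms(include_roots, exclude_roots, children):
    """Union of the included subtrees minus the union of the excluded subtrees."""
    included = set()
    for root in include_roots:
        included |= subtree(root, children)
    excluded = set()
    for root in exclude_roots:
        excluded |= subtree(root, children)
    return included - excluded
```

For example, `allowed_terms(["polypeptide_region"], ["polypeptide_domain"], CHILDREN)` keeps the localization signal terms and drops the domain branch in this toy hierarchy.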

ValWood commented 2 years ago

> If it helps we should be able to configure Canto to include multiple sub-ontologies and also, if needed, to exclude sub-ontologies.

Oh wow, that will be perfect. I will see if I can get SO to push some fixes through (some of the confusing terms are children of our required starting terms).

I will also identify the branches that we need.

ValWood commented 2 years ago

> If it helps we should be able to configure Canto to include multiple sub-ontologies and also, if needed, to exclude sub-ontologies.

I had a look and this won't help us at the moment, not until these are fixed:

https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/569
https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/571
https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/574

We need to use "polypeptide_region" SO:0000839

I think this will be fine because, although there are some things we don't want to capture this way (like domain), they are stubs, and I doubt that our community will try to use them. If they do, we can fix it.

polypeptide_metal_contact will not appear using this subset, but I opened a ticket for that and we have only used it once so far, so it isn't a showstopper.

Later, we can probably exclude some terms.

ValWood commented 2 years ago

Pinging this ticket. I am going to show @manulera the curation of these features in Artemis shortly, but maybe this is too close?

kimrutherford commented 2 years ago

It's working in the test Canto. I can deploy it in the main Canto anytime. Sorry, I misread this comment and thought we were waiting for SO changes: https://github.com/pombase/canto/issues/2507#issuecomment-1008105091

Are you happy with how it works in the test Canto?

ValWood commented 2 years ago

Yes, I forgot where we were up to with this too! It works fine. We might need to rethink how we configure the terms if we add more, but for now the static list works OK. I think longer term we are hoping that SO makes it easier by correctly grouping all of the terms we need.

[Screenshot 2022-05-12 at 13:52:07]

manulera commented 2 years ago

Ok I saw it, looks nice!

kimrutherford commented 2 years ago

> We might need to rethink how we configure the terms if we add more, but for now the static list works OK

Should I deploy it in the main Canto?

ValWood commented 2 years ago

Yes please, deploy. Even if we do add new terms over time, we can survive in this mode for quite a while because the rate at which we add and use them is so slow.

kimrutherford commented 2 years ago

> Yes please, deploy.

Done! Let me know if you see any problems.

kimrutherford commented 2 years ago

I'm leaving this open for now. Once you've successfully done some curation with this annotation type, please let me know so I can close it.

The annotations should automatically appear in the "Protein sequence features" of the "Protein features" section, but I'd like to double check once there are some of these annotations in an approved session.

One other thing to think about: if the help text on the long workflow isn't right we should change that too. This is how it looks at the moment:

[Screenshot: current help text on the long workflow]

ValWood commented 2 years ago

@manulera

ValWood commented 2 years ago

Closing. It's a long ticket and we can open new tickets if there are problems.