pombase / canto

The PomBase community curation tool
https://curation.pombase.org

Allow the use of ontology subsets in canto to restrict terms "not for direct annotation" #1411

Closed ValWood closed 5 years ago

ValWood commented 6 years ago

GO:0007049 cell cycle has the restriction: "This term should not be used for direct manual annotation; it may, however, be used for mapping to external vocabularies in order to create electronic annotations."

We will also use similar subsets for phenotype terms where the terms should not be used for direct annotation.

In FYPO:
subsetdef: qc_do_not_annotate "Term not to be used for direct annotation"
subsetdef: qc_do_not_manually_annotate "Term not to be used for direct manual annotation"

In GO:
subsetdef: gocheck_do_not_annotate "Term not to be used for direct annotation"
subsetdef: gocheck_do_not_manually_annotate "Term not to be used for direct manual annotation"

In these cases the "proceed" button should not be visible and a message should say "This term should not be used for direct annotations, please select a child term"
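A minimal sketch of how the subset tags could be read from an OBO file to decide whether a term is blocked for direct annotation. This is not Canto's implementation: the function names, the sample stanzas, and the subset membership shown in the sample are assumptions made for illustration. Only the four subset names come from the subsetdef lines above.

```python
# Subset names taken from the subsetdef lines quoted in this issue.
DO_NOT_ANNOTATE_SUBSETS = {
    "gocheck_do_not_annotate",
    "gocheck_do_not_manually_annotate",
    "qc_do_not_annotate",
    "qc_do_not_manually_annotate",
}

# Illustrative OBO fragment. The subset membership here is an assumption
# for the example and is not copied from a real ontology release.
SAMPLE_OBO = """\
[Term]
id: GO:0007049
name: cell cycle
subset: gocheck_do_not_manually_annotate

[Term]
id: GO:0000278
name: mitotic cell cycle
is_a: GO:0007049
"""


def parse_terms(obo_text):
    """Return {term_id: {"name": str, "subsets": set}} for each [Term] stanza."""
    terms = {}
    current = None
    for raw_line in obo_text.splitlines():
        line = raw_line.strip()
        if line.startswith("["):
            # Start a new record for [Term] stanzas, ignore other stanza types.
            current = {"name": "", "subsets": set()} if line == "[Term]" else None
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            if tag == "id":
                terms[value] = current
            elif tag == "name":
                current["name"] = value
            elif tag == "subset":
                # Drop any trailing "! comment" part of the tag value.
                current["subsets"].add(value.split("!")[0].strip())
    return terms


def is_blocked(term):
    """True if the term belongs to any do-not-annotate subset."""
    return bool(term["subsets"] & DO_NOT_ANNOTATE_SUBSETS)


TERMS = parse_terms(SAMPLE_OBO)
```

With the sample data, `is_blocked(TERMS["GO:0007049"])` is true and `is_blocked(TERMS["GO:0000278"])` is false, so only the first term would get the "do not annotate" message.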

ValWood commented 6 years ago

Kim asked: If a term is marked as not for direct annotation, does that mean all its parents shouldn't be used for direct annotation either?

Antonia pointed out that "cytoplasmic part" is do-not-annotate, but its part_of parent "cytoplasm" is fine (as it should be).

However I think in the first implementation we should allow annotation to anything which does not have the subsetdef: high_level_annotation_qc "High-level terms not to be used for direct annotation"

I suggested that GO block a lot of high-level terms, so we are in the process of adding the subsetdef. We should be able to block quite a lot of high-level terms.

ValWood commented 6 years ago

Kim said: The current Canto ignores any terms in those do_not_annotate subsets. If you type the name of a term in those subsets it won't autocomplete. That was done for #1084. Would you rather autocomplete, then show that message?

ValWood commented 6 years ago

@kimrutherford I tested this using GO:0007049 cell cycle and I can't access the term in the autocomplete. I don't think this is the desired behaviour. We still want to be able to search on, access and drill down from the terms, BUT we don't want users to be able to select them for annotation (see above).

In this case, I can still select cell cycle for annotation by drilling down, but I can't see it in the search. This is the opposite of what we want, I think...

@mah11 @Antonialock agreed?

mah11 commented 6 years ago

Not a decision for me to influence, since I don't use those ontology term entry routes.

ValWood commented 6 years ago

But there might be a term that you would use that is blocked.

For example terms in http://curation.pombase.org/dumps/latest_build/logs/log.2018-04-18-18-14-52.excluded_go_terms_softcheck

eventually we would like to have these "blocked for annotation".

In this case would you expect to be able to search for and access GO:0016197 endosomal transport?

Or, for example, GO:0010389 regulation of G2/M transition of mitotic cell cycle, if we block this term on the supposition that you should be able to specify +ve or -ve regulation?

i.e. this is not a term-specific question; it's about general behavior.

mah11 commented 6 years ago

It is behavior of Canto features that I do not use.

ValWood commented 6 years ago

I don't understand this comment. You are saying that you do not use the search?

You do occasionally use terms which will eventually be in the "do not annotate list", because I fix them. So in this case would you expect that you could not enter Canto by any of the terms in this list?

mah11 commented 6 years ago

You are saying that you do not use the search?

That's correct.

ValWood commented 6 years ago

But the same question would apply wherever you search the ontology even within the quick add/edit window.

Would you expect, for example "cytokinesis" to be blocked from any search access point?

And would it make more sense to tell the user that a term is not available for curation if they try to select it, and force specificity, rather than not allowing it to be searched in the first place? The current behaviour (when working) just provides a "no entry" icon in the search interface IIRC....

ValWood commented 6 years ago

Maybe you can try to think about it from the user's perspective, even if you have some magic way to search on ontology terms?

mah11 commented 6 years ago

I don't search the ontology via Canto. I am aware that I am atypical in this regard, and that is why I do not presume to know what community users will expect.

ValWood commented 6 years ago

I don't think you are understanding what I am asking then. You must type the term in somewhere?

mah11 commented 6 years ago

Yes, but term entry is not the same as a search, even though the "quick" Canto interface happens to allow both starting from the same text box.

ValWood commented 6 years ago

So you are saying that you never ever add a term into the search box which is currently blocked for annotation (or might be blocked in the future)?

Anyway, I'm convinced that we need to switch the behaviour as I suggested for every place where a term can be added into a box, so I'll consider this one "discussed" and we can move on...

@kimrutherford let me know if you still have any questions....

mah11 commented 6 years ago

you never add a term into the search box which is currently blocked for annotation

To a first approximation this is probably true, because I'll see the subset tag in the ontology file.

(or might be blocked in the future)

No, I'm not saying this at all. I can't read minds, and I don't walk around with the contents of tickets where you've suggested additions to the don't-annotate subsets in my head. I only go by what's actually been added to those subsets in the ontologies.

ValWood commented 6 years ago

OK, I'm hoping that we can eventually use "internal tags" which are not necessarily implemented in GO...

The use of terms "not for direct annotation" in PomBase preceded its use in GO. I'm not sure that they will accept all of our suggestions, in which case we will require a way to be able to supplement the obo file with our own tags in the future...

Antonialock commented 6 years ago

I think we should be able to find and drill down from "do not annotate" terms, but the proceed button should be greyed out (maybe some help text saying you need to select a more or less specific term)

Would be great if enrichment tools would exclude those terms...I think the long slew of term lists they return are really disliked by users

ValWood commented 6 years ago

Would be great if enrichment tools would exclude those terms...I think the long slew of term lists they return are really disliked by users

That would be useful! In some cases they can be useful (you may only see enrichment to a term we do not use directly, like "cell cycle"), but a filter to include or exclude would be really useful.

I think we should be able to find and drill down from "do not annotate" terms, but the proceed button should be greyed out (maybe some help text saying you need to select a more or less specific term)

Yes, so we basically agree. @kimrutherford, does this make sense to you?

kimrutherford commented 6 years ago

I have a couple of questions.

There are two subsets in GO:

subsetdef: gocheck_do_not_annotate "Term not to be used for direct annotation"
subsetdef: gocheck_do_not_manually_annotate "Term not to be used for direct manual annotation"

Currently neither of those subsets is loaded. Should both now be loaded, but displayed with a greyed-out Proceed button? Or do the two subsets need different treatments?

In GO, the three highest level terms (molecular_function, etc.) aren't in the do-not-annotate subsets. Do we need to handle those separately?

mah11 commented 6 years ago

There are two subsets:

subsetdef: gocheck_do_not_annotate "Term not to be used for direct annotation"
subsetdef: gocheck_do_not_manually_annotate "Term not to be used for direct manual annotation"

... Or do the two subsets need different treatments?

These can be handled the same way in Canto, because Canto makes manual annotations.

In GO, the three highest level terms (molecular_function, etc.) aren't in the do-not-annotate subsets. Do we need to handle those separately?

That's correct for GO, because they're used with the ND evidence code for "known unknowns" (i.e. someone looked and found no data to support an annotation in the branch). But Canto doesn't support making unknown/ND annotations, and I think we want to keep it that way, so I guess that does mean the root terms need their own handling.

kimrutherford commented 6 years ago

That's correct for GO,

OK, thanks for the explanation.

I've just spotted that Canto completes on molecular_function. I don't think it's supposed to do that. The root terms shouldn't be loaded.

ValWood commented 6 years ago

I think the root terms should be loaded for drill down if you have "no idea" where to begin (we might even have links in the prompts?). Not many people will come in at this point, and the select button can be greyed out.

kimrutherford commented 6 years ago

So just to check: we should load all terms that aren't taxon-excluded but grey out the root terms and the do-not-annotate subset terms?

ValWood commented 6 years ago

spot on.
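The rule agreed above can be sketched as a small classification function. This is only an illustration under stated assumptions: the `Term` class, the `TermState` names, and the example term flags (including the placeholder ID `EX:0000001`) are hypothetical and do not reflect Canto's actual data model.

```python
from dataclasses import dataclass
from enum import Enum


class TermState(Enum):
    NOT_LOADED = "not loaded"      # taxon-excluded terms are not loaded at all
    BROWSE_ONLY = "browse only"    # searchable and usable for drill down, button greyed out
    ANNOTATABLE = "annotatable"    # can be selected for annotation


# Subset names from the subsetdef lines quoted in this issue.
DO_NOT_ANNOTATE_SUBSETS = frozenset({
    "gocheck_do_not_annotate",
    "gocheck_do_not_manually_annotate",
    "qc_do_not_annotate",
    "qc_do_not_manually_annotate",
})


@dataclass(frozen=True)
class Term:
    """Hypothetical term record; field names are assumptions for this sketch."""
    term_id: str
    name: str
    subsets: frozenset = frozenset()
    is_root: bool = False
    taxon_excluded: bool = False


def classify(term):
    """Apply the agreed rule: load everything that is not taxon-excluded,
    but grey out root terms and do-not-annotate subset terms."""
    if term.taxon_excluded:
        return TermState.NOT_LOADED
    if term.is_root or term.subsets & DO_NOT_ANNOTATE_SUBSETS:
        return TermState.BROWSE_ONLY
    return TermState.ANNOTATABLE


# Example terms; the flags and subset membership are illustrative assumptions.
root = Term("GO:0003674", "molecular_function", is_root=True)
cell_cycle = Term("GO:0007049", "cell cycle",
                  subsets=frozenset({"gocheck_do_not_manually_annotate"}))
specific = Term("GO:0000278", "mitotic cell cycle")
excluded = Term("EX:0000001", "placeholder taxon-excluded term", taxon_excluded=True)
```

Both GO subsets map to the same state because, as noted earlier in the thread, Canto only makes manual annotations. Root terms get the same state through the separate `is_root` flag, since they are not in the do-not-annotate subsets.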

kimrutherford commented 5 years ago

I think the root terms should be loaded for drill down if you have "no idea" where to begin

So, molecular_function, cellular_component etc? Are those a bit too generic?

ValWood commented 5 years ago

It is fine if they are visible; they have no flags, and they should be available for drill down. However, it is rare that anyone will ever use them.

Most people won't find them anyway, as for some reason I can only get to the term with GO:0003674, but not with "molecular function", or even the term name "molecular_function".

I think we should not worry too much about the root nodes.....but they have no "do not annotate" restrictions

kimrutherford commented 5 years ago

they have no flags

What flags do you mean?

Most people won't find them anyway, as for some reason I can only get to the term with GO:0003674, but not with "molecular function",

That's what I was about to change (maybe). So should they be able to autocomplete on molecular function and then start drilling down from that term? Or should we hide the root terms from the autocompletion?

ValWood commented 5 years ago

I mean they have no "do not annotate" restrictions in the obo file, so they should be visible?

kimrutherford commented 5 years ago

We currently hide the terms in the do_not_annotate subset and the root terms.

We can change things to allow the autocomplete to find do_not_annotate terms but not root terms, or we could allow finding root terms too.

ValWood commented 5 years ago

yes I think we should be able to access the root nodes, although it won't be used often. For MF I sometimes use this starting point because I can't remember what the high level terms are.

kimrutherford commented 5 years ago

OK, thanks. We need to think about what (if any) extra text is needed on the page for do_not_annotate terms to explain things.

When we grey out the continue button we can show a short explanation for the greyness when you hover your mouse over it.

ValWood commented 5 years ago

from earlier: In these cases the "proceed" button should not be visible and a message should say "This term should not be used for direct annotations, please select a child term"

kimrutherford commented 5 years ago

from earlier: In these cases the "proceed" button should not be visible and a message should say "This term should not be used for direct annotations, please select a child term"

Thanks. I missed that. I changed that to "select a more specific term" to match the rest of the page, but it's easy to change if "child term" is better.

I've finished the changes to allow all the do_not_annotate terms to appear in the search. The continue button is greyed out until you drill down to a non-do_not_annotate term.

The changes are in the test tool. Could you let me know if it works the way you want?

https://curation.pombase.org/test

mah11 commented 5 years ago

I changed that to "select a more specific term" to match the rest of the page, but it's easy to change if "child term" is better.

It's not. Much better to leave it as you've done it - "child term" is both (a) ontology jargon, and (b) not entirely accurate for all relationship types. "Select a more specific term" should be clear to anyone.

ValWood commented 5 years ago

it looks like this:

[Screenshot: cell cycle]

I wonder if greying out the proceed button, and having to "hover over", is optimal. Maybe in these cases the text "Can you use a more specific available term" should change to "This term should not be used for direct annotations, please select a more specific term" and the proceed button should not appear at all?

This would be immediately obvious, without hovering over proceed to see why and waiting for the pop-up to appear.

mah11 commented 5 years ago

"This term should not be used for direct annotations, please select a more specific term"

Nitpick: whatever the text is, it shouldn't include a comma splice. It would work either to break it into two sentences, or to add a connecting word:

"This term should not be used for direct annotations. Please select a more specific term."

or

"This term should not be used for direct annotations, so please select a more specific term"

kimrutherford commented 5 years ago

I wonder if greying out the proceed button, and having to "hover over", is optimal. ...

I think your suggestions make sense. I'll go ahead and make those changes so we can try it out in the test Canto.
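A rough sketch of the behaviour being proposed: for a blocked term, swap the prompt text and hide the proceed button instead of greying it out. The function name and return shape are hypothetical, and the blocked-term wording uses the two-sentence variant suggested above; the thread does not state which wording was finally used.

```python
# Default prompt text as quoted earlier in this thread.
DEFAULT_PROMPT = "Can you use a more specific available term"

# Assumed wording: the two-sentence variant suggested in the discussion.
BLOCKED_PROMPT = (
    "This term should not be used for direct annotations. "
    "Please select a more specific term."
)


def term_page_controls(blocked):
    """Return the prompt text and whether to show the proceed button.

    `blocked` is True for root terms and do-not-annotate subset terms.
    """
    return {
        "prompt": BLOCKED_PROMPT if blocked else DEFAULT_PROMPT,
        "show_proceed": not blocked,
    }
```

Putting the explanation in the prompt rather than in a hover tooltip means the reason is visible as soon as the term page loads.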

kimrutherford commented 5 years ago

I've updated the test and main tools because these changes don't appear to make things worse. :-)

Let me know if you spot any weirdness.

ValWood commented 5 years ago

great, I'm going to transfer loads of our restrictions to GO now.....