pombase / canto

The PomBase community curation tool
https://curation.pombase.org

Migrate disease annotation extensions to disease annotation type #2416

Closed jseager7 closed 3 years ago

jseager7 commented 3 years ago

(Requested by @CuzickA)

Since we added the new disease annotation type, we don't need the old 'disease caused' annotation extension anymore. Unfortunately, there are still about 90 extensions already applied to existing curation sessions, and manually removing them and recreating them as disease annotations is going to be a lot of work.

@kimrutherford Is it possible to create a script that will automatically create disease annotations based on the annotation extensions? I think the process would be as follows:

  1. Query the session databases for all metagenotype annotations with the causes_disease extension.
  2. Record the following information:
    • the ID of the metagenotype linked to the annotation;
    • the term ID for the causes_disease extension (a PHIDO term);
    • the term ID for the infects_tissue extension, if present (a BTO term).
  3. We may also need an extra check to make sure that the pathogen gene and the host gene are wild-type, or that the host has no genes; @CuzickA can probably advise on the exact rules here.
  4. Once all the necessary information is collected, use it to create a disease annotation on the same metagenotype (the one recorded in step 2), with the PHIDO term as the annotation term, and, if a BTO term was recorded, add an infects_tissue extension with that term to the new annotation.
  5. Finally, delete the old instance of the causes_disease extension.
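
As a rough illustration of steps 1, 2, 4 and 5, here is a minimal Python sketch that works on in-memory records rather than on the real session databases. The `Extension` and `Annotation` classes, the `migrate_disease_extensions` function, and the `"disease"` annotation type name are all hypothetical and do not reflect Canto's actual schema or API. The wild-type check from step 3 is left out because its rules are not yet settled.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Extension:
    relation: str  # e.g. "causes_disease" or "infects_tissue"
    term_id: str   # e.g. a PHIDO or BTO term ID


@dataclass
class Annotation:
    # Hypothetical in-memory record; not Canto's real data model.
    annotation_type: str
    metagenotype_id: int
    term_id: Optional[str] = None
    extensions: List[Extension] = field(default_factory=list)


def migrate_disease_extensions(
    annotations: List[Annotation],
) -> Tuple[List[Annotation], List[tuple]]:
    """Build disease annotations from causes_disease extensions.

    Returns the new annotations plus a log of what was migrated, and
    removes the causes_disease extensions from the input annotations.
    """
    new_annotations = []
    log = []
    for ann in annotations:
        # Step 1: only annotations carrying a causes_disease extension.
        diseases = [e for e in ann.extensions if e.relation == "causes_disease"]
        if not diseases:
            continue
        # Step 2: record the optional infects_tissue (BTO) terms.
        tissues = [e.term_id for e in ann.extensions
                   if e.relation == "infects_tissue"]
        for disease in diseases:
            # Step 4: new annotation on the same metagenotype, with the
            # PHIDO term as its term and the BTO terms as extensions.
            new_annotations.append(Annotation(
                annotation_type="disease",  # assumed type name
                metagenotype_id=ann.metagenotype_id,
                term_id=disease.term_id,
                extensions=[Extension("infects_tissue", t) for t in tissues],
            ))
            log.append((ann.metagenotype_id, disease.term_id, tissues))
        # Step 5: delete the old causes_disease extensions.
        ann.extensions = [e for e in ann.extensions
                          if e.relation != "causes_disease"]
    return new_annotations, log
```

The returned log holds one `(metagenotype ID, PHIDO term, BTO terms)` tuple per migrated extension, so the same record could be kept even if only the removal in step 5 is carried out.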

Does that sound feasible? If the above is too difficult or too risky (in terms of database inconsistency), then maybe just removing the extensions and keeping a log of the metagenotypes they were applied to will be enough (probably along with the other annotation extensions on each annotation, so we can still track tissue types and so on).

kimrutherford commented 3 years ago

A script to do that wouldn't be too much work after we nail down this bit:

We may also need an extra check to make sure that the pathogen gene and the host gene are wild-type, or that the host has no genes;

jseager7 commented 3 years ago

I've discussed this more with Alayne, and it seems like there are more rules than I've described above:

So first we need to add all of these missing interaction_outcome extensions manually, then we can look into whether it's still feasible to automatically derive the disease annotations.

CuzickA commented 3 years ago

Thanks for looking into this, but it looks like I will be doing it manually. Many of the sessions need some re-annotation to bring them in line with the new requirements (e.g. control metagenotype annotations and new AEs), so I may as well remove the old causes_disease AE and add annotations of the new disease name curation type whilst working through these sessions.

jseager7 commented 3 years ago

Okay, I'll close this for now.