jseager7 closed this issue 3 years ago
A script to do that wouldn't be too much work after we nail down this bit:
We may also need an extra check to make sure that the pathogen gene and the host gene are wild-type, or that the host has no genes;
I've discussed this more with Alayne, and it seems like there are more rules than I've described above:
sometimes the disease extension has been made to a metagenotype with a mutant pathogen (non-control), so a corresponding wild-type control metagenotype will have to be automatically created in order to have a valid feature for the disease annotation;
some mutant metagenotypes haven't been annotated with an interaction_outcome extension that specifies the compatibility of the interaction (a compatible interaction means the presence of disease). We only want to annotate diseases on the control metagenotype for compatible interactions. I don't think the interaction_outcome extension can be added automatically, since it depends on the curator's judgement.
So first we need to add all of these missing interaction_outcome extensions manually, then we can look into whether it's still feasible to automatically derive the disease annotations.
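To make the rules above concrete, here is a rough sketch of the eligibility check a script would need before deriving a disease annotation. It is not based on Canto's actual schema: the `Genotype` and `Metagenotype` classes, the `"wild type"` and `"compatible"` strings, and the `can_derive_disease` function are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Genotype:
    # Hypothetical model: one entry per gene, e.g. "wild type" or "deletion".
    allele_types: list[str] = field(default_factory=list)

    def has_genes(self) -> bool:
        return len(self.allele_types) > 0

    def all_wild_type(self) -> bool:
        # Vacuously true for a genotype with no genes.
        return all(t == "wild type" for t in self.allele_types)


@dataclass
class Metagenotype:
    pathogen: Genotype
    host: Genotype

    def is_wild_type_control(self) -> bool:
        # Pathogen genes must be wild type; the host is either wild type
        # or has no genes at all.
        return (
            self.pathogen.has_genes()
            and self.pathogen.all_wild_type()
            and self.host.all_wild_type()
        )


def can_derive_disease(source_extensions: dict[str, str],
                       control: Metagenotype) -> bool:
    """Return True only if a disease annotation could be derived automatically.

    source_extensions: extensions on the original (possibly mutant)
    metagenotype annotation, keyed by relation name (assumed layout).
    control: the wild-type metagenotype that would carry the disease.
    """
    outcome = source_extensions.get("interaction_outcome")
    if outcome is None:
        # Missing outcome: needs a curator's judgement first.
        return False
    # "compatible" is a placeholder for whatever value marks a
    # compatible interaction in the real data.
    return outcome == "compatible" and control.is_wild_type_control()
```

Anything that returns False here would stay on the list for manual curation rather than being converted by the script.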
Thanks for looking into this, but it looks like I will be doing it manually. Many of the sessions need some re-annotation to bring them in line with the new requirements (e.g. control metagenotype annotations and new AEs), so I may as well remove the old causes_disease AE and add the new disease name curation type whilst working through these sessions.
Okay, I'll close this for now.
(Requested by @CuzickA)
Since we added the new disease annotation type, we don't need the old 'disease caused' annotation extension anymore. Unfortunately, there are still about 90 extensions already applied to existing curation sessions, and manually removing them and recreating them as disease annotations is going to be a lot of work.
@kimrutherford Is it possible to create a script that will automatically create disease annotations based on the annotation extensions? I think the process would be as follows:
Does that sound feasible? If the above is too difficult or too risky (in terms of database inconsistency), then maybe just removing the extensions and keeping a log of the metagenotypes they were applied to will be enough (probably plus the annotation extensions for the metagenotype so we can track tissue types, and so on).
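For the fallback option (remove the extensions and keep a log), something like the following sketch might be enough. It assumes each annotation is a plain dict with `metagenotype` and `extensions` keys, which is an invented layout and not how Canto stores sessions. The relation name `causes_disease` is taken from the discussion in this issue, and `strip_disease_extensions` is a hypothetical helper.

```python
def strip_disease_extensions(annotations, relation="causes_disease"):
    """Remove extensions with the given relation, in place, and return a log.

    Each log entry records the metagenotype, the removed extensions, and the
    extensions left behind (so tissue types and so on can still be tracked).
    """
    log = []
    for annotation in annotations:
        removed = [e for e in annotation["extensions"]
                   if e["relation"] == relation]
        if not removed:
            continue  # nothing to do for this annotation
        remaining = [e for e in annotation["extensions"]
                     if e["relation"] != relation]
        log.append({
            "metagenotype": annotation["metagenotype"],
            "removed": removed,
            "remaining": remaining,
        })
        annotation["extensions"] = remaining
    return log
```

The returned log could then be written out (for example as JSON) so the disease annotations can be recreated by hand later.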