intermine / pombemine


possible bug in phenotype loading #7

Closed ValWood closed 2 years ago

ValWood commented 2 years ago

I checked the example gene (sty1) with the query that retrieves all alleles and phenotypes.

I got 626 rows and was surprised that roughly the first 200 rows had no allele symbol (I thought all sty1 alleles were named).

I checked and they do seem to be all named, so I looked at the publication record for one of the unnamed alleles:

Screenshot 2021-11-15 at 14 36 33

and sty1 does not appear to be mentioned/curated for this publication: https://www.pombase.org/reference/PMID:11069892

rachellyne commented 2 years ago

Thanks Val - yes, we have noticed this problem. It affects all the alleles entered as "unrecorded", which do not have a name or symbol. They have all attached themselves to the sty1 gene. I guess it is a problem with merging these alleles without unique identifiers.
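As a rough illustration of the suspected cause, here is a minimal sketch of how merging on a key that is empty for many records can collapse them onto a single gene. This is not InterMine's actual integration code: the record layout, the gene names other than sty1, and the `merge_by_symbol` helper are all hypothetical, and the "first gene seen wins" rule is an assumption made only for the example.

```python
# Hypothetical allele records: (gene, allele symbol, description).
# An empty symbol stands in for the unnamed "unrecorded" alleles.
alleles = [
    ("sty1", "sty1-1", "unknown"),
    ("sty1", "", "unrecorded"),
    ("cdc2", "", "unrecorded"),
    ("wee1", "", "unrecorded"),
]


def merge_by_symbol(records):
    """Merge records that share a symbol; the first gene seen keeps the record.

    Assumption for illustration only: the symbol is the sole merge key.
    """
    merged = {}
    for gene, symbol, _description in records:
        # setdefault only stores the gene for the first record with this key,
        # so later records with the same (empty) symbol are absorbed into it.
        entry = merged.setdefault(symbol, {"gene": gene, "count": 0})
        entry["count"] += 1
    return merged


merged = merge_by_symbol(alleles)
for symbol, entry in merged.items():
    print(repr(symbol), entry)
```

In this sketch the three unnamed alleles from three different genes end up as one record attached to sty1, because the empty symbol is not a unique identifier.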

ValWood commented 2 years ago

OK, all alleles should have names though? Sometimes they don't have descriptions but they will always have names? @kimrutherford is that correct? We can discuss this tomorrow on the call.

Can you give me an example of an allele that has no name? I can have a look...

ValWood commented 2 years ago

Actually, if they all attached to sty1, the number to annotate properly is less than 200 (this must be the legacy annotation in Artemis). If this is the case we can just not export them for now (@kimrutherford) and I can try to prioritise them for curation. That is not likely to happen until next year with other deadlines, but we won't lose much by not exporting them, as most are viability or elongated phenotypes, and these phenotypes are covered by the genome-wide dataset.

~Actually I'm tempted to delete them - I'll take a look.~

ValWood commented 2 years ago

I had a look, it is the old data in Artemis (we did discuss this but I forgot). I have now decided I don't want to delete them from Artemis, as some are quite useful. Kim, could you just not export those ones for pombeMine? I will then use the papers in this list for training the new hire, so they will get cleaned up before we do any other legacy publications. I'll open a ticket on our tracker.

rachellyne commented 2 years ago

This is the set of alleles that have "unrecorded" as the allele description and no name. They are attached to various genes (the attachment to sty1 in pombeMine is because of an integration error). unrecorded.csv

ValWood commented 2 years ago

Yep, got it. We will do the export without these for now. The papers will be prioritised for full curation (this is old legacy stuff from before Canto).

kimrutherford commented 2 years ago

> OK, all alleles should have names though? Sometimes they don't have descriptions but they will always have names?

There are about 600 alleles in Chado without names.

ValWood commented 2 years ago

Fixed by not exporting unnamed alleles, and prioritising them for curation in PomBase.
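For readers wondering what such an export filter might look like, here is a minimal sketch. It is not the actual PomBase export code: the dictionary keys, the PMID values, and the `split_for_export` helper are hypothetical. It keeps named alleles for export and sets the unnamed ones aside so that their publications can be listed for curation.

```python
def split_for_export(records):
    """Split allele records into (named, unnamed) lists.

    A record counts as unnamed when its "name" is missing, None,
    or only whitespace. Hypothetical record layout, for illustration.
    """
    named, unnamed = [], []
    for record in records:
        name = (record.get("name") or "").strip()
        (named if name else unnamed).append(record)
    return named, unnamed


# Hypothetical example data; the PMIDs are placeholders, not real references.
records = [
    {"gene": "sty1", "name": "sty1-1", "pmid": "PMID:0000001"},
    {"gene": "cdc2", "name": "", "pmid": "PMID:0000002"},
    {"gene": "wee1", "name": None, "pmid": "PMID:0000003"},
]

to_export, held_back = split_for_export(records)

# Publications whose alleles were held back, as a curation to-do list.
papers_to_curate = sorted({r["pmid"] for r in held_back})
print(len(to_export), papers_to_curate)
```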