monarch-initiative / hpoannotqc

HPO Annotation QC
http://hpo-annotation-qc.readthedocs.io/en/latest/#
MIT License

IEA in column 7? #12

Closed pnrobinson closed 6 years ago

pnrobinson commented 6 years ago

Yep. What I am saying is that after doing a git pull and then concatenating all the small files in the new v2 directory into v2.tab (sans headers; see the top of that gist), I am unexpectedly finding GO evidence codes in column 7 of a few of the files (indicated by the OMIM term).

that is:

if I am in:

$ pwd

/home/tomc/Projects/Monarch/hpo-annotation-data/rare-diseases

and I look for the go code in the file for the disease

$ grep IEA annotated/v2files/OMIM-136610.tab
OMIM:136610 #136610 FRAGILE SITE 2q11 HP:0001249 Intellectual disability IEA OMIM:136610 HPO:skoehler 2013-01-09

and the GO code is not in column 13; it is still in column 7

so why

$ git log annotated/v2files/OMIM-136610.tab
commit 66bbe31c157008ef42d3ede138b7a37bd71b8b9a
Author: pnrobinson peter.robinson@charite.de
Date: Tue Feb 20 19:36:01 2018 -0500

 adding directory with v2 files

it has not been updated

$ git pull
Already up-to-date.

but I have the most recent version

which indicates some small v2 files are not being updated or not being committed if they are updated on your end.
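Rather than grepping whole lines, the stale files could be listed by testing column 7 directly. This is only a rough sketch: `files_with_value_in_column` is a made-up helper name, and it assumes the small files are tab-separated, end in `.tab`, and that an up-to-date file should not have the evidence code in column 7, as described above.

```shell
# Hypothetical helper (not part of hpoannotqc): print every .tab file in a
# directory that has the given value in the given tab-separated column on
# at least one line.
files_with_value_in_column() {
    dir=$1 col=$2 value=$3
    for f in "$dir"/*.tab; do
        [ -e "$f" ] || continue    # directory contains no .tab files
        # awk exits 0 only if some line has exactly $value in column $col
        if awk -F'\t' -v c="$col" -v v="$value" '$c == v { found = 1; exit } END { exit !found }' "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

From the rare-diseases directory this would be called as `files_with_value_in_column annotated/v2files 7 IEA`; running `git log` on each file it prints would show whether all of them date from the same old commit.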

pnrobinson commented 6 years ago

This was apparently the result of some old-version files still in the repo. I have run everything fresh and the results look good.

$ cut -f1 phenotype.hpoa | sort | uniq
#DB
DECIPHER
OMIM
ORPHA

for the Qualifier:

$ cut -f4 phenotype.hpoa | sort | uniq

NOT
Qualifier

for evidence

$ cut -f7 phenotype.hpoa | sort | uniq
Evidence
ICE
IEA
PCS
TAS

for onset:

$ cut -f8 phenotype.hpoa | sort | uniq

HP:0003577
HP:0003581
HP:0003584
HP:0003593
HP:0003596
HP:0003621
HP:0003623
HP:0003674
HP:0011461
HP:0011462
HP:0011463
Onset

A similar check for frequency showed that all items are in one of the three accepted formats.

For sex:

$ cut -f10 phenotype.hpoa | sort | uniq

Female
Male
Sex
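The frequency check mentioned before the sex column is not shown in this thread. A sketch of how such a check might look follows. Everything specific in it is an assumption: that frequency is column 9 (it sits between onset in 8 and sex in 10), that the three accepted formats are an HPO term ID, a count such as 3/7, and a percentage such as 45%, and that empty values are allowed. `bad_frequencies` is a made-up name.

```shell
# Hypothetical check: print line number and value for every frequency
# entry (assumed to be column 9) that matches none of the assumed formats.
bad_frequencies() {
    awk -F'\t' '
        /^#/ { next }                          # skip the "#DB ..." header line
        $9 == "" { next }                      # assumed: empty frequency is allowed
        $9 ~ /^HP:[0-9]+$/ { next }            # HPO term ID
        $9 ~ /^[0-9]+\/[0-9]+$/ { next }       # count, n/m
        $9 ~ /^[0-9]+(\.[0-9]+)?%$/ { next }   # percentage
        { print NR ": " $9 }                   # anything else is reported
    ' "$1"
}
```

`bad_frequencies phenotype.hpoa` printing nothing would correspond to the "all items are in an accepted format" result.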

for the new modifier terms

$ cut -f11 phenotype.hpoa | sort | uniq

HP:0003676
HP:0003831
HP:0011010
HP:0012825
HP:0012826
HP:0012827
HP:0012828
HP:0012829
HP:0012832
HP:0012833
HP:0012837
HP:0012839
HP:0012840
HP:0025303
HP:0030650
HP:0031375
HP:0031796
Modifier

for Aspect

$ cut -f12 phenotype.hpoa | sort | uniq
?
Aspect
C
I
P

THIS IS A PROBLEM: one aspect was not recognized! (and is a ? here)

Dates: manual check OK, but see below
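To see which record carries the unrecognized aspect, the offending lines can be pulled out of the file directly. A minimal sketch, assuming the file is tab-separated, that aspect is column 12 as in the `cut -f12` call, and that C, I and P are the only valid values; `unrecognized_aspects` is a made-up name.

```shell
# Hypothetical check: print line number and full record for every line
# whose aspect (column 12) is not one of C, I or P.
unrecognized_aspects() {
    awk -F'\t' '
        /^#/ { next }    # skip the "#DB ..." header line
        $12 != "C" && $12 != "I" && $12 != "P" { print NR ": " $0 }
    ' "$1"
}
```

`unrecognized_aspects phenotype.hpoa` should then print the line(s) that produced the `?` above, including the disease and HPO term IDs needed to trace them back to a small file.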

Assigned_By
HPO:curators
HPO:iea
HPO:lccarmody
HPO:nvasilevsky
HPO:probinson
HPO:sdoelken
HPO:skoehler
ORPHA:orphadata
PATOC:GVG; PATOC:PS
ZFIN:bruef; HPO:sdoelken

pnrobinson commented 6 years ago

There is also still a problem with the dates.