legumeinfo / mine-issues

Report ALL issues on LIS mines here! Regardless of which mine you found it on!

Duplicate genetic markers (in GlycineMine) #116

Closed: sammyjava closed this issue 1 year ago

sammyjava commented 1 year ago

Apparently the loaders are not merging genetic markers on primaryIdentifier, so the same marker is loaded more than once:

      primaryidentifier      |    name     | chromosomelocationid 
-----------------------------+-------------+----------------------
 glyma.Wm82.gnm1.ss715597374 | ss715597374 |            165000660
 glyma.Wm82.gnm1.ss715597374 | ss715597374 |            164109216
 glyma.Wm82.gnm2.ss715597374 | ss715597374 |            177051497
 glyma.Wm82.gnm2.ss715597374 | ss715597374 |                     
 glyma.Wm82.gnm2.ss715597374 | ss715597374 |            178002443
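If markers were merged on primaryIdentifier, each identifier above would appear exactly once. A minimal Python sketch of that check, using the rows above as hard-coded data (the function name `duplicated_identifiers` is hypothetical and not part of InterMine). In the database itself, the same check is a `GROUP BY` on primaryidentifier with `HAVING count(*) > 1`.

```python
from collections import Counter

# Rows copied from the query output above: (primaryidentifier, name, chromosomelocationid).
# None stands in for the empty chromosomelocationid cell.
rows = [
    ("glyma.Wm82.gnm1.ss715597374", "ss715597374", 165000660),
    ("glyma.Wm82.gnm1.ss715597374", "ss715597374", 164109216),
    ("glyma.Wm82.gnm2.ss715597374", "ss715597374", 177051497),
    ("glyma.Wm82.gnm2.ss715597374", "ss715597374", None),
    ("glyma.Wm82.gnm2.ss715597374", "ss715597374", 178002443),
]


def duplicated_identifiers(rows):
    """Return {primaryIdentifier: count} for identifiers that occur more than once."""
    counts = Counter(primary_id for primary_id, _name, _loc in rows)
    return {pid: n for pid, n in counts.items() if n > 1}


print(duplicated_identifiers(rows))
# prints {'glyma.Wm82.gnm1.ss715597374': 2, 'glyma.Wm82.gnm2.ss715597374': 3}
```

Each assembly's marker shows up several times, which is exactly what merging on primaryIdentifier should prevent.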

To be fixed as part of 5.0.1.3.

sammyjava commented 1 year ago

So, I had keyed on primaryIdentifier,genotypingPlatform, which created a separate copy of the "same" marker for each genotyping platform. Here is an example:

glyma.Wm82.gnm2.Gm07    BARCSoy6K   genetic_marker  36036751    36036751    .   .   .   ID=glyma.Wm82.gnm2.ss715597374;Name=ss715597374;alleles=T/G
glyma.Wm82.gnm2.Gm07    BARCSoy50K  genetic_marker  36036751    36036751    .   .   .   ID=glyma.Wm82.gnm2.ss715597374;Name=ss715597374;alleles=T/G
glyma.Wm82.gnm2.Chr07   PMID:31031779   genetic_marker  36036751    36036751    .   .   .   ID=glyma.Wm82.gnm2.ss715597374;Name=ss715597374

I'll change GeneticMarker.genotypingPlatform to a collection, genotypingPlatforms, and merge on primaryIdentifier. This change may affect other loaders.
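To make the difference between the two keys concrete, here is a minimal Python sketch (not the actual loader code) that parses the three GFF records above. It assumes, based on the example, that GFF column 2 (source) is the genotyping platform and the `ID` attribute is the primaryIdentifier; the function names are hypothetical.

```python
from collections import defaultdict

# The three GFF3 records quoted above, with tab-separated columns.
GFF_LINES = [
    "glyma.Wm82.gnm2.Gm07\tBARCSoy6K\tgenetic_marker\t36036751\t36036751\t.\t.\t.\t"
    "ID=glyma.Wm82.gnm2.ss715597374;Name=ss715597374;alleles=T/G",
    "glyma.Wm82.gnm2.Gm07\tBARCSoy50K\tgenetic_marker\t36036751\t36036751\t.\t.\t.\t"
    "ID=glyma.Wm82.gnm2.ss715597374;Name=ss715597374;alleles=T/G",
    "glyma.Wm82.gnm2.Chr07\tPMID:31031779\tgenetic_marker\t36036751\t36036751\t.\t.\t.\t"
    "ID=glyma.Wm82.gnm2.ss715597374;Name=ss715597374",
]


def parse_gff_line(line):
    """Return (source column, attributes dict) for one GFF3 feature line."""
    cols = line.split("\t")
    attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if kv)
    return cols[1], attrs


def key_on_id_and_platform(lines):
    """Old key: one marker per (primaryIdentifier, genotypingPlatform) pair."""
    markers = {}
    for line in lines:
        source, attrs = parse_gff_line(line)
        markers[(attrs["ID"], source)] = attrs["Name"]
    return markers


def merge_on_id(lines):
    """New key: one marker per primaryIdentifier, platforms gathered into a set."""
    markers = defaultdict(lambda: {"name": None, "genotypingPlatforms": set()})
    for line in lines:
        source, attrs = parse_gff_line(line)
        marker = markers[attrs["ID"]]
        marker["name"] = attrs["Name"]
        marker["genotypingPlatforms"].add(source)
    return dict(markers)


print(len(key_on_id_and_platform(GFF_LINES)))  # 3 markers
print(len(merge_on_id(GFF_LINES)))  # 1 marker
```

Keying on the pair yields three markers for one SNP; merging on the identifier yields one marker whose genotypingPlatforms set holds BARCSoy6K, BARCSoy50K and PMID:31031779. Using a set also means a platform listed twice for the same marker is stored only once.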

sammyjava commented 1 year ago

(Note that PMID:31031779 does not match Tran_Steketee_2019, the collection's name, but I don't care.)

sammyjava commented 1 year ago

This appears to be fixed and tested in current 5.1.0.3.