legumeinfo / datastore-specifications

Specifications for directory naming, file naming, file contents in the LIS datastore

What's up with the .id_map files? #31

Closed: sammyjava closed this issue 1 month ago

sammyjava commented 1 year ago

Whenever a new file type appears in the datastore, I'll post an issue, since it is likely not in the specification. This has happened with .id_map files:

-rw-rw-r--  1 adf  staff    52M Oct  5 13:56 culinaris/annotations/CDC_Redberry.gnm2.ann1.5FB4/lencu.CDC_Redberry.gnm2.ann1.5FB4.id_map
-rw-rw-r--  1 adf  staff   130K Oct  5 13:53 culinaris/genomes/CDC_Redberry.gnm2.7C5P/lencu.CDC_Redberry.gnm2.7C5P.id_map
-rw-rw-r--  1 adf  staff    53M Oct  5 14:27 ervoides/annotations/IG_72815.gnm1.ann1.R90F/lener.IG_72815.gnm1.ann1.R90F.id_map
-rw-rw-r--  1 adf  staff    67K Oct  5 13:52 ervoides/genomes/IG_72815.gnm1.ZDWF/lener.IG_72815.gnm1.ZDWF.id_map

Do these need to be in the datastore?

adf-ncgr commented 1 year ago

Yes, I will update the specs. These are just maps from the original IDs to the LIS IDs. They will often be trivial (just adding the full yuck prefixing), but I figured we might decide that having

  Lcu.2RBY.1g000010 -> lencu.CDC_Redberry.gnm2.ann1.1g000010

would be better than

  Lcu.2RBY.1g000010 -> lencu.CDC_Redberry.gnm2.ann1.Lcu.2RBY.1g000010

I did not actually do that in this case (I meant to RFO it and then got distracted). But more and more groups are starting to add their own flavors of yuck, and if we decide to start making such aesthetic decisions, the id_maps can be updated and used to re-transform the files programmatically.
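A minimal Python sketch of the two naming options, assuming an id_map is simply a lookup from original ID to LIS ID. The functions `lis_id` and `build_id_map` and the `drop_prefix` parameter are hypothetical illustrations, not part of any LIS tooling.

```python
def lis_id(original_id: str, lis_prefix: str, drop_prefix: str = "") -> str:
    """Build a LIS-style ID by prefixing the original ID.

    If drop_prefix is given and the original ID starts with it, that part
    is removed first (the shorter, more "aesthetic" option).
    """
    if drop_prefix and original_id.startswith(drop_prefix):
        original_id = original_id[len(drop_prefix):]
    return lis_prefix + original_id


def build_id_map(original_ids, lis_prefix, drop_prefix=""):
    """Return {original ID: LIS ID}, i.e. the pairs an id_map file would record."""
    return {oid: lis_id(oid, lis_prefix, drop_prefix) for oid in original_ids}


prefix = "lencu.CDC_Redberry.gnm2.ann1."
print(lis_id("Lcu.2RBY.1g000010", prefix))
# lencu.CDC_Redberry.gnm2.ann1.Lcu.2RBY.1g000010
print(lis_id("Lcu.2RBY.1g000010", prefix, drop_prefix="Lcu.2RBY."))
# lencu.CDC_Redberry.gnm2.ann1.1g000010
```

Because the map keeps the original ID as the key, switching between the two conventions later only means regenerating the map and re-applying it; the original IDs are never lost.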

sammyjava commented 1 year ago

I think I support dots in feature names, but we'll find out with this build. I just looked at the method that extracts the secondaryIdentifier from a primaryIdentifier, and it does handle them.
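One way dots inside a feature name can survive prefix stripping is to split off a fixed number of leading prefix fields and keep the remainder intact. This is a hypothetical Python sketch, not the actual build method mentioned here; it assumes an annotation prefix has exactly four dot-separated fields (gensp, strain, gnmN, annN) and that none of those fields contain dots themselves.

```python
def secondary_identifier(primary_id: str, n_prefix_fields: int = 4) -> str:
    """Strip the LIS prefix from a primary ID, keeping dots in the feature name.

    Splitting at most n_prefix_fields times means everything after the
    prefix stays in one piece, e.g. "Lcu.2RBY.1g000010".
    Assumes (hypothetically) that the prefix fields contain no dots.
    """
    parts = primary_id.split(".", n_prefix_fields)
    if len(parts) <= n_prefix_fields:
        raise ValueError(f"not a prefixed LIS ID: {primary_id!r}")
    return parts[n_prefix_fields]


print(secondary_identifier("lencu.CDC_Redberry.gnm2.ann1.Lcu.2RBY.1g000010"))
# Lcu.2RBY.1g000010
```

For genome-level IDs (gensp.strain.gnmN), the same idea would apply with `n_prefix_fields=3`.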

StevenCannon-USDA commented 1 year ago

I've added specifications for featid_map.tsv (in Genus/species/annotations) and seqid_map.tsv (in Genus/species/genomes).

Currently, we have examples here:

  Vigna/radiata/annotations/VC1973A.gnm7.ann1.RWBG/vigra.VC1973A.gnm7.ann1.RWBG.featid_map.tsv.gz
  Vigna/radiata/genomes/VC1973A.gnm7.SB53/vigra.VC1973A.gnm7.SB53.seqid_map.tsv.gz
  Glycine/max/annotations/Wm82_ISU01.gnm2.ann1.FGFB/glyma.Wm82_ISU01.gnm2.ann1.FGFB.featid_map.tsv.gz
  Glycine/max/genomes/Wm82_ISU01.gnm2.JFPQ/glyma.Wm82_ISU01.gnm2.JFPQ.seqid_map.tsv.gz
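A minimal Python sketch for loading one of these gzipped maps and applying it to the first column of a tab-separated file (for example, the seqid column of a GFF3 file together with a seqid_map). It assumes two tab-separated columns with the original ID first and the LIS ID second, and no header; check the actual specification for the column order. `parse_id_map`, `read_id_map`, and `remap_first_column` are hypothetical names.

```python
import csv
import gzip


def parse_id_map(lines):
    """Parse tab-separated lines into {original ID: LIS ID}.

    Assumes two columns per line and no header. Blank lines and
    lines starting with '#' are skipped.
    """
    id_map = {}
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row or row[0].startswith("#"):
            continue
        if len(row) < 2:
            raise ValueError(f"expected two columns, got {row!r}")
        id_map[row[0]] = row[1]
    return id_map


def read_id_map(path):
    """Read a gzipped map such as *.featid_map.tsv.gz or *.seqid_map.tsv.gz."""
    with gzip.open(path, "rt", newline="") as handle:
        return parse_id_map(handle)


def remap_first_column(lines, id_map):
    """Yield tab-separated lines with column 1 replaced via id_map, if mapped."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        fields[0] = id_map.get(fields[0], fields[0])
        yield "\t".join(fields)
```

Usage would look like `read_id_map("Vigna/radiata/genomes/VC1973A.gnm7.SB53/vigra.VC1973A.gnm7.SB53.seqid_map.tsv.gz")`. Unmapped IDs pass through unchanged, so a partial map does not corrupt the file.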