We previously updated the file format so that the genetic map (listed in the README) is no longer prepended to every LG identifier, so the LG identifiers look as they did in the publications. This was heavily motivated by the Phaseolus work I was doing (as I recall), but it seems like a good idea under the "make things recognizable from the publication" approach. Of course, I was assuming that each QTL study uses a single genetic map.
But now I'm updating the Glycine QTL studies and maps, and I have 52 QTL studies with multiple genetic maps. Here is a study that places QTLs on three different genetic maps (this is a previous-style file with the LG prefixes):
#qtl_identifier	trait_name	linkage_group	start	end	peak
Pod dehiscence 1-9	Pod dehiscence	GmRFLP-GA1996a_J.2	9.25	18.75	14.0
Pod dehiscence 1-1	Pod dehiscence	GmComposite1999_E	92.3	94.3	93.0
Pod dehiscence 1-8	Pod dehiscence	GmRFLP-GA1996a_J.2	0.0	18.5	9.0
Pod dehiscence 1-3	Pod dehiscence	GmComposite1999_E	90.2	92.2	91.0
Pod dehiscence 1-2	Pod dehiscence	GmComposite1999_E	94.3	96.3	95.0
Pod dehiscence 1-10	Pod dehiscence	GmComposite1999_L	109.5	111.5	111.0
Pod dehiscence 1-4	Pod dehiscence	GmComposite1999_E	93.2	95.2	94.0
Pod dehiscence 1-7	Pod dehiscence	GmComposite2003_J	56.2	58.2	57.0
Pod dehiscence 1-6	Pod dehiscence	GmComposite2003_J	26.63	28.63	28.0
Pod dehiscence 1-5	Pod dehiscence	GmComposite2003_J	16.35	18.35	17.0
Definitely qualifies for the "funky" label.
Of the Glycine QTL studies imported from the SoyBase MySQL database, 52 have multiple maps and 261 do not.
My proposed solution is simply to add a genetic_map column to the qtl.tsv files (so we know which map each LG is on, since the genetic_map: attribute in the README doesn't tell us) and to the qtlmrk.tsv files (so we know on which map the markers used to determine the QTL were placed):
qtl.tsv
#qtl_identifier	trait_name	genetic_map	linkage_group	start	end	peak
Pod dehiscence 1-9	Pod dehiscence	GmRFLP-GA1996a	J.2	9.25	18.75	14.0
Pod dehiscence 1-1	Pod dehiscence	GmComposite1999	E	92.3	94.3	93.0
Pod dehiscence 1-8	Pod dehiscence	GmRFLP-GA1996a	J.2	0.0	18.5	9.0
Pod dehiscence 1-3	Pod dehiscence	GmComposite1999	E	90.2	92.2	91.0
Pod dehiscence 1-2	Pod dehiscence	GmComposite1999	E	94.3	96.3	95.0
Pod dehiscence 1-10	Pod dehiscence	GmComposite1999	L	109.5	111.5	111.0
Pod dehiscence 1-4	Pod dehiscence	GmComposite1999	E	93.2	95.2	94.0
Pod dehiscence 1-7	Pod dehiscence	GmComposite2003	J	56.2	58.2	57.0
Pod dehiscence 1-6	Pod dehiscence	GmComposite2003	J	26.63	28.63	28.0
Pod dehiscence 1-5	Pod dehiscence	GmComposite2003	J	16.35	18.35	17.0
qtlmrk.tsv
#qtl_identifier	trait_name	marker	genetic_map	linkage_group
Pod dehiscence 1-1	Pod dehiscence	BLT049_5	GmComposite1999	E
Pod dehiscence 1-2	Pod dehiscence	cr324_1	GmComposite1999	E
Pod dehiscence 1-3	Pod dehiscence	B124_3	GmComposite1999	E
Pod dehiscence 1-4	Pod dehiscence	cr274_1	GmComposite1999	E
Pod dehiscence 1-5	Pod dehiscence	B074_1	GmComposite2003	J
Pod dehiscence 1-6	Pod dehiscence	B166_1	GmComposite2003	J
Pod dehiscence 1-7	Pod dehiscence	B122_1	GmComposite2003	J
Pod dehiscence 1-8	Pod dehiscence	K375_1	GmRFLP-GA1996a	J.2
Pod dehiscence 1-9	Pod dehiscence	cr392_1	GmRFLP-GA1996a	J.2
Pod dehiscence 1-10	Pod dehiscence	A489_1	GmComposite1999	L
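To illustrate what the loader side might look like, here is a rough sketch (not the actual loader) that reads a qtl.tsv in the proposed format and groups the QTLs by genetic map. It assumes the file is tab-separated with a single header line starting with "#"; the function name read_qtls_by_map is hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_qtls_by_map(handle):
    """Group QTL rows by their genetic_map column (hypothetical helper).

    Assumes a tab-separated file whose header line starts with '#'.
    """
    header = None
    by_map = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        if row[0].startswith("#"):
            # Header line: strip the leading '#' from the first column name.
            header = [row[0].lstrip("#")] + row[1:]
            continue
        if header is None:
            raise ValueError("data row found before the header line")
        record = dict(zip(header, row))
        by_map[record["genetic_map"]].append(record)
    return dict(by_map)


sample = io.StringIO(
    "#qtl_identifier\ttrait_name\tgenetic_map\tlinkage_group\tstart\tend\tpeak\n"
    "Pod dehiscence 1-9\tPod dehiscence\tGmRFLP-GA1996a\tJ.2\t9.25\t18.75\t14.0\n"
    "Pod dehiscence 1-1\tPod dehiscence\tGmComposite1999\tE\t92.3\t94.3\t93.0\n"
    "Pod dehiscence 1-7\tPod dehiscence\tGmComposite2003\tJ\t56.2\t58.2\t57.0\n"
)
qtls = read_qtls_by_map(sample)
for genetic_map, records in sorted(qtls.items()):
    print(genetic_map, [r["linkage_group"] for r in records])
```

Keying on the genetic_map column means that LG "J" on GmComposite2003 and LG "J.2" on GmRFLP-GA1996a can never be confused, even though both come from the same study.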
This may be only a Glycine issue, but it seems worthwhile to support multi-map QTL studies.
Note that this is a file format and loader change; the database model does NOT change since LGs and markers are already associated with a genetic map.
It's effectively what we had before, but with the genetic map being a column rather than an LG identifier prefix. (And super easy to implement as soybase always uses the underscore separator!)
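The underscore split could be sketched roughly as follows. This is only an illustration, not the actual conversion script: it assumes the old-style file is tab-separated with linkage_group as the third column, and that the map name itself never contains an underscore (so the first underscore is the separator). The function names are hypothetical.

```python
LG_COLUMN = 2  # assumed position of linkage_group in the old-style qtl.tsv


def split_prefixed_lg(prefixed_lg):
    """Split 'GmComposite1999_E' into ('GmComposite1999', 'E').

    Assumes the first underscore separates the map name from the LG.
    """
    genetic_map, sep, linkage_group = prefixed_lg.partition("_")
    if not (sep and genetic_map and linkage_group):
        raise ValueError(f"no genetic map prefix in {prefixed_lg!r}")
    return genetic_map, linkage_group


def convert_qtl_line(line):
    """Convert one old-style qtl.tsv line to the proposed format."""
    fields = line.rstrip("\n").split("\t")
    if fields[0].startswith("#"):
        # Header: insert the new column name before linkage_group.
        new_fields = ["genetic_map", fields[LG_COLUMN]]
    else:
        # Data row: replace the prefixed LG with separate map and LG values.
        new_fields = list(split_prefixed_lg(fields[LG_COLUMN]))
    return "\t".join(fields[:LG_COLUMN] + new_fields + fields[LG_COLUMN + 1:])


old_lines = [
    "#qtl_identifier\ttrait_name\tlinkage_group\tstart\tend\tpeak\n",
    "Pod dehiscence 1-9\tPod dehiscence\tGmRFLP-GA1996a_J.2\t9.25\t18.75\t14.0\n",
    "Pod dehiscence 1-1\tPod dehiscence\tGmComposite1999_E\t92.3\t94.3\t93.0\n",
]
for old_line in old_lines:
    print(convert_qtl_line(old_line))
```

If any map name did contain an underscore, the split would need to be driven by the list of known map names instead.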
For reference, the genetic_map: attribute of the example study lists all three maps:
Young_x_PI416937.qtl.Bailey_Mian_1997 genetic_map: GmComposite2003,GmRFLP-GA1996a,GmComposite1999