Open jd-campbell opened 6 months ago
@jd-campbell not 100% sure, but this sounds potentially related to some other issues that I'm guessing stem from the scripts Sam wrote to generate the soybean QTL files from the info in the SoyBase MySQL database. At present I have no clue where those scripts are, but I'll send a flare up to Sam and see if he remembers where he put them.
Sam was super-fast and helpful in his response. The scripts are in his SoyBase repo (https://github.com/sammyjava/SoyBase). He did say that the direct outputs were subjected to ad hoc munging due to naming conflicts and the like, but it seems like a good place to start (provided I can actually figure out how to run the scripts, which he said require some ssh tunneling to the MySQL db). Anyway, this may also be relevant for #205, so I'll hopefully be able to make some headway on it.
@adf-ncgr Thanks for the info. This helps in my work. Please send my thanks to Sam also!
@jd-campbell not sure this one is quite ready to be closed, but here's an update. I got Sam's code to run and for this dataset it seems to have produced 26 QTLs, although one of them (mqCanopy wilt-013) looks like it may be problematic without location info:
mqCanopy wilt-014 Canopy wilt GmComposite2003_D1b 50.11 52.61 51.36
mqCanopy wilt-019 Canopy wilt GmComposite2003_A1 0.98 2.98 1.98
mqCanopy wilt-021 Canopy wilt GmComposite2003_D2 46.8 48.8 47.8
mqCanopy wilt-013 Canopy wilt
mqCanopy wilt-008 Canopy wilt GmComposite2003_A1 16.16 18.16 17.16
mqCanopy wilt-012 Canopy wilt GmComposite2003_D2 124.0 126.0 125.0
mqCanopy wilt-015 Canopy wilt GmComposite2003_D2 114.97 124.02 119.5
mqCanopy wilt-007 Canopy wilt GmComposite2003_A1 2.54 4.54 3.54
mqCanopy wilt-023 Canopy wilt GmComposite2003_D1b 47.69 49.69 48.69
mqCanopy wilt-005 Canopy wilt GmComposite2003_D1b 83.04 85.04 84.04
mqCanopy wilt-011 Canopy wilt GmComposite2003_D2 56.07 58.07 57.07
mqCanopy wilt-022 Canopy wilt GmComposite2003_D1b 33.42 35.42 34.42
mqCanopy wilt-016 Canopy wilt GmComposite2003_D1b 3.79 6.54 5.17
mqCanopy wilt-026 Canopy wilt GmComposite2003_L 47.2 49.2 48.2
mqCanopy wilt-002 Canopy wilt GmComposite2003_D1b 0.0 1.0 0.5
mqCanopy wilt-017 Canopy wilt GmComposite2003_D1b 4.51 6.51 5.51
mqCanopy wilt-024 Canopy wilt GmComposite2003_B1 33.25 35.25 34.25
mqCanopy wilt-006 Canopy wilt GmComposite2003_D2 51.4 53.4 52.4
mqCanopy wilt-010 Canopy wilt GmComposite2003_B1 54.8 56.8 55.8
mqCanopy wilt-027 Canopy wilt GmComposite2003_D2 125.5 127.5 126.5
mqCanopy wilt-001 Canopy wilt GmComposite2003_D1b 11.58 13.58 12.58
mqCanopy wilt-009 Canopy wilt GmComposite2003_B1 75.1 77.1 76.1
mqCanopy wilt-003 Canopy wilt GmComposite2003_D1b 51.61 53.61 52.61
mqCanopy wilt-020 Canopy wilt GmComposite2003_B1 64.82 85.59 75.21
mqCanopy wilt-018 Canopy wilt GmComposite2003_D1b 84.04 85.59 84.82
mqCanopy wilt-025 Canopy wilt GmComposite2003_L 81.9 83.9 82.9
In any case, I'm not sure why the datastore file would only have 2 QTLs since this one seems more complete (though maybe still not entirely complete?). I'll try to explore a little more but wanted to let you know there's at least some progress on this.
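A quick way to spot rows like the mqCanopy wilt-013 one in the listing above is to check each line for the position columns. This is only a minimal sketch: it assumes the output is tab-separated with the columns in the order identifier, trait, linkage group, start, end, center (my reading of the listing, not something confirmed by Sam's code), and `rows_missing_position` is a hypothetical helper name.

```python
import csv
import io


def rows_missing_position(tsv_text):
    """Return the identifiers of rows that lack any of the position columns.

    Assumed column order (not confirmed by the generating scripts):
    identifier, trait, linkage group, start, end, center.
    """
    missing = []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        fields = [f.strip() for f in row]
        # A complete row has six fields, with non-empty values in columns 3-6.
        if len(fields) < 6 or any(f == "" for f in fields[2:6]):
            missing.append(fields[0])
    return missing


# Three rows copied from the listing, re-joined with tabs for illustration.
sample = (
    "mqCanopy wilt-014\tCanopy wilt\tGmComposite2003_D1b\t50.11\t52.61\t51.36\n"
    "mqCanopy wilt-013\tCanopy wilt\n"
    "mqCanopy wilt-008\tCanopy wilt\tGmComposite2003_A1\t16.16\t18.16\t17.16\n"
)
print(rows_missing_position(sample))  # ['mqCanopy wilt-013']
```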
OK, it looks like the issue with that one QTL without location info is probably a data issue and not the fault of the code. mqCanopy wilt-013 is one of 38 QTLs without an entry in qtl_position_table:
select QTLID, QTLName from qtl_table where QTLID not in (select QTLID from qtl_position_table);
+-------+-----------------------------+
| QTLID | QTLName |
+-------+-----------------------------+
| 18 | Chlorimuron sensitivity 1-4 |
| 19 | Chlorimuron sensitivity 1-5 |
| 20 | Chlorimuron sensitivity 1-6 |
| 27 | Chlorimuron sensitivity 2-2 |
| 1208 | cqSeed protein-002 |
| 72 | Fe effic 2-1 |
| 163 | Leaflet ash 1-6 |
| 1410 | Leaflet shape 9-5 |
| 175 | Lodging 4-1 |
| 4291 | mqCanopy wilt-013 |
| 440 | Plant height 11-4 |
| 4072 | Plant height 37-7 |
| 393 | Plant height 4-3 |
| 408 | Plant height 5-14 |
| 418 | Plant height 6-10 |
| 421 | Plant height 6-13 |
| 417 | Plant height 6-9 |
| 425 | Plant height 7-3 |
| 464 | Pod dehiscence 1-11 |
| 465 | Pod dehiscence 1-12 |
| 2534 | Sclero 8-4 |
| 736 | SCN 10-2 |
| 732 | SCN 9-4 |
| 733 | SCN 9-5 |
| 950 | SDS 8-4 |
| 554 | Seed protein 5-5 |
| 555 | Seed protein 5-6 |
| 976 | Seed sucrose 1-11 |
| 977 | Seed sucrose 1-12 |
| 979 | Seed sucrose 1-14 |
| 980 | Seed sucrose 1-15 |
| 981 | Seed sucrose 1-16 |
| 982 | Seed sucrose 1-17 |
| 826 | Seed weight 3-7 |
| 828 | Seed weight 3-9 |
| 1193 | Seed yield 15-14 |
| 894 | Seed yield 3-3 |
| 965 | Stem length, main 1-1 |
+-------+-----------------------------+
38 rows in set (0.0682 sec)
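For anyone who wants to rerun this kind of check from a script, here is a rough sketch of the same anti-join. It uses an in-memory SQLite database with two toy rows purely so it runs without the ssh tunnel; the real data lives in MySQL, and the toy schema below keeps only the QTLID and QTLName columns seen in the query above (everything else about the real tables is unknown to me). It is written as a LEFT JOIN rather than NOT IN because NOT IN returns no rows at all if the subquery ever yields a NULL QTLID.

```python
import sqlite3

# Toy stand-in for the SoyBase tables; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE qtl_table (QTLID INTEGER PRIMARY KEY, QTLName TEXT);
    CREATE TABLE qtl_position_table (QTLID INTEGER);
    INSERT INTO qtl_table VALUES
        (1, 'toy QTL with position'),
        (2, 'toy QTL without position');
    INSERT INTO qtl_position_table VALUES (1);
    """
)


def qtls_without_position(connection):
    """List (QTLID, QTLName) pairs that have no row in qtl_position_table."""
    sql = """
        SELECT q.QTLID, q.QTLName
        FROM qtl_table AS q
        LEFT JOIN qtl_position_table AS p ON p.QTLID = q.QTLID
        WHERE p.QTLID IS NULL
        ORDER BY q.QTLName
    """
    return connection.execute(sql).fetchall()


print(qtls_without_position(conn))  # [(2, 'toy QTL without position')]
```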
Also note that the db seems to have only 26 QTLs, not 27 (at least, per select count(*) from qtl_table where QTLName like 'mqCanopy wilt%'), so I think the version I got out of running the code is probably close to correct. Let me know if you think that one QTL missing a position can be fixed in the db; otherwise I'll just replace the datastore file with the new one.
@adf-ncgr @jd-campbell Since the paper says that mqCanopy wilt-013 (QTL name 5-2) is only associated with Satt229, the position values should be 92.88 94.88 93.88. That is 1 cM on each side of Satt229, which the database says is at 93.88 on LG L or Gm19. I am not sure why it was left out, but this record was problematic and had to be adjusted after the data was originally entered by an undergrad student worker. I have inserted mqCanopy wilt-013 into both the stage and production MySQL databases.
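The arithmetic for a single-marker QTL like this one can be written down as a small helper. This is just a sketch of the 1 cM flanking rule described above; `flank_marker` is a hypothetical name, and rounding to two decimals is an assumption made to match the precision of the other rows.

```python
def flank_marker(marker_cm, flank_cm=1.0):
    """Return (start, end, center) for a QTL defined by one marker.

    The interval extends flank_cm on each side of the marker position,
    and the marker position itself is used as the center.
    """
    start = round(marker_cm - flank_cm, 2)
    end = round(marker_cm + flank_cm, 2)
    center = round(marker_cm, 2)
    return start, end, center


# Satt229 at 93.88 cM, as stated in the comment above.
print(flank_marker(93.88))  # (92.88, 94.88, 93.88)
```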
I have noticed that there is missing data in the mixed.qtl.Hwang_King_2016 directory. The *qtl.tsv file only contains 2 QTLs but the SoyBase MySQL database lists 27 QTLs.
@jd-campbell Will review the paper and SoyBase MySQL to ensure all the data is in the DS.