Closed adf-ncgr closed 8 years ago
The majority of features with both a 0 and non-0 featurepos.mappos values are linkage groups.
There are, however, 4 markers with 0 and non-0 featurepos.mappos values:
3248437 - BM152
positions: DOR364_x_BAT477_a-B02: 0, Cerinza_x_G24404_a-B02: 113.9
lg lengths: Cerinza_x_G24404_a-B02: 118.8, DOR364_x_BAT477_a-B02: 92.2
--> might be because Cerinza_x_G24404_a-B02 is reversed relative to DOR364_x_BAT477_a-B02 ... or bad data.
3248502 - BMd28
positions: DOR364_x_BAT477_a-B05: 0, DOR364_x_BAT477_a-B10: 106.4, Cerinza_x_G24404_a-B05: 63.0
lg lengths: Cerinza_x_G24404_a-B05: 63.9, DOR364_x_BAT477_a-B10: 33.6 (??)
--> looks like bad data
3248542 - BM211
positions: Cerinza_x_G24404_a-B08: 0, DOR364_x_BAT477_a-B08: 59.7
lg lengths: Cerinza_x_G24404_a-B08: 18.2, DOR364_x_BAT477_a-B08: 90.4
--> bad data?
3248561 - BM114
positions: DOR364_x_BAT477_a-B09: 0, Cerinza_x_G24404_a-B09: 92.4
lg lengths: Cerinza_x_G24404_a-B09: 129.9, DOR364_x_BAT477_a-B09: 134.7
--> bad data?
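The "reversed vs. bad data" calls above can be checked with a little arithmetic: if one linkage group is simply flipped relative to the other, a marker at 0 on one should sit near the far end (position / length close to 1) on the other. Below is a minimal sketch using only the numbers listed above. The 0.05 tolerance and the function names are assumptions for illustration, not part of any loader.

```python
# Positions and linkage-group lengths (cM) copied from the list above.
# Each entry is linkage group -> (position, lg length); None = length not listed.
markers = {
    "BM152": {
        "DOR364_x_BAT477_a-B02": (0.0, 92.2),
        "Cerinza_x_G24404_a-B02": (113.9, 118.8),
    },
    "BMd28": {
        "DOR364_x_BAT477_a-B05": (0.0, None),
        "DOR364_x_BAT477_a-B10": (106.4, 33.6),
        "Cerinza_x_G24404_a-B05": (63.0, 63.9),
    },
    "BM211": {
        "Cerinza_x_G24404_a-B08": (0.0, 18.2),
        "DOR364_x_BAT477_a-B08": (59.7, 90.4),
    },
    "BM114": {
        "DOR364_x_BAT477_a-B09": (0.0, 134.7),
        "Cerinza_x_G24404_a-B09": (92.4, 129.9),
    },
}

TOLERANCE = 0.05  # hypothetical cutoff for "near the end of the linkage group"


def relative_positions(placements):
    """Return {lg: position / length} for placements whose length is known."""
    return {lg: pos / length for lg, (pos, length) in placements.items() if length}


def classify(placements):
    """Rough triage of a marker that has a 0 position and a non-0 position."""
    rel = relative_positions(placements)
    if any(r > 1.0 for r in rel.values()):
        return "out of range"  # position exceeds the linkage-group length
    nonzero = [r for lg, r in rel.items() if placements[lg][0] > 0]
    if nonzero and all(r >= 1.0 - TOLERANCE for r in nonzero):
        return "consistent with a flipped linkage group"
    return "inconsistent"


for name, placements in markers.items():
    print(name, "->", classify(placements))
```

With these numbers, only BM152 looks like a flip; BMd28 has a position beyond the end of its linkage group, and BM211 and BM114 land mid-group on the second map.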
by ecannon
I see, I didn't quite understand the modeling of linkage groups as features in this way; I guess we
just got unlucky to have chosen BM152 (one of the 4 listed as potentially problematic) for the slides.
One question about the modeling of genetic data, thinking a little bit ahead to what we're learning about
the chado->intermine conversion: I can see now that featurepos is being used both for placing genetic markers
on linkage groups, as well as to place linkage groups within featuremaps and define their boundaries
(with respect to themselves as the map feature). On the other hand, it looks like QTLs are being handled as featurelocs, presumably so that fmin and fmax can be used together. Wondering if it would make sense to put everything "genetic" in the same context by modeling QTLs as having 2 featurepos records, as is being done for linkage group boundaries? There's also the featurerange table, which appears to be used for genetic entities defined by flanking features (which I guess is currently being modeled for QTLs as feature_relationships of the QTL to the flanking markers?). Alternatively, if it makes sense to stick with featureloc for QTLs, then maybe we should consider putting marker positions there as well, in order to minimize the
convoluted logic for range-based queries on genetic maps?
NB: I don't want to cause unnecessary churn by suggesting changes that have no real benefit! But I do think the current situation is a bit confusing (as is typical of chado), and possibly some downstream pain could be avoided by considering some modest changes (of course, it's not my code that would be affected, so this is quite easy for me to say!!)
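To make the "convoluted logic" concrete, here is a rough sketch of a range query when markers live in featurepos and QTLs live in featureloc. It runs against an in-memory SQLite database with heavily simplified tables: only the columns needed are included, and all IDs and positions are made up. The real chado tables have more columns and constraints.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified stand-ins for the chado tables (assumption: only these columns matter here).
CREATE TABLE featurepos (feature_id INTEGER, map_feature_id INTEGER, mappos REAL);
CREATE TABLE featureloc (feature_id INTEGER, srcfeature_id INTEGER, fmin REAL, fmax REAL);

-- Hypothetical data: 1 = linkage group, 10/11 = markers, 20/21 = QTLs.
INSERT INTO featurepos VALUES (1, 1, 0.0), (1, 1, 90.0);      -- lg boundaries on itself
INSERT INTO featurepos VALUES (10, 1, 12.5), (11, 1, 48.0);   -- marker positions
INSERT INTO featureloc VALUES (20, 1, 10.0, 20.0), (21, 1, 60.0, 75.0);
""")

# "Everything on linkage group :lg that falls in [:lo, :hi]" needs one branch per table.
RANGE_QUERY = """
SELECT feature_id, 'marker' AS kind
  FROM featurepos
 WHERE map_feature_id = :lg
   AND feature_id <> map_feature_id          -- skip the linkage group's own boundary rows
   AND mappos BETWEEN :lo AND :hi
UNION ALL
SELECT feature_id, 'qtl' AS kind
  FROM featureloc
 WHERE srcfeature_id = :lg
   AND fmin <= :hi AND fmax >= :lo           -- interval overlap
 ORDER BY feature_id
"""

hits = conn.execute(RANGE_QUERY, {"lg": 1, "lo": 5.0, "hi": 30.0}).fetchall()
print(hits)
```

If markers and QTLs shared one table, the UNION and the second set of column names would go away; whether that is worth the reload is the open question.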
by adf_ncgr
Marker positions are provided by two spreadsheets,
AsfawBlair2012_PhotosynthateAcquisitionRemobilizationDrought_G3_v07js.xlsx and BlairGaleano2012_v22js.xlsx.
The two sets of marker positions are completely different, and although the marker names substantially overlap, they are not identical in the two datasets.
by ecannon
Hmm. Yes, consistency is nice. The QTLs were originally saved as featurepos records with attached featureposprop records of type 'start' and 'stop', but were changed to featureloc records because that seemed silly. But when thinking about consistency of genetic vs. genomic positions, it makes sense again. Will revisit when discussions on the new QTL data are revived in April.
by ecannon
The problems appear to trace to the map DOR364_x_BAT477_a (publication Blair, Galeano et al., 2012). Need to trace whether the map/marker data was duplicated in another publication incorrectly, if there is corrupt data lying around, or if there is an error in the loading script.
by ecannon
This was pretty nasty to track down. The problems (which were more extensive than expected) came down to a map-naming problem. There were two maps named DOR364_x_BAT477_a, in publications Blair, Galeano et al. 2012a, and Asfaw, Blair et al., 2012a. The latter map was renamed to DOR364_x_BAT477_b, but apparently neither publication was reloaded after the change.
Blair, Galeano et al. 2012a has been reloaded and its markers now appear to be correct.
While tracking this problem, I found several markers with 0 and non-0 positions on different maps. These all came down to either near-0 positions or linkage groups that were flipped relative to each other.
There may still be markers placed on different linkage groups. Such placements are likely to come from the primary data but should be examined if found.
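One way to look for the remaining cases is to count distinct linkage groups per marker within each map. A minimal sketch follows, run against an in-memory SQLite copy of a simplified featurepos table; the rows and IDs are hypothetical, and the same SELECT should be straightforward to adapt to the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE featurepos (
    featurepos_id  INTEGER PRIMARY KEY,
    featuremap_id  INTEGER,
    feature_id     INTEGER,
    map_feature_id INTEGER,
    mappos         REAL
);
-- Hypothetical rows: map 1 with linkage groups 100 and 101, markers 5 and 6.
INSERT INTO featurepos (featuremap_id, feature_id, map_feature_id, mappos) VALUES
    (1, 100, 100, 0.0),     -- linkage group 100 start
    (1, 100, 100, 92.2),    -- linkage group 100 end
    (1, 5,   100, 0.0),     -- marker 5 on linkage group 100
    (1, 5,   101, 106.4),   -- marker 5 again, on linkage group 101
    (1, 6,   100, 22.3);    -- marker 6 placed once
""")

# Markers that appear on more than one linkage group of the same map.
MULTI_LG_QUERY = """
SELECT featuremap_id, feature_id, COUNT(DISTINCT map_feature_id) AS n_lgs
  FROM featurepos
 WHERE feature_id <> map_feature_id     -- ignore linkage-group boundary rows
 GROUP BY featuremap_id, feature_id
HAVING COUNT(DISTINCT map_feature_id) > 1
"""

suspects = conn.execute(MULTI_LG_QUERY).fetchall()
print(suspects)
```

Anything this returns still needs a look at the source spreadsheet to decide whether it is real or a loading artifact.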
by ecannon
Relates to: GH-459
Noticed this when reviewing Sudhansu's NAPIA slides, in which one marker had a position listed as 0 cM; while not impossible, it seemed odd, since the other map on which it was placed had
a more "normal" value. Just got around to querying the db and see that many (though not all) markers seem to have two positions on the same map, one of them having a mappos value of 0,
the other being "normal", e.g.:
drupal=> select * from featurepos;
 featurepos_id | featuremap_id | feature_id | map_feature_id | mappos
---------------+---------------+------------+----------------+--------
          1369 |            53 |    2558235 |        2558235 |      0
          1370 |            53 |    2558235 |        2558235 | 200.46
          1371 |            53 |    2558236 |        2558236 |      0
          1372 |            53 |    2558236 |        2558236 | 156.31
          1373 |            53 |    2558237 |        2558237 |      0
          1374 |            53 |    2558237 |        2558237 | 194.45
          1375 |            53 |    2558238 |        2558238 |      0
          1376 |            53 |    2558238 |        2558238 | 130.35
etc.
Seems likely to be a data-loading bug, but I'm only vaguely familiar with this process so will leave it as a conjecture...
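Note that in every row shown, feature_id equals map_feature_id, so these may be features positioned on themselves rather than markers. A rough sketch of a query that separates the two cases is below. It uses an in-memory SQLite table seeded with a few of the rows above plus two made-up marker features (998, 999) and a made-up linkage group (777); those IDs are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE featurepos (
    featurepos_id  INTEGER PRIMARY KEY,
    featuremap_id  INTEGER,
    feature_id     INTEGER,
    map_feature_id INTEGER,
    mappos         REAL
);
-- Rows 1369-1372 from the output above (feature positioned on itself).
INSERT INTO featurepos VALUES
    (1369, 53, 2558235, 2558235, 0),
    (1370, 53, 2558235, 2558235, 200.46),
    (1371, 53, 2558236, 2558236, 0),
    (1372, 53, 2558236, 2558236, 156.31);
-- Hypothetical marker rows, not from the real database.
INSERT INTO featurepos VALUES
    (9001, 53, 999, 2558235, 0),
    (9002, 54, 999, 777,     113.9),
    (9003, 53, 998, 2558235, 42.0);
""")

# Features with both a 0 and a non-0 mappos, flagged by whether every row
# places the feature on itself (1) or on some other feature (0).
ZERO_AND_NONZERO = """
SELECT feature_id,
       MIN(CASE WHEN feature_id = map_feature_id THEN 1 ELSE 0 END) AS self_positioned
  FROM featurepos
 GROUP BY feature_id
HAVING SUM(CASE WHEN mappos = 0  THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN mappos <> 0 THEN 1 ELSE 0 END) > 0
 ORDER BY feature_id
"""

rows = conn.execute(ZERO_AND_NONZERO).fetchall()
for feature_id, self_positioned in rows:
    print(feature_id, "self-positioned" if self_positioned else "marker-like")
```

Rows flagged as self-positioned would be the ones to compare against the feature's type before treating the 0 as a bad marker position.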
[LEGUME-451] created by adf_ncgr