It seems that the "ms" format handling requires mutation positions to lie within `[0, 1)`, but this is not checked. This can cause problems when a simulation produces positions outside that range, which is common in forward simulations.
An MRE is below. I added `print(newPositions)` at line 54 of `msTools.py` to show what happens.
/usr/bin/python3 makeFeatureVecsForSingleMs_ogSHIC.py out.ms 1100000 11 None None all 0.25 0.0 None stats
file name='out.ms'maskFileName='None': not doing any masking!
makeFeatureVecsForSingleMs_ogSHIC.py:108: DeprecationWarning: time.clock has been deprecated in Python 3.3 and will be removed from Python 3.8: use time.perf_counter or time.process_time instead
start = time.clock()
[6600, 103400, 112200, 146300, 333300, 372900, 421300, 421301, 468600, 484000, 591800, 619299, 652300, 664400, 733700, 765600, 771100, 848100, 910800, 931700]
makeFeatureVecsForSingleMs_ogSHIC.py:209: DeprecationWarning: time.clock has been deprecated in Python 3.3 and will be removed from Python 3.8: use time.perf_counter or time.process_time instead
time.clock()-start))
total time spent calculating summary statistics and generating feature vectors: 0.845340 secs
NOTE the range of mutation positions!!! Keep that in mind for later
python3 diploSHIC.py fvecSim haploid out2.ms stats2
/usr/bin/python3 makeFeatureVecsForSingleMs_ogSHIC.py out2.ms 1100000 11 None None all 0.25 0.0 None stats2
file name='out2.ms'maskFileName='None': not doing any masking!
makeFeatureVecsForSingleMs_ogSHIC.py:108: DeprecationWarning: time.clock has been deprecated in Python 3.3 and will be removed from Python 3.8: use time.perf_counter or time.process_time instead
start = time.clock()
[1099981, 1099982, 1099983, 1099984, 1099985, 1099986, 1099987, 1099988, 1099989, 1099990, 1099991, 1099992, 1099993, 1099994, 1099995, 1099996, 1099997, 1099998, 1099999, 1100000]
makeFeatureVecsForSingleMs_ogSHIC.py:209: DeprecationWarning: time.clock has been deprecated in Python 3.3 and will be removed from Python 3.8: use time.perf_counter or time.process_time instead
time.clock()-start))
total time spent calculating summary statistics and generating feature vectors: 0.112435 secs
**NOTE that the positions are all "bunched up" now on the right!!!!**
Here is data set 1:
Getting features from it:
NOTE the range of mutation positions!!! Keep that in mind for later
The resulting stats:
Now, take the exact same data, but change the mutation positions to be on `[1, 2)`:
Get the stats:
**NOTE that the positions are all "bunched up" now on the right!!!!**
The stats are very different:
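For what it's worth, the missing check could be something as small as the sketch below. This is only an illustration, not the actual `msTools.py` code: the function names `check_unit_interval` and `rescale_to_unit_interval` are made up here, and I am assuming the positions are available as a plain list of floats before they are converted to discrete coordinates.

```python
from typing import List, Sequence


def check_unit_interval(positions: Sequence[float]) -> None:
    """Raise ValueError if any position falls outside [0, 1).

    Hypothetical helper: not part of msTools.py.
    """
    # NaN also fails the chained comparison, so it is reported as bad.
    bad = [p for p in positions if not (0.0 <= p < 1.0)]
    if bad:
        raise ValueError(
            f"{len(bad)} mutation position(s) outside [0, 1), e.g. {bad[0]}"
        )


def rescale_to_unit_interval(
    positions: Sequence[float], start: float, stop: float
) -> List[float]:
    """Linearly map positions from [start, stop) onto [0, 1).

    Hypothetical helper for input whose coordinates are known to lie
    on some other half-open interval, such as [1, 2).
    """
    if stop <= start:
        raise ValueError("stop must be greater than start")
    span = stop - start
    return [(p - start) / span for p in positions]
```

Failing loudly is probably the safer default, since silently rescaling assumes the caller knows the true interval. Rescaling would only make sense as an explicit opt-in.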
cc @khoihuynh-thorntonlab