Closed oushujun closed 5 years ago
this looks like a bug that was recently introduced to the code. hold on and we will fix it.
Hi @andrewkern, @oushujun, I just cloned the repo and set it up. I'm still getting the same error when I invoke --maskFileName and --chrArmsForMasking on the example data. Is there a workaround? Thanks, @stsmall
this was code that @dschride had changed recently but i don't believe he has pushed his patch to github. @dschride did you push the new masking version?
I used the buggy version without supplying any masking for the simulated data. The genome I used is very good and contains only a few Ns and physical gaps, so I figured that not masking would not be too big a problem.
Shujun
Shujun, please pull the latest version and try again. The cause of this issue was that diploid mode did not support masking sites without also masking genotypes, but I have now added this functionality. Try running again in the same manner and let me know whether the issue is resolved.
Thank you @dschride! After this update the example data finishes without error. My data using diploid and a mask file also finished without error.
No problem!
I am having a similar problem
Ran the following:
python diploSHIC.py fvecSim diploid sims/TEST.ms sims/TEST.fvec --totalPhysLen 110000
Got this error:
/anaconda3/bin/python makeFeatureVecsForSingleMsDiploid.py sims/TEST.ms 110000 11 None None None None 0.75 all 0.25 None sims/TEST.fvec
file name='sims/TEST.ms'
Traceback (most recent call last):
  File "makeFeatureVecsForSingleMsDiploid.py", line 17, in <module>
    trainingDataFileObj, sampleSize, numInstances = openMsOutFileForSequentialReading(trainingDataFileName)
  File "/diploSHIC/msTools.py", line 150, in openMsOutFileForSequentialReading
    program, numSamples, numSims = header.strip().split()[:3]
ValueError: not enough values to unpack (expected 3, got 1)
But I don't see anything wrong with my header, do you? Here is the .ms file in question: TEST.ms.txt
In the file you have attached I don't see a header line, or the random seed line which would typically appear right below it in ms-style output. For example, if I run the following command using ms:
ms 10 1 -t 1
My output will look something like this:
ms 10 1 -t 1
11048 49753 20103
//
segsites: 3
positions: 0.0447 0.0800 0.2977
111
000
110
110
110
000
110
110
000
110
However, your file starts with:
//
segsites: 8721
diploSHIC needs some of the information in the header line: the sample size for each simulation and the number of simulated replicates. The rest (the path to the simulation program, any additional command-line arguments, and the entire random seed line) is not explicitly read by diploSHIC, but the parser still expects it to be present. If I modify the beginning of your simulation output file to the following, then the fvecSim command runs properly:
blah 100 1
blah
//
segsites: 8721
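For anyone hitting the same error, here is a minimal sketch of the check and the workaround described above. It is not part of diploSHIC: `parse_ms_header` and `ensure_ms_header` are hypothetical helper names, `parse_ms_header` only imitates the three-field unpacking visible in the traceback, and the placeholder program name and seed line are arbitrary stand-ins, just like "blah" in the example. The sample size and replicate count must be supplied by you to match your simulation.

```python
def parse_ms_header(header_line):
    """Imitate the unpacking from the traceback: program, sample size, replicates.

    Hypothetical helper; it only mirrors header.strip().split()[:3].
    """
    fields = header_line.strip().split()
    if len(fields) < 3:
        # A file that starts with "//" yields a single field, hence "got 1".
        raise ValueError(
            f"expected at least 3 header fields, got {len(fields)}: {header_line!r}"
        )
    program, num_samples, num_sims = fields[:3]
    return program, int(num_samples), int(num_sims)


def ensure_ms_header(ms_text, num_samples, num_sims):
    """Prepend a placeholder header and seed line if the text starts at '//'.

    Assumption: any program name and any seed line are acceptable, as in the
    "blah 100 1" / "blah" example above.
    """
    stripped = ms_text.lstrip()
    first_line = stripped.splitlines()[0] if stripped else ""
    if first_line.startswith("//"):
        return f"ms {num_samples} {num_sims}\nplaceholder_seed\n{stripped}"
    return ms_text


if __name__ == "__main__":
    headerless = "//\nsegsites: 2\npositions: 0.1 0.2\n10\n01\n"
    fixed = ensure_ms_header(headerless, num_samples=2, num_sims=1)
    print(parse_ms_header(fixed.splitlines()[0]))
```

Text that already begins with a header line is returned unchanged, so the helper can be applied to files whether or not the header is missing.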
Fixed it, thanks.
CentOs 7 Python 3.6.6 :: Anaconda, Inc.
I was testing fvecSim using the mosquito data and found a bug:
python diploSHIC.py fvecSim diploid hard_0.msOut.gz test_hard_0.diploid.fvec --totalPhysLen 55000 --maskFileName Anopheles-gambiae-PEST_CHROMOSOMES_AgamP3.accessible.fa.gz --chrArmsForMasking 3R
Program output:
file name='hard_0.msOut.gz'
vcfForMaskFileName='None': not masking any genotypes!
reading masking data...
reading Anopheles-gambiae-PEST_CHROMOSOMES_AgamP3.accessible.fa.gz
checked genotypes at 0 sites
Traceback (most recent call last):
  File "/opt/software/SHIC/diploSHIC/makeFeatureVecsForSingleMsDiploid.py", line 64, in <module>
    sampleToPopFileName=sampleToPopFileName)
ValueError: too many values to unpack (expected 2)
/opt/software/miniconda/4.4.10--GCC-4.9.4/bin/python /opt/software/SHIC/diploSHIC/makeFeatureVecsForSingleMsDiploid.py hard_0.msOut.gz 55000 11 Anopheles-gambiae-PEST_CHROMOSOMES_AgamP3.accessible.fa.gz None None None 0.75 3R 0.25 None test_hard_0.diploid.fvec
When removing the --maskFileName and --chrArmsForMasking parameters, it runs fine.
BTW, is the program designed to sample the provided mask file randomly (or sequentially?) to mimic true data?
Thanks, Shujun