Closed stsmall closed 5 years ago
I tested my model on your example data and there were no 'nan' values; of course the classification is improper, but it seems to behave. I tested your model on my data and the 'nan' values were there again. Hmm, not sure what is going on, but I will try to rebuild the fvec from the vcf and predict.
I recalculated the stats using fvecVcf for a short segment and it works! No idea what was wrong the first time, as the fvecVcf looks the same.
awesome. glad it's working
Hi @andrewkern, The 'nan' error was fixed when I only allowed sites with no missing data. Is this expected behavior and I just overlooked the documentation, or is my error likely related to something else? thanks!
Hi @andrewkern, @dschride I thought that removing the missing data fixed the issue with nan probs. My mistake is that I was sampling from the file as it was building and then running tests. The odd behavior is that if I subsample the fvecVcf file, e.g., head -n100, it works as expected with probs, but if I attempt to run on the entire Chr arm file it returns 'nan' for all probs. Maybe there is a problematic line in the fvecVcf output? Can I send you my fvecVcf file via email? thank you, @stsmall
Sure, email me your output.
Okay, so you have a few nans in there. If you remove the following lines I bet it would work:
3R 24627501 24632500
3R 24632501 24637500
3R 24637501 24642500
It seems that in these lines most of our stats based on diploid genotype strings (i.e. our "diplotypes" in the paper) are nan, which I don't think should happen unless there is a fairly small number of polymorphisms in those windows, and thus you wouldn't be losing anything informative by throwing them out anyway. But you may wish to verify this before proceeding.
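For anyone hitting the same thing, here is a minimal sketch of how such windows could be found and dropped automatically rather than by hand. It is not part of the tool itself. It assumes the feature vector file is whitespace-delimited text with one header line, and that the first four columns are window labels (chromosome, start, end, larger window range) with every remaining column numeric. The function name `split_nonfinite_rows` and the `n_label_cols` parameter are made up for illustration.

```python
import math


def split_nonfinite_rows(lines, n_label_cols=4):
    """Split feature-vector lines into (kept, dropped).

    Assumptions (not confirmed by the tool's documentation):
    - the first line is a header and is always kept;
    - the first `n_label_cols` fields are labels, the rest are numbers.
    A row is dropped if any numeric field is nan, inf, or unparsable.
    """
    kept, dropped = [], []
    for i, line in enumerate(lines):
        fields = line.split()
        if i == 0 or not fields:
            # Header or blank line: pass through untouched.
            kept.append(line)
            continue
        try:
            bad = any(
                not math.isfinite(float(v)) for v in fields[n_label_cols:]
            )
        except ValueError:
            # A field that is not a number at all is also suspicious.
            bad = True
        (dropped if bad else kept).append(line)
    return kept, dropped
```

To use it, read the file with `readlines()`, pass the list to `split_nonfinite_rows`, write `kept` to a new file for the predict step, and inspect `dropped` to confirm that the removed windows really do have few polymorphisms before discarding them.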
yep, that was it. Thanks @dschride !!
quickly ... it seems that linkedSoft is spelled as likedSoft in the output
Hi @andrewkern, I ran through the test example and all worked as expected. So the program is working correctly. When I run the 'predict' step I get 'nan' for my probs. I did not get errors anywhere else in the pipeline.
3R 97501 102500 75001-130000 hard nan nan nan nan nan
3R 102501 107500 80001-135000 hard nan nan nan nan nan
3R 107501 112500 85001-140000 hard nan nan nan nan nan
I realize that I did not give you much information. What would be helpful? I can email the json and hdf5 files if it would help. thank you! @stsmall