Open joqb opened 9 years ago
Hi Nath,
I ran into the exact same problem as you while using fastStructure. I have a dataset with 48 individuals and 800 SNPs in a .str file. When I use the --cv option, I get a "Failed" message; without it, the run takes only 2-3 seconds. Did you ever find out what the issue was?
Thanks, Laurène
Hi Laurene, The --cv option makes the software run slower (e.g., --cv=5 makes it run about 5 times slower, since it runs 5-fold cross-validation and reports ancestry proportions resulting from aggregating these 5 runs). However, I have not encountered the "Failed" error message before. Could you please copy-paste the error or provide a screenshot of it? If you could share the dataset so I can replicate and fix the error, that would be really helpful!
thanks!
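To make the distinction concrete: --cv sets the number of cross-validation folds, not the number of independent replicates per K. Below is a minimal Python 3 sketch of what splitting the entries of a genotype matrix into folds could look like. The function name `make_cv_masks` and the shuffle-and-slice approach are assumptions for illustration only; fastStructure's own mask construction may differ.

```python
import random


def make_cv_masks(n_ind, n_loci, cv, seed=0):
    """Split all (individual, locus) entries into `cv` disjoint folds.

    Hypothetical helper, not part of fastStructure.
    """
    entries = [(i, l) for i in range(n_ind) for l in range(n_loci)]
    random.Random(seed).shuffle(entries)
    # Fold k takes every cv-th entry starting at position k.
    return [entries[k::cv] for k in range(cv)]


# 4 individuals x 3 loci = 12 entries, split into 3 folds of 4 entries each.
masks = make_cv_masks(n_ind=4, n_loci=3, cv=3)
print([len(m) for m in masks])
```

In a cross-validation scheme like this, each fold is hidden in turn, the model is fitted on the remaining entries, and the hidden entries are used to score the fit. Every entry is held out exactly once, which is different from rerunning the whole analysis several times.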
Hi Anil,
Thank you very much for your answer!
Since the software only takes 2 or 3 seconds to run on my dataset (48 individuals, 800 SNPs) for each K, it would be no problem if the --cv option made it run several times slower. My understanding of this option is that it sets the number of replicates for each K; correct me if I'm wrong. The runs produce the same results (same output files) whether or not I use the --cv option, except that in the .log file the last line says "CV error = 0.2362436, 0.0097023" and the terminal shows several "Failed" messages:

python ./structure.py -K 2 --input=structure --output=output/test --cv=3 --full --format=str
Failed Failed Failed Failed Failed Failed Failed Failed Failed Failed Failed Failed Failed
Some people have reported the same problem before but I haven't seen any explanation or solution so far: https://groups.google.com/forum/#!search/faststructure$20cv/structure-software/cXyfoWXsOe4/Mix0Fo4nDAAJ
I carried out the analysis (without the --cv option) using chooseK.py and distruct.py, and the final plot gives meaningful results that are nearly identical to the ones I got from the classic Structure software. Running fastStructure is much faster (which is the whole point), but I would like to have replicates for each K (as in Structure), which chooseK.py could then use to choose K more reliably.
I attached my input file so that you can have a look at the issue (I had to zip it since GitHub wouldn't accept a file with a .str extension): structure.str.zip
Thank you very much for your time, Laurène
Hi Anil (and others),
I encountered the same error today using fastStructure v1.0 and the following command:
python /home/elinck/bin/fastStructure/structure.py -K 2 --input /home/elinck/atlapetes/atlapetes --output /home/elinck/atlapetes/atlapetes_output --format str --cv 3
My .str file is zipped and attached. Curious if you ever figured out what was causing the issue. Thanks in advance!
I'm also getting these errors. It looks like they could come from lines 293-305 of fastStructure.pyx:

    # test to ensure that for all partitions, the loci are all variant
    newmasks = []
    for mask in masks:
        G = Gtrue.copy()
        Gmask = -1*np.ones((N,L), dtype='int8')
        Gmask[mask[0],mask[1]] = G[mask[0],mask[1]]
        G[mask[0],mask[1]] = 3
        if not (((G==1)+(G==2)).sum(0)==0).any():
            newmasks.append(mask)
    if not len(newmasks)>=cv:
        wellmasked = False
        print "Failed"
I do not have any invariant columns in my dataset, and I get the error even if I remove all tri-allelic sites from my input. I'm calling fastStructure as follows:
python fastStructure/structure.py -K 2 --input=inputFile --output=outputFile --cv=5 --format=str
I'm using Ubuntu 14.04.04 LTS, 64 bit.
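A dataset with no invariant columns can still trip this check, because the test is applied after the masked entries are hidden. Here is a minimal Python 3 sketch of the same test written with plain lists. It assumes genotypes coded 0/1/2 and that hidden entries are excluded from the count, which is what setting them to 3 does in the snippet above. The function name `mask_leaves_invariant_locus` and the toy matrix are made up for illustration and are not part of fastStructure.

```python
def mask_leaves_invariant_locus(G, mask):
    """Return True if, once the masked entries are hidden, some locus has
    no remaining genotype equal to 1 or 2.

    G    : list of rows (one per individual), each a list of genotypes
    mask : iterable of (individual, locus) pairs to hide
    """
    hidden = set(mask)
    n_ind = len(G)
    n_loci = len(G[0])
    for locus in range(n_loci):
        carriers = sum(
            1
            for ind in range(n_ind)
            if (ind, locus) not in hidden and G[ind][locus] in (1, 2)
        )
        if carriers == 0:
            return True
    return False


# Toy data: 4 individuals x 3 loci. Every column is variable,
# but locus 2 has a single carrier (individual 3).
G = [
    [0, 1, 0],
    [1, 0, 0],
    [2, 1, 0],
    [0, 2, 1],
]

safe_mask = [(0, 0), (1, 1)]   # hides only entries that other carriers back up
unlucky_mask = [(3, 2)]        # hides the only carrier at locus 2

print(mask_leaves_invariant_locus(G, safe_mask))     # False
print(mask_leaves_invariant_locus(G, unlucky_mask))  # True
```

If this reading of the snippet is right, any locus whose non-zero genotypes sit in only one or two individuals is at risk: with few individuals and many rare variants, some fold is likely to hide all carriers at some locus, so the mask set is rejected and "Failed" is printed.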
I can confirm that I no longer get these errors if I convert my data to plink .bed format and remove any sites with over 90% missing data or with allele frequencies greater than 99% or lower than 1%.
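For reference, the filtering described above can be sketched in Python 3 as follows. This is an illustration under stated assumptions, not the commands actually used: genotypes are taken to be coded 0/1/2 with 3 for missing, and the function name `filter_loci` and its parameters are hypothetical.

```python
def filter_loci(G, max_missing=0.9, min_maf=0.01, missing=3):
    """Return the indices of loci that pass both filters.

    G           : list of rows (one per individual), genotypes 0/1/2
    max_missing : drop a locus if its fraction of missing calls exceeds this
    min_maf     : drop a locus if its minor allele frequency is below this
    missing     : code used for a missing genotype (assumed to be 3)
    """
    n_ind = len(G)
    keep = []
    for locus in range(len(G[0])):
        calls = [row[locus] for row in G if row[locus] != missing]
        if not calls:
            continue  # nothing observed at this locus
        missing_rate = 1.0 - len(calls) / n_ind
        if missing_rate > max_missing:
            continue
        # Each diploid call contributes two alleles.
        freq = sum(calls) / (2.0 * len(calls))
        maf = min(freq, 1.0 - freq)
        if maf < min_maf:
            continue
        keep.append(locus)
    return keep


G = [
    [0, 1, 3, 0],
    [0, 0, 3, 2],
    [0, 2, 3, 2],
    [0, 1, 1, 2],
]

print(filter_loci(G))                   # [1, 2, 3]: locus 0 is monomorphic
print(filter_loci(G, max_missing=0.5))  # [1, 3]: locus 2 is 75% missing
```

Requiring a minor allele frequency of at least 1% is the same as dropping loci whose allele frequency is above 99% or below 1%.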
Hi @atcg, I get the same error when I use plink .bed format as input:
Failed Failed Failed Failed Failed Failed Failed Failed Failed Failed Failed Failed Failed Failed Failed
Has this been addressed? I am running into the same error. I'm using a plink bed file as the input file.
Hi there,
I'm trying fastStructure on a dataset with relatively few individuals (25) but many SNPs (10000, from GBS) in .str format. When I ran it with --cv=5, because I thought that would be equivalent to running replicates in regular Structure, I only got "Failed" printed to the screen one or more times while fastStructure kept running. When I tried the same with the test data, it worked fine. Running on my data without --cv also works fine, but it is crazy fast with the simple prior (4 seconds, which leaves me wondering...), while with the logistic prior it is much slower (it hadn't updated the log file after an hour...).
Any suggestions?
Thanks, Nath