Closed by marce-sarrias 1 year ago
Hello,
The problem seems to come from ANGSD rather than ngsTools, so I can't provide much support. My guess is that ANGSD is having trouble parsing your input files.
Also, we recommend using PCAngsd for PCA-related analysis as it has been shown to outperform ngsCovar under many scenarios.
Matteo
From: marce-sarrias. Sent: 17 May 2023 8:39 AM. To: mfumagalli/ngsTools. Subject: [mfumagalli/ngsTools] cannot convert vcf file to beagle binary format using the -vcf-gl option (Issue #32)
Hello,
I did SNP calling with GATK (HaplotypeCaller, CombineGVCFs and GenotypeGVCFs) and, after applying several filters, I have a final VCF file with all SNPs.
Now I would like to use this matrix of snps to study population structure (do a PCA with ngsCovar) and individual admixture proportions (with NGSadmix). It is my understanding that in order to do this I need to convert my vcf file into beagle binary format, which I could do in angsd using the -vcf-gl option.
I am able to run the command (see below) without getting errors but it seems it is not generating the correct output file.
ANGSD=/path to angsd
REF=genome.fasta.fai
$ANGSD -vcf-gl ./snps.vcf.gz -GL 0 -out ./snps.beagle -fai $REF -doGlf 2 -doMajorMinor 1 -doMaf 1 -nind 43
This is what I see when I check the run log file:
BAD SITE chr24:21289418. return code:-1 while fetching GL tag rec->rid:23
BAD SITE chr24:21289424. return code:-1 while fetching GL tag rec->rid:23
BAD SITE chr24:21289425. return code:-1 while fetching GL tag rec->rid:23
BAD SITE chr24:21289431. return code:-1 while fetching GL tag rec->rid:23
BAD SITE chr24:21289433. return code:-1 while fetching GL tag rec->rid:23
BAD SITE chr24:21289470. return code:-1 while fetching GL tag rec->rid:23
BAD SITE chr24:21289475. return code:-1 while fetching GL tag rec->rid:23
BAD SITE chr24:21289525. return code:-1 while fetching GL tag rec->rid:23
-> Done reading data waiting for calculations to finish
-> Done waiting for threads
-> Output filenames:
->"./snps.beagle.arg"
->"./snps.beagle.beagle.gz"
->"./snps.beagle.mafs.gz"
-> Mon May 15 15:43:02 2023
-> Arguments and parameters for all analysis are located in .arg file
-> Total number of sites analyzed: 0
-> Number of sites retained after filtering: 0
[ALL done] cpu-time used = 134.35 sec
[ALL done] walltime used = 135.00 sec
I'm not really sure what the problem is and how to fix it. Any advice on this?
Thank you!!
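Editorial note: the log reports a failure "while fetching GL tag" at every site, which suggests the VCF records may not carry a GL field at all. GATK genotyping output typically stores phred-scaled likelihoods under PL rather than GL. A minimal sketch for checking which FORMAT tags the header declares, assuming only gzip and awk are available; the function name `list_format_tags` is hypothetical and not part of ANGSD or ngsTools:

```shell
# Print the ID of every ##FORMAT line declared in a gzipped VCF header.
# list_format_tags is an illustrative helper, not an ANGSD or ngsTools command.
list_format_tags() {
    gzip -dc "$1" | awk '
        /^#CHROM/ { exit }                 # header is over, stop reading
        /^##FORMAT=<ID=/ {
            sub(/^##FORMAT=<ID=/, "")      # drop the prefix
            sub(/[,>].*/, "")              # drop everything after the ID
            print
        }'
}

# Usage: list_format_tags ./snps.vcf.gz
```

If the list shows PL but no GL, that would be consistent with the log above. ANGSD's documentation lists a `-vcf-pl` input option alongside `-vcf-gl`, which may be worth trying in that case; this has not been verified against this data set.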
I understand, thank you. Is it possible to run ngsTools using a VCF file? If not, could you advise on how to convert my VCF file into a beagle binary file? Thanks again!
It should be possible to use VCF files in ngsTools as long as they are properly converted to a suitable file containing genotype likelihoods, but I don't have much experience doing that and thus I cannot provide much support.
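Editorial note: to illustrate what "a suitable file containing genotype likelihoods" involves, here is a rough sketch of the arithmetic for one sample at one biallelic site. It assumes the VCF stores phred-scaled likelihoods (PL), where each value is -10 times the log10 likelihood, and that the target is three likelihoods per individual normalized to sum to 1, as in a beagle-style likelihood file. The helper name `pl_to_likelihoods` is hypothetical; this is not how ANGSD or ngsTools performs the conversion internally.

```shell
# Convert one comma-separated PL triple (hom-ref, het, hom-alt) into
# three likelihoods normalized to sum to 1.
# pl_to_likelihoods is an illustrative helper, not an ANGSD or ngsTools command.
pl_to_likelihoods() {
    awk -v pl="$1" 'BEGIN {
        n = split(pl, p, ",")
        if (n != 3) { print "expected three PL values" > "/dev/stderr"; exit 1 }
        total = 0
        for (i = 1; i <= 3; i++) {
            l[i] = exp(-p[i] * log(10) / 10)   # 10^(-PL/10)
            total += l[i]
        }
        printf "%.6f %.6f %.6f\n", l[1] / total, l[2] / total, l[3] / total
    }'
}

# Usage: pl_to_likelihoods 0,10,20
```

A full conversion would also need to handle missing values, multiallelic sites and the marker and allele columns, none of which this sketch attempts.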
Closing, as this is related to ANGSD rather than ngsTools.