thlee / SNPhylo

A pipeline to generate a phylogenetic tree from huge SNP data
http://chibba.pgml.uga.edu/snphylo/
GNU General Public License v2.0
83 stars, 37 forks

Warning: all SNP positions do not have DP information. #4

Open wcudude14 opened 10 years ago

wcudude14 commented 10 years ago

Hi, I am trying to use SNPhylo, but I am having issues with the format of my VCF file and the way the software parses its columns. I get a warning that SNP positions do not have DP information:

Start to remove low quality data.

Warning: There were 24732 SNP positions which did not have DP information.

Warning: 24732 SNPs were passed without the read depth assessment because of the absence of the DP column.

0 low quality lines were removed.

SNPRelate -- Supported by Streaming SIMD Extensions 2 (SSE2)

Error in snpgdsVCF2GDS(vcf.file, gds.file, method = "biallelic.only", : unused argument (option = snpgds.option)

Execution halted

However, when I look at the VCF file, there is DP information. It looks like the SNPhylo script remove_low_depth_genotype_data.py splits the INFO string into "columns" using a colon:

line 45 format_col = vcf_data[8].split(':')

Unfortunately, the INFO string in my VCF files uses a semicolon, not a colon, as a separator. For example: DP=25;VDB=0.0269;AF1=1;AC1=2;DP4=0,0,9,15;MQ=60;FQ=-72
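For reference, in the VCF specification the eighth column is INFO (semicolon-separated KEY=VALUE entries) and the ninth column is FORMAT (colon-separated keys, followed by one colon-separated value column per sample). After splitting a data line on tabs with zero-based indexing, `vcf_data[8]` is therefore the FORMAT column. This suggests the quoted line is looking for a per-sample DP key in FORMAT, not parsing INFO with the wrong separator. That reading is inferred from this single line, not from the full script. The minimal Python sketch below assumes a standard tab-separated data line and shows where each kind of DP lives. `parse_vcf_line` is a hypothetical helper. In the test line, every field except the INFO string from the post is invented for illustration.

```python
# Minimal sketch: where "DP" can appear in one tab-separated VCF data line.
# Assumption: standard VCF column order (CHROM, POS, ID, REF, ALT, QUAL,
# FILTER, INFO, FORMAT, then one column per sample).

def parse_vcf_line(line):
    """Return (info, samples) for one VCF data line (hypothetical helper).

    info:    dict built from the INFO column (index 7), whose entries are
             separated by ';' and written as KEY=VALUE (flags have no '=')
    samples: list of dicts, one per sample column, keyed by the
             colon-separated FORMAT keys (index 8)
    """
    vcf_data = line.rstrip("\n").split("\t")

    info = {}
    for entry in vcf_data[7].split(";"):
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True  # flag entries carry no value

    format_col = vcf_data[8].split(":")  # same split as the quoted line 45
    samples = [dict(zip(format_col, sample.split(":"))) for sample in vcf_data[9:]]
    return info, samples


# Only the INFO string comes from the post; the other fields are invented.
line = "\t".join([
    "chr1", "1000", ".", "A", "G", "50", ".",
    "DP=25;VDB=0.0269;AF1=1;AC1=2;DP4=0,0,9,15;MQ=60;FQ=-72",
    "GT:PL:GQ", "1/1:255,45,0:84",
])
info, samples = parse_vcf_line(line)
print(info["DP"])          # → 25   (site-level depth from INFO)
print("DP" in samples[0])  # → False (no per-sample DP in FORMAT)
```

With the INFO string from the post, DP is present in INFO, but the (invented) FORMAT column has no DP key. In that situation, a filter that reads per-sample depth from FORMAT has nothing to read.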

Some of my files contain more than 25k lines of SNP data. What is the best way to approach this issue?

Thank you in advance for any help and advice you can share.

thlee commented 10 years ago

Thank you for your comment.

I will reflect your comment in the next version of SNPhylo. :)

Thank you,

Tae-Ho Lee

thlee commented 9 years ago

The script "generate_snp_sequence.R" was revised to fix the error "Error in snpgdsVCF2GDS(vcf.file, gds.file, method = "biallelic.only", : unused argument (option = snpgds.option)". I hope this is not too late.

Best,

nitishnarula commented 9 years ago

Hi,

I am getting the same error, but I am not sure whether it is the same issue. The output is below (I have removed some of the usage lines). In my VCF file the FORMAT column only has GT values, but the INFO column has NS and DP information. Does this mean that I need DP information in the FORMAT column?
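Per the VCF specification, DP in INFO is the combined depth across all samples at a site, while DP in FORMAT is the read depth for an individual sample. A short streaming check can show which one a given file actually provides. The sketch below is a hypothetical diagnostic and is not part of SNPhylo. It assumes an uncompressed VCF and reads it one line at a time, so large files are never loaded into memory. Its categories are its own. They are not expected to match the counts in SNPhylo's warnings, whose exact definitions are not given in this thread.

```python
import sys
from collections import Counter


def dp_locations(lines):
    """Tally VCF data lines by where a DP field appears (hypothetical helper).

    Categories (illustrative labels, not SNPhylo's):
      "DP in FORMAT":     per-sample depth is available
      "DP only in INFO":  only the site-level combined depth is available
      "no DP":            neither column has a DP key
      "no FORMAT column": sites-only line without FORMAT/sample columns
    """
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, ## meta lines and the #CHROM header
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            counts["no FORMAT column"] += 1
            continue
        info_keys = {entry.split("=", 1)[0] for entry in cols[7].split(";")}
        format_keys = cols[8].split(":")
        if "DP" in format_keys:
            counts["DP in FORMAT"] += 1
        elif "DP" in info_keys:
            counts["DP only in INFO"] += 1
        else:
            counts["no DP"] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python dp_locations.py input.vcf  (script name is illustrative)
    with open(sys.argv[1]) as handle:
        for category, n in sorted(dp_locations(handle).items()):
            print(f"{category}\t{n}")
```

A file like the one described above, with only GT in FORMAT and DP in INFO, would fall entirely into the "DP only in INFO" category.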

I have tried changing the -m, -M, -p and -c options, but I am not sure which values would work in my case, or whether those options need to be changed at all. Any help would be much appreciated!

Thanks,
Nitish


Start to remove low quality data.

Warning: There were 43939 SNP positions which did not have DP information.

Warning: 215752 SNPs were passed without the read depth assessment because of the absence of the DP column.

43938 low quality lines were removed.

Determine phylogenetic tree based on SNP data with a VCF, a HapMap, a Simple SNP or a GDS file

Version: 20140701

Usage: ....

Error: There are too small number of SNP data in the file (snphylo.output.filtered.vcf)! Please restart this script with different parameter values (-p and/or -c).

cgrunau commented 8 years ago

Hi, I installed SNPhylo today (Apr. 21, 2016) with no problems, using: curl -O http://chibba.pgml.uga.edu/snphylo/snphylo.tar.gz

I am on MacOS 10.10.5 with R v3.2.4. My input is a VCF file merged with VCFtools. I get the same error message as above:

snphylo.sh -v Merged_B0_B2_contig_nr.vcf -b -A -p 25 -c 2

Start to remove low quality data.

Warning: There were 5596707 SNP positions which did not have DP information.

Warning: 11283584 SNPs were passed without the read depth assessment because of the absence of the DP column.

5333403 low quality lines were removed.

SNPRelate -- supported by Streaming SIMD Extensions 2 (SSE2)

Error in snpgdsVCF2GDS(vcf.file, gds.file, method = "biallelic.only", : unused argument (option = snpgds.option)

Execution halted

I also had the INFO fields separated by ";". I replaced them with ":", but that did not change anything. Has anybody found a solution for the "Error in snpgdsVCF2GDS(vcf.file, gds.file, method = "biallelic.only", : unused argument (option = snpgds.option)" problem in SNPhylo?

Many thanks - Christoph

cgrunau commented 8 years ago

I manually replaced generate_snp_sequence.R with the revised version from:

https://raw.githubusercontent.com/thlee/SNPhylo/cdc6125fe164d8e3f37783c93dcd7fbc508dd727/scripts/generate_snp_sequence.R

That solved the issue. It might be a good idea to include the new version of generate_snp_sequence.R in the distribution ;-)

Best - Christoph