thlee / SNPhylo

A pipeline to generate a phylogenetic tree from huge SNP data
http://chibba.pgml.uga.edu/snphylo/
GNU General Public License v2.0

Error: The length of sequence is too long (> 50000 bp) to construct a tree! #23

Open Yoko-Hira opened 7 years ago

Yoko-Hira commented 7 years ago

    bash snphylo.sh -H FINAL-ADMIXTURE-CVC880-MandSatsumaClem-CVC97-TASSEL-Samplename.hapmap.txt -b -a 9 -l 0.2 -m 0.1 -M 0.1 -P CVC97

Output:

    Start to remove low quality data.
    91 low quality lines were removed
    SNPRelate -- supported by Streaming SIMD Extensions 2 (SSE2)
    Start HapMap2GDS ...
    Scanning ...
    file: CVC97.filtered.hapmap
    content: 51206 rows x 108 columns
    Mon Oct 9 15:01:17 2017 store sample id, snp id, position, and chromosome.
    start writing: 97 samples, 51205 SNPs ...
    file: CVC97.filtered.hapmap
    Mon Oct 9 15:02:13 2017 Done.
    Hint: it is suggested to call 'snpgdsOpen' to open a SNP GDS file instead of 'openfn.gds'.
    SNP pruning based on LD:
    Excluding 0 SNP on non-autosomes
    Excluding 51,205 SNPs (monomorphic: TRUE, MAF: 0.1, missing rate: 0.1)
    Working space: 97 samples, 0 SNP
    using 1 (CPU) core
    sliding window: 500,000 basepairs, Inf SNPs
    |LD| threshold: 0.2
    method: composite
    0 markers are selected in total.

Determine phylogenetic tree based on SNP data with a VCF, a HapMap, a Simple SNP or a GDS file

Version: 20140701

Usage: snphylo.sh -v VCF_file [-p Maximum_PLCS (5)] [-c Minimum_depth_of_coverage (5)]|-H HapMap_file [-p Maximum_PNSS (5)]|-s Simple_SNP_file [-p Maximum_PNSS (5)]|-d GDS_file [-l LD_threshold (0.1)] [-m MAF_threshold (0.1)] [-M Missing_rate (0.1)] [-o Outgroup_sample_name] [-P Prefix_of_output_files (snphylo.output)] [-b [-B The_number_of_bootstrap_samples (100)]] [-a The_number_of_the_last_autosome (22)] [-r] [-A] [-h]

Options:

    -A: Perform multiple alignment by MUSCLE
    -b: Perform (non-parametric) bootstrap analysis and generate a tree
    -h: Show help and exit
    -r: Skip the step removing low quality data (-p and -c option are ignored).

Acronyms:

    PLCS: The percent of Low Coverage Sample
    PNSS: The percent of Sample which has no SNP information
    LD: Linkage Disequilibrium
    MAF: Minor Allele Frequency

Simple SNP File Format:

Chrom Pos SampleID1 SampleID2 SampleID3 ...

    1       1000    A       A       T       ...
    1       1002    G       C       G       ...
    ...
    2       2000    G       C       G       ...
    2       2002    A       A       T       ...
    ...

Error: The length of sequence is too long (> 50000 bp) to construct a tree! Please restart this script with different parameter values (-l, -m and/or -M).

I have tried all sorts of combinations of values for -l, -m and -M, but I keep getting the same error. Is there anything else I should be doing?
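One thing worth checking in the log above is the LD-pruning summary: it reports that all 51,205 SNPs were excluded (monomorphic: TRUE, MAF: 0.1, missing rate: 0.1) and that 0 markers were selected. The sketch below pulls that count out of a saved log so it can be compared between runs. The function name `selected_markers` and the idea of saving the output to a log file are assumptions for illustration, not part of SNPhylo.

```shell
# Hypothetical helper (not part of SNPhylo): print the number of markers
# reported as selected after LD pruning, given a saved copy of the log.
selected_markers() {
    # Matches text such as "0 markers are selected in total" and prints
    # the leading count with any thousands separators removed.
    grep -oE '[0-9,]+ markers are selected in total' "$1" \
        | head -n 1 \
        | cut -d' ' -f1 \
        | tr -d ','
}
```

For example, after saving a run's output with `bash snphylo.sh ... > run.log 2>&1` (the file name `run.log` is hypothetical), `selected_markers run.log` prints the count. A value of 0 means the filters left nothing for the later steps.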

yuxiang-li commented 5 years ago

Hi, I got the same error as you did. Did you manage to solve the problem in the end?

nedoluzhko commented 5 years ago

I also ran into this problem. Any ideas?

fedebian91 commented 5 years ago

Hi, I got the same problem. Is there anybody who solved it?

yingzhang28 commented 4 years ago

Hello, I also got the same problem. Has anyone solved it?

Yoko-Hira commented 4 years ago

Hello,

I was able to make it work by changing the values of the -l and -M parameters. I hope this helps!



Yoko-Hira commented 4 years ago

Hello!

I was able to make it work by changing the values of the -l and -M parameters. For example, I originally had -l 0.2 -m 0.1 -M 0.1, which did not work, so I changed it to -l 0.7 -m 0.0 -M 0.02, and this worked perfectly for me.

I hope this helps!
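For anyone comparing several threshold combinations, here is a minimal dry-run sketch that only prints the commands, so each can be reviewed before it is run. The function name `print_snphylo_runs`, the `l:m:M` triple notation, and the `_runN` prefix suffix are all hypothetical. The `-b` and `-a 9` options are simply carried over from the command in the original post, and whether the values that worked there suit another dataset is not established in this thread.

```shell
# Hypothetical dry-run helper (not part of SNPhylo): print one snphylo.sh
# command per threshold triple. Nothing is executed.
# $1 = HapMap file, $2 = output prefix, remaining args = "l:m:M" triples.
print_snphylo_runs() {
    hapmap=$1
    prefix=$2
    shift 2
    n=0
    for triple in "$@"; do
        n=$((n + 1))
        # Split "l:m:M" into its three fields.
        l=${triple%%:*}
        rest=${triple#*:}
        m=${rest%%:*}
        M=${rest#*:}
        # -b and -a 9 are copied from the command in the original post.
        printf 'bash snphylo.sh -H %s -b -a 9 -l %s -m %s -M %s -P %s_run%s\n' \
            "$hapmap" "$l" "$m" "$M" "$prefix" "$n"
    done
}
```

Calling `print_snphylo_runs input.hapmap.txt CVC97 0.2:0.1:0.1 0.7:0.0:0.02` prints the failing and the working combination from this thread as two separate commands, each with its own output prefix.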

lly1214 commented 2 years ago

Does anyone have a good solution to this problem?

thanks

QianghuiZhu commented 5 months ago

Hi, I also got the same problem. I calculated the length of the longest line in the FASTA file, modified lines 267 and 269, and that solved the issue.
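A minimal sketch of one way to measure that length is below. It is not the exact command used in this comment: it reports the longest sequence per record (joining wrapped lines) rather than the longest physical line, and the function name `longest_fasta_seq` is hypothetical.

```shell
# Hypothetical helper (not part of SNPhylo): print the length of the longest
# sequence in a FASTA file, joining sequences wrapped over several lines.
longest_fasta_seq() {
    awk '
        /^>/ { if (len > max) max = len; len = 0; next }  # header: close previous record
             { len += length($0) }                        # sequence line: accumulate
        END  { if (len > max) max = len; print max + 0 }  # close the last record
    ' "$1"
}
```

Assuming the default output prefix shown in the usage text, the call would be `longest_fasta_seq snphylo.output.fasta`. The result can then be compared with the 50000 bp limit named in the error message.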

But a new error about MUSCLE occurred:

    line 281: 29688 killed "${MUSCLE}" -phyi -in "${prefix_output}.fasta" -out "${prefix_output}.phylip.txt"

Has anybody else run into this problem? Thanks!
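A note on reading that message: a shell reports `killed` when a process was terminated by SIGKILL (signal 9), and by shell convention the exit status of a signal-terminated process is 128 plus the signal number. On Linux one common source of SIGKILL is the kernel running out of memory, but that is only an assumption here; nothing in this thread confirms the cause. The sketch below decodes an exit status; the function name `describe_status` is hypothetical.

```shell
# Hypothetical helper: interpret a shell exit status.
# By convention, a status above 128 means "terminated by signal (status - 128)".
describe_status() {
    if [ "$1" -gt 128 ]; then
        echo "terminated by signal $(( $1 - 128 ))"
    else
        echo "exited with status $1"
    fi
}
```

Running the MUSCLE command by hand and then calling `describe_status $?` would print `terminated by signal 9` if the process was killed the same way.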

thlee commented 5 months ago

Hello

There could be various possible causes. However, the content in the following link might be helpful:

https://github.com/thlee/SNPhylo/issues/52#issuecomment-1973127123