thlee / SNPhylo

A pipeline to generate a phylogenetic tree from huge SNP data
http://chibba.pgml.uga.edu/snphylo/
GNU General Public License v2.0

Species names are too long; sequence is too short after default SNP pruning #49

Open hungweichen0327 opened 2 years ago

hungweichen0327 commented 2 years ago

Dear @thlee and community,

This software is convenient and useful for generating an ML tree from SNP data. After getting the final result from SNPhylo, I found two problems.

  1. The maximum length of an accession name seems to be 10 characters, but some of my accessions have names longer than that. As a result, 7 accessions end up with duplicated names, because the first 10 characters of their names are identical. Is it possible to tell which position in the tree file (snphylo.output.ml.tree) corresponds to each accession?

  2. The original dataset contains 33,332,057 SNPs. When I ran SNPhylo without filtering the SNPs (using the "-r" option), the log file said there were too many SNPs, so I reran it with the default options. This time the log file reported: "1,824 markers are selected in total. Warning: The length of sequence is too short (< 2000 bp) to construct a good tree! Please consider to restart this script with different parameter values (-l, -m and/or -M)."

Do you have recommended parameter values for my case? The SNPhylo log file from the run with the default options is attached: job_SNPhylo_log.txt
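
For the first problem, one possible workaround is sketched below. It is not part of SNPhylo. It assumes the sample names in the input file can be renamed to short unique IDs before running the pipeline, and that the resulting tree file (snphylo.output.ml.tree) is plain Newick text. The function names `make_short_ids` and `restore_names` are hypothetical helpers, and the 10-character limit is taken from the behaviour described above.

```python
import re


def make_short_ids(names, max_len=10):
    """Map each full accession name to a unique ID of exactly max_len characters."""
    if len(set(names)) != len(names):
        raise ValueError("full accession names must be unique")
    width = max_len - 1  # one character is used by the leading "S"
    if len(names) >= 10 ** width:
        raise ValueError("too many accessions for this ID width")
    # IDs are numbered in input order, so they are unique by construction.
    return {name: f"S{i:0{width}d}" for i, name in enumerate(names, start=1)}


def restore_names(newick, mapping):
    """Replace short IDs at the tips of a Newick string with the full names."""
    short_to_full = {short: full for full, short in mapping.items()}

    def swap(match):
        label = match.group(0)
        full = short_to_full.get(label)
        if full is None:
            return label  # not one of our IDs, leave it untouched
        # Quote names containing characters that are special in Newick.
        if re.search(r"[\s():,;\[\]']", full):
            full = "'" + full.replace("'", "''") + "'"
        return full

    # Tip labels directly follow "(" or ","; branch lengths follow ":" and
    # internal labels follow ")", so neither is matched here.
    return re.sub(r"(?<=[(,])[^(),:;\s]+", swap, newick)


# Made-up accession names, for illustration only.
names = ["Accession_long_name_A", "Accession_long_name_B", "Short1"]
mapping = make_short_ids(names)
tree = "((S000000001:0.1,S000000002:0.2):0.05,S000000003:0.3);"
print(restore_names(tree, mapping))
```

Keeping the mapping (for example, written to a small table) also answers the question of which tip corresponds to which accession, since every short ID identifies exactly one full name.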

Thank you for the help!