thlee / SNPhylo

A pipeline to generate a phylogenetic tree from huge SNP data
http://chibba.pgml.uga.edu/snphylo/
GNU General Public License v2.0

Error: There are too small number of SNP data in the file #34

Open fedebian91 opened 5 years ago

fedebian91 commented 5 years ago

Hi, I am running SNPhylo as follows:

$ ./snphylo.sh -v vcf_file

Upon execution, I get the following warning and error message:

"Start to remove low quality data.

Warning: There were 562577 unreadable chromosome ids. Identifier for a chromosome should be a number.

562577 low quality lines were removed

Determine phylogenetic tree based on SNP data with a VCF, a HapMap, a Simple SNP or a GDS file

Version: 20180901

Usage: snphylo.sh -v VCF_file [-p Maximum_PLCS (5)] [-c Minimum_depth_of_coverage (5)]|-H HapMap_file [-p Maximum_PNSS (5)]|-s Simple_SNP_file [-p Maximum_PNSS (5)]|-d GDS_file [-l LD_threshold (0.1)] [-m MAF_threshold (0.1)] [-M Missing_rate (0.1)] [-o Outgroup_sample_name] [-P Prefix_of_output_files (snphylo.output)] [-b [-B The_number_of_bootstrap_samples (100)]] [-a The_number_of_the_last_autosome (22)] [-t The_number_of_cores_used (1)] [-r] [-A] [-h]

Options:
-A: Perform multiple alignment by MUSCLE
-b: Perform (non-parametric) bootstrap analysis and generate a tree
-h: Show help and exit
-r: Skip the step removing low quality data (-p and -c option are ignored).

Acronyms:
PLCS: The percent of Low Coverage Sample
PNSS: The percent of Sample which has no SNP information
LD: Linkage Disequilibrium
MAF: Minor Allele Frequency

Simple SNP File Format:

Chrom Pos SampleID1 SampleID2 SampleID3 ...

1   1000    A   A   T   ...
1   1002    G   C   G   ...
...
2   2000    G   C   G   ...
2   2002    A   A   T   ...
...

Error: There are too small number of SNP data in the file (snphylo.output.filtered.hapmap)! Please restart this script with different parameter values (-p)".

My vcf_file was generated by GATK. I tried converting the vcf_file to a HapMap file with the software "Tassel" and ran SNPhylo again with the -H option. However, the program gives the same error message.

Could you provide suggestions on how to fix this error? Or could you send me an example VCF file to compare with mine?

Thanks in advance for your support!
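In the log above, the number of unreadable chromosome ids (562577) equals the number of removed lines (562577), which suggests that every record was discarded before filtering began. A quick way to check this on the input is to count the records whose CHROM column is not a plain number. The sketch below assumes a tab-separated VCF and assumes that "a number" means a string of digits only; the function name `non_numeric_chrom_counts` and the sample lines are illustrative and not part of SNPhylo.

```python
from collections import Counter


def non_numeric_chrom_counts(lines):
    """Count VCF records per chromosome id that is not made of digits only."""
    counts = Counter()
    for line in lines:
        # Skip meta-information lines, the #CHROM header, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        chrom = line.split("\t", 1)[0]
        if not chrom.isdigit():
            counts[chrom] += 1
    return counts


# Made-up records for illustration only.
sample = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "Chr1\t100\t.\tA\tG\n",
    "Chr1\t200\t.\tC\tT\n",
    "scaffold_7\t50\t.\tG\tA\n",
    "3\t10\t.\tT\tC\n",
]
for chrom, n in non_numeric_chrom_counts(sample).items():
    print(chrom, n)
```

If the total reported here is close to the number of records in the file, the chromosome naming is the likely cause of the "too small number of SNP data" error rather than the -p value.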

zuzmus commented 5 years ago

I got the same error. In your case you simply need to rename your chromosome(s) so that each ID is just a number, not letters (the program is not very smart about this). You might still run into more issues later :-) Good luck!
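The renaming suggested here can be done with a short script. The following is a minimal sketch, assuming a tab-separated VCF whose first column is CHROM; it replaces each distinct chromosome id with 1, 2, 3, ... in order of first appearance and returns the mapping so the original names can be recovered. The function name `renumber_chromosomes` and the sample lines are hypothetical. Header lines, including any `##contig` entries, are left untouched, which may or may not matter to other tools.

```python
def renumber_chromosomes(lines):
    """Replace each distinct CHROM value with 1, 2, 3, ... (order of first appearance).

    Returns the rewritten lines and the old-name -> new-number mapping.
    """
    mapping = {}
    out = []
    for line in lines:
        # Header and blank lines are copied unchanged.
        if line.startswith("#") or not line.strip():
            out.append(line)
            continue
        chrom, sep, rest = line.partition("\t")
        if chrom not in mapping:
            mapping[chrom] = str(len(mapping) + 1)
        out.append(mapping[chrom] + sep + rest)
    return out, mapping


# Made-up records for illustration only.
sample = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "Chr1\t100\t.\tA\tG\n",
    "Chr1\t200\t.\tC\tT\n",
    "scaffold_7\t50\t.\tG\tA\n",
]
renamed, mapping = renumber_chromosomes(sample)
print(mapping)
```

Note that the usage text lists an -a option (the number of the last autosome, default 22). If the input has more than 22 chromosomes or scaffolds, that value probably needs to be raised to match; this is an inference from the usage text, not something confirmed in this thread.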

fedebian91 commented 5 years ago

Thank you very much for your reply, I will try to fix it as you suggested!

On Sat, 9 Nov 2019 at 22:13, zuzmus notifications@github.com wrote:

I got the same, in your case you need to simply rename your chromosome(s) to be just a number, not letters... (the program is a bit not-so-smart in this...). You might still face more issues later:-)) Good luck!

fedebian91 commented 5 years ago

Dear Sir, thank you for your help. I renamed my chromosomes as you suggested, and that part of the script now works well. However, another issue appeared later:

Error: The length of sequence is too long (> 50000 bp) to construct a tree! Please restart this script with different parameter values (-l, -m and/or -M).

Could you please help with this?

Thank you very much in advance!

Best regards,

Federico
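The error message itself points at the -l (LD threshold), -m (MAF threshold) and -M (missing rate) options, all of which default to 0.1 according to the usage text. As a hedged sketch, the helper below only assembles a command line with stricter values. The assumption is that a lower LD threshold, a higher MAF threshold and a lower missing rate each leave fewer SNPs and therefore a shorter sequence. The function name `snphylo_command` and the values 0.05 / 0.2 / 0.05 are illustrative, not recommended settings.

```python
def snphylo_command(vcf_path, ld=0.1, maf=0.1, missing=0.1):
    """Build an argument list for snphylo.sh using flags shown in its usage text."""
    return [
        "./snphylo.sh",
        "-v", vcf_path,       # VCF input
        "-l", str(ld),        # LD threshold (default 0.1)
        "-m", str(maf),       # MAF threshold (default 0.1)
        "-M", str(missing),   # missing rate (default 0.1)
    ]


# Illustrative stricter values; tune them until the SNP count drops below the limit.
cmd = snphylo_command("vcf_file", ld=0.05, maf=0.2, missing=0.05)
print(" ".join(cmd))
# prints: ./snphylo.sh -v vcf_file -l 0.05 -m 0.2 -M 0.05
```

The list form can be passed directly to `subprocess.run(cmd, check=True)` if the script is to be launched from Python instead of being copied into a shell.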