tseemann / snippy

✂️ ⚡ Rapid haploid variant calling and core genome alignment
GNU General Public License v2.0

Core SNP phylogeny #389

Closed: hjafar closed this issue 4 years ago

hjafar commented 4 years ago

I ran the following command and got an error. How can I solve it?

$ '/home/genomic-lab/anaconda3/envs/pyppi/bin/run_gubbins.py' -p gubbins clean.full.aln

--- Gubbins 2.4.1 ---

Croucher N. J., Page A. J., Connor T. R., Delaney A. J., Keane J. A., Bentley S. D., Parkhill J., Harris S.R. "Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins". Nucleic Acids Res. 2015 Feb 18;43(3):e15. doi: 10.1093/nar/gku1196.

Checking dependencies... ...done. Run time: 0.02 s

Checking input files...
Error with input FASTA file: you need more than 3 sequences to build a meaningful tree
Each sequence must have a name and some genomic data
There input alignment file does not exist or has an invalid format

Best, Hussain

tseemann commented 4 years ago

This is the repo for snippy. That looks like a gubbins problem? Please file an issue at the gubbins website.

hjafar commented 4 years ago

Hi Seemann,

Thank you for your reply. Gubbins itself is working; please see its help output below. What do you think the solution to this issue is? Also, I want to report/present all SNPs and build a phylogeny of M. tb using snippy. I have SNP results for two samples, but I do not know how to present them. Any suggestions would be appreciated.

$ '/home/genomic-lab/anaconda3/envs/pyppi/bin/run_gubbins.py' -h
usage: run_gubbins.py [-h] [--outgroup OUTGROUP] [--starting_tree STARTING_TREE] [--use_time_stamp] [--verbose] [--no_cleanup] [--tree_builder {raxml,fasttree,hybrid}] [--iterations ITERATIONS] [--min_snps MIN_SNPS] [--filter_percentage FILTER_PERCENTAGE] [--prefix PREFIX] [--threads THREADS] [--converge_method {weighted_robinson_foulds,robinson_foulds,recombination}] [--version] [--min_window_size MIN_WINDOW_SIZE] [--max_window_size MAX_WINDOW_SIZE] [--raxml_model {GTRGAMMA,GTRCAT}] [--remove_identical_sequences] alignment_filename

Croucher N. J., Page A. J., Connor T. R., Delaney A. J., Keane J. A., Bentley S. D., Parkhill J., Harris S.R. "Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins". Nucleic Acids Res. 2015 Feb 18;43(3):e15. doi: 10.1093/nar/gku1196.

positional arguments:
  alignment_filename  Multifasta alignment file

optional arguments:
  -h, --help  show this help message and exit
  --outgroup OUTGROUP, -o OUTGROUP  Outgroup name for rerooting. A list of comma separated names can be used if they form a clade (default: None)
  --starting_tree STARTING_TREE, -s STARTING_TREE  Starting tree (default: None)
  --use_time_stamp, -u  Use a time stamp in file names (default: False)
  --verbose, -v  Turn on debugging (default: False)
  --no_cleanup, -n  Dont cleanup intermediate files (default: False)
  --tree_builder {raxml,fasttree,hybrid}, -t {raxml,fasttree,hybrid}  Application to use for tree building (default: raxml)
  --iterations ITERATIONS, -i ITERATIONS  Maximum No. of iterations (default: 5)
  --min_snps MIN_SNPS, -m MIN_SNPS  Min SNPs to identify a recombination block (default: 3)
  --filter_percentage FILTER_PERCENTAGE, -f FILTER_PERCENTAGE  Filter out taxa with more than this percentage of gaps (default: 25)
  --prefix PREFIX, -p PREFIX  Add a prefix to the final output filenames (default: None)
  --threads THREADS, -c THREADS  Number of threads to run with RAXML, but only if a PTHREADS version is available (default: 1)
  --converge_method {weighted_robinson_foulds,robinson_foulds,recombination}, -z {weighted_robinson_foulds,robinson_foulds,recombination}  Criteria to use to know when to halt iterations (default: weighted_robinson_foulds)
  --version  show program's version number and exit
  --min_window_size MIN_WINDOW_SIZE, -a MIN_WINDOW_SIZE  Minimum window size (default: 100)
  --max_window_size MAX_WINDOW_SIZE, -b MAX_WINDOW_SIZE  Maximum window size (default: 10000)
  --raxml_model {GTRGAMMA,GTRCAT}, -r {GTRGAMMA,GTRCAT}  RAxML model (default: GTRCAT)
  --remove_identical_sequences, -d  Remove identical sequences (default: False)

Best, Hussain

tseemann commented 4 years ago

You cannot build a tree from 2 samples. You need at least 3.
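Before rerunning Gubbins, it can help to confirm how many sequences the alignment actually contains. The following is a minimal Python sketch, assuming a plain multi-FASTA file such as the clean.full.aln passed to run_gubbins.py above. The cut-off of "more than 3" is taken from the Gubbins error message quoted earlier and may differ between Gubbins versions. The helper names are hypothetical.

```python
from pathlib import Path

# Assumption: threshold copied from the Gubbins error message
# ("you need more than 3 sequences"); other versions may check differently.
MIN_SEQUENCES_EXCLUSIVE = 3


def count_fasta_records(text):
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in text.splitlines() if line.startswith(">"))


def enough_for_gubbins(text):
    """Return True if the alignment has more sequences than the threshold."""
    return count_fasta_records(text) > MIN_SEQUENCES_EXCLUSIVE


if __name__ == "__main__":
    aln = Path("clean.full.aln")  # alignment file name used in the command above
    if aln.exists():
        content = aln.read_text()
        print(f"{aln}: {count_fasta_records(content)} sequences; "
              f"enough for Gubbins: {enough_for_gubbins(content)}")
```

Counting header lines is enough here because every FASTA record starts with exactly one `>` line, regardless of how the sequence itself is wrapped.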

tseemann commented 4 years ago

@hjafar maybe try this web site: https://gph.niid.go.jp/tgs-tb/

hjafar commented 4 years ago

Dear Seemann,

How can I present (visualize) snippy results, for example the SNPs?

hjafar commented 4 years ago

My snippy SNP.txt is:

Software         snippy 4.4.0
Variant-COMPLEX  2076
Variant-DEL      203
Variant-INS      164
Variant-MNP      102
Variant-SNP      17188
VariantTotal     19733
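The snippy summary is a list of key/value lines, so it is easy to parse and sanity-check with a short script. The following is a minimal Python sketch. It assumes one key per line, separated from its value by whitespace (tabs or spaces), and the function names are hypothetical. It also checks that the per-type Variant-* counts add up to VariantTotal. That holds for the numbers above: 2076 + 203 + 164 + 102 + 17188 = 19733, of which only 17188 are plain SNPs.

```python
# Values copied from the summary quoted above.
EXAMPLE = """Software\tsnippy 4.4.0
Variant-COMPLEX\t2076
Variant-DEL\t203
Variant-INS\t164
Variant-MNP\t102
Variant-SNP\t17188
VariantTotal\t19733
"""


def parse_snippy_summary(text):
    """Parse 'Key<whitespace>Value' lines into a dict of strings."""
    summary = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 1)  # split on the first run of whitespace
        if not parts:
            continue  # skip blank lines
        summary[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return summary


def variant_counts(summary):
    """Return the per-type counts (keys starting with 'Variant-') as integers."""
    return {k: int(v) for k, v in summary.items() if k.startswith("Variant-")}


def totals_consistent(summary):
    """Check that the per-type counts sum to VariantTotal."""
    return sum(variant_counts(summary).values()) == int(summary["VariantTotal"])


if __name__ == "__main__":
    summary = parse_snippy_summary(EXAMPLE)
    for key, count in variant_counts(summary).items():
        print(f"{key}\t{count}")
    print("Totals consistent:", totals_consistent(summary))
```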

tseemann commented 4 years ago

Mycobacterium tuberculosis should not have 20,000 SNPs. Maybe you did not use the correct reference genome. Snippy 4.4.0 is > 1 year old. Please use 4.6.0

hjafar commented 4 years ago

Dear Seemann, below are the results for another patient using Snippy 4.4.0: only 8 SNPs. Is this output correct? How should I present (visualize) the results, and what is the next step?

ReferenceSize  6269850
Software       snippy 4.4.0
Variant-SNP    8
VariantTotal   8
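Following up on the earlier remark about the reference genome, one quick check is to compare ReferenceSize with the length of the reference you meant to use, and to express the SNP count per megabase. The following is a minimal Python sketch. It assumes the intended reference is the M. tuberculosis H37Rv chromosome (NC_000962.3, 4,411,532 bp); substitute your own reference length otherwise. The constant and function names are hypothetical.

```python
# Assumption: intended reference is M. tuberculosis H37Rv (NC_000962.3).
H37RV_LENGTH = 4_411_532


def reference_size_ratio(reference_size, expected=H37RV_LENGTH):
    """Ratio of the reference size reported by snippy to the expected length."""
    return reference_size / expected


def snps_per_mb(snp_count, reference_size):
    """SNP density per million reference bases."""
    return snp_count / reference_size * 1_000_000


if __name__ == "__main__":
    size, snps = 6269850, 8  # values from the summary quoted above
    print(f"size ratio vs H37Rv: {reference_size_ratio(size):.2f}")
    print(f"SNPs per Mb: {snps_per_mb(snps, size):.2f}")
```

With the values above, the ratio comes out at about 1.42. The reference given to snippy is therefore roughly 1.4 times the length of the H37Rv chromosome. Whether that is expected depends on which reference file was actually supplied.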

tseemann commented 4 years ago

Snippy is currently version 4.6.0, please use that. 8 SNPs sounds good for Mtb? The results are in the VCF file?
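A simple first step towards presenting the variants is to flatten the VCF records into a table of position, reference allele and alternate allele that opens in a spreadsheet. The following is a minimal Python sketch that uses only the standard library. It assumes a single-sample VCF whose INFO column may carry a TYPE= key, and leaves that column blank otherwise. The file names in the `__main__` block are hypothetical. Snippy also writes its own tab-separated snps.tab file, which may already be enough for this purpose.

```python
import csv
import io
from pathlib import Path

COLUMNS = ["CHROM", "POS", "REF", "ALT", "TYPE"]


def read_vcf_records(text):
    """Return CHROM, POS, REF, ALT and INFO TYPE (if present) for each data line."""
    records = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.split("\t")
        chrom, pos, _id, ref, alt = fields[:5]
        info = fields[7] if len(fields) > 7 else ""
        # INFO is a ';'-separated list; keep only key=value entries
        info_map = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
        records.append({"CHROM": chrom, "POS": int(pos), "REF": ref,
                        "ALT": alt, "TYPE": info_map.get("TYPE", "")})
    return records


def to_tsv(records):
    """Serialise the records as a tab-separated table with a header row."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


if __name__ == "__main__":
    vcf = Path("snps.vcf")  # hypothetical path to the snippy VCF
    if vcf.exists():
        Path("snps_table.tsv").write_text(to_tsv(read_vcf_records(vcf.read_text())))
```

Keeping the output as plain TSV means it can be opened directly in a spreadsheet or merged with results from other samples later.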

hjafar commented 4 years ago

Yes, I have the results in the .vcf file. What is the next step?

I ran the following command to update snippy to the current version, 4.6.0:

conda update snippy

However, when I check with snippy --version, it still reports the old version, snippy 4.4.0. Is there a command for updating?
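After any update attempt, it helps to confirm both which snippy executable is being run and which version it reports. `snippy --version` reflects whichever snippy comes first on PATH, such as the one in the currently active conda environment. The following is a minimal Python sketch. It assumes the version string contains an X.Y.Z number, as in "snippy 4.4.0" above, and it combines stdout and stderr because it does not assume which stream snippy writes to. The helper names are hypothetical.

```python
import re
import shutil
import subprocess

MIN_VERSION = (4, 6, 0)  # version recommended in this thread


def parse_version(text):
    """Extract the first X.Y.Z version number, e.g. from 'snippy 4.4.0'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def is_up_to_date(text, minimum=MIN_VERSION):
    """Compare version tuples element by element."""
    return parse_version(text) >= minimum


if __name__ == "__main__":
    exe = shutil.which("snippy")  # same PATH lookup the shell performs
    if exe:
        result = subprocess.run([exe, "--version"], capture_output=True, text=True)
        reported = (result.stdout + result.stderr).strip()
        status = "up to date" if is_up_to_date(reported) else "older than 4.6.0"
        print(f"{exe}: {reported} ({status})")
```

Comparing tuples rather than strings avoids mistakes such as "4.10.0" sorting before "4.6.0".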