ukhsa-collaboration / snapperdb

GNU General Public License v3.0

Reproducing "Tutorial for setting up SnapperDB Instance": Not getting the same SNP Address #22

Closed by ScaonE 5 years ago

ScaonE commented 5 years ago

Tutorial cmd & output : run_snapperdb.py get_strains -c ebg4_config.txt;

AM933172 1.1.1.1.1.1.1
SRR5194193_1 1.1.1.1.1.1.9
SRR5055288_1 1.1.2.2.2.2.2
SRR5815674_1 1.1.3.3.3.3.3
SRR5864444_1 1.1.3.5.5.5.5
SRR5850014_1 1.1.3.5.5.8.8
SRR6131972_1 1.1.4.4.4.4.4
ERR2200244_1 1.1.7.7.7.7.7
SRR5583186_1 2.2.6.6.6.6.6

My cmd & output : $snapperdb get_strains -c custom_salmo.txt;

AM933172 1.1.1.1.1.1.1
SRR5194193_1 1.1.1.1.1.1.9
ERR2200244_1 1.1.2.2.2.2.2
SRR5055288_1 1.1.3.3.3.3.3
SRR5815674_1 1.1.4.4.4.4.4
SRR5864444_1 1.1.4.6.6.6.6
SRR5850014_1 1.1.4.6.6.8.8
SRR6131972_1 1.1.5.5.5.5.5
SRR5583186_1 2.2.7.7.7.7.7

ERR2200244_1 seems to have an unexpected SNP address

May I ask which version of each external tool you recommend? (biopython, psycopg2, paramiko, hashids, joblib, Postgres, PHEnix, Samtools, Picard, GATK, BWA)

Another question : Does fastq_to_db accept single-end FASTQ as input ?

timdallman commented 5 years ago
  1. SNP Address: SNP addresses are order-dependent, so the cluster number assigned at each threshold depends on the order in which the isolates were added to SnapperDB.
  2. fastq_to_db does not accept single-end FASTQ. If you produce a VCF independently of SnapperDB from single-end reads, you can ingest it into SnapperDB with vcf_to_db.
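
Because the cluster numbers depend on insertion order, two runs can be compared by checking whether they group the same isolates together at each level, ignoring the numbers themselves. The following is a minimal sketch of that check, not part of SnapperDB: the helper names (`parse_strains`, `partition_at_level`, `same_clustering`) are hypothetical, and it assumes each address is a dot-separated list with one field per threshold, as in the `get_strains` output above.

```python
from collections import defaultdict

def parse_strains(text):
    """Parse whitespace-separated 'name address' pairs into {name: address tuple}."""
    tokens = text.split()
    return {name: tuple(addr.split("."))
            for name, addr in zip(tokens[0::2], tokens[1::2])}

def partition_at_level(addresses, level):
    """Group isolates sharing the same address prefix up to `level`, dropping the labels."""
    groups = defaultdict(set)
    for name, addr in addresses.items():
        groups[addr[:level + 1]].add(name)
    return {frozenset(members) for members in groups.values()}

def same_clustering(a, b):
    """True if both runs group the same isolates together at every level."""
    if set(a) != set(b):
        return False
    n_levels = len(next(iter(a.values())))
    return all(partition_at_level(a, lvl) == partition_at_level(b, lvl)
               for lvl in range(n_levels))

# Output listed for the tutorial run in this issue.
tutorial = parse_strains("""
AM933172 1.1.1.1.1.1.1      SRR5194193_1 1.1.1.1.1.1.9
SRR5055288_1 1.1.2.2.2.2.2  SRR5815674_1 1.1.3.3.3.3.3
SRR5864444_1 1.1.3.5.5.5.5  SRR5850014_1 1.1.3.5.5.8.8
SRR6131972_1 1.1.4.4.4.4.4  ERR2200244_1 1.1.7.7.7.7.7
SRR5583186_1 2.2.6.6.6.6.6
""")

# Output listed for the reproduction run in this issue.
mine = parse_strains("""
AM933172 1.1.1.1.1.1.1      SRR5194193_1 1.1.1.1.1.1.9
ERR2200244_1 1.1.2.2.2.2.2  SRR5055288_1 1.1.3.3.3.3.3
SRR5815674_1 1.1.4.4.4.4.4  SRR5864444_1 1.1.4.6.6.6.6
SRR5850014_1 1.1.4.6.6.8.8  SRR6131972_1 1.1.5.5.5.5.5
SRR5583186_1 2.2.7.7.7.7.7
""")

print(same_clustering(tutorial, mine))  # True
```

For the two listings in this issue the check passes: ERR2200244_1 carries different numbers in each run, but it is grouped with the same isolates at every level, which is consistent with the isolates having been added in a different order.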