Open bioinfo11 opened 6 years ago
When you pass in just one sample, and use the joint workflow with CoordinatesOnly, you are asking it to find bubbles in the graph of just one sample. So by definition all you can spot are paralogs, or heterozygous sites.
If you want to catch times when the sample differs from the reference, use
--ref CoordinatesAndInCalling
Check the figure here: https://www.ncbi.nlm.nih.gov/pubmed/23172865
Am I making sense? Have I missed something?
Hi Zam,
I have three bacterial samples that I want to use to look for SNPs against an assembled reference. To do this, I have tried two methods using the run_calls wrapper. The first was to run each sample against the reference individually. The second, was to run all three samples together against the reference.
The command I used was:
/path/to/CORTEX_release_v1.0.5.17/scripts/calling/run_calls.pl --first_kmer 31 --fastaq_index /path/to/index --auto_cleaning yes --bc yes --pd no --outdir output --outvcf outputvcf --ploidy 1 --stampy_hash stampy_refhash --stampy_bin /path/to/stampy-1.0.31/stampy.py --list_ref_fasta list_ref_fasta --refbindir /path/to/working_directory --genome_size 2800000 --qthresh 5 --mem_height 17 --mem_width 1000 --vcftools_dir /path/to/vcftools_0.1.11 --do_union yes --ref CoordinatesOnly --workflow joint --logfile logfile_output log_1.txt
When I compared the two methods using the 'wk_flow_J_RefCO_FINALcombined_BC_calls_at_all_k.decomp.vcf' I was expecting to see the same SNPs/positions called for each sample, regardless of the method used, however this was not the case. The only difference between the two methods was the index files used:
For running samples individually (e.g. for sample 1):
For running samples together:
Am I doing something wrong?
Thanks