harvardinformatics / snpArcher

Snakemake workflow for highly parallel variant calling designed for ease-of-use in non-model organisms.
MIT License

issues during the usage #125

weirdo-onlooker closed this issue 9 months ago

weirdo-onlooker commented 9 months ago

I have created two CSV files for two different sets of resequencing data, both aligned to the same reference genome (REF).

However, when I run the workflow with the second CSV file, Snakemake reports "Nothing to be done (all requested files are present and up to date)."

I suspect there may be an issue with my setup. Here is how I run it. The two .csv files are named PNG1.csv and PNG2.csv and contain the following:

PNG1.csv:

BioSample,refGenome,refPath,Run,fq1,fq2
pangolin,GCF_030020395.1,/sdc1/home/hk/pangolin/dataset/Ref_genome/ncbi_dataset/data/GCF_030020395.1/GCF_030020395.1_mManPen7.hap1_genomic.fna,PNG1,/sdc1/home/hk/pangolin/dataset/second/PNG1_1.clean.fq.gz,/sdc1/home/hk/pangolin/dataset/second/PNG1_2.clean.fq.gz

PNG2.csv:

BioSample,refGenome,refPath,Run,fq1,fq2
pangolin,GCF_030020395.1,/sdc1/home/hk/pangolin/dataset/Ref_genome/ncbi_dataset/data/GCF_030020395.1/GCF_030020395.1_mManPen7.hap1_genomic.fna,PNG2,/sdc1/home/hk/pangolin/dataset/second/PNG2_1.clean.fq.gz,/sdc1/home/hk/pangolin/dataset/second/PNG2_2.clean.fq.gz

Before each run, I change the "samples" entry in workflow/modules/xx/config/config.yaml to point to the corresponding .csv file. Is there a problem with how I am running it?

tsackton commented 9 months ago

snpArcher is designed to genotype one or more individuals, each represented by a value in the BioSample column of the sample sheet. In this case, PNG1.csv and PNG2.csv share the same sample ID (BioSample), namely pangolin. When the first run completes, you should have a VCF file containing one sample, named pangolin. When you submit the second file (PNG2.csv), Snakemake recognizes that genotyping for the "pangolin" sample is already complete, so there is nothing left to do.
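To illustrate the reasoning above, here is a toy Python model of the check Snakemake performs. It assumes, purely for illustration, that each final output path is built from the refGenome and BioSample columns only; the pattern `results/{refGenome}/{BioSample}.vcf.gz` is hypothetical and is not snpArcher's actual output layout. It also ignores the timestamp and rerun checks that real Snakemake does. The file paths in the rows are shortened placeholders.

```python
import csv
import io

# Hypothetical output pattern, for illustration only (not snpArcher's real layout).
# What matters is that it depends on BioSample, not on Run.
OUTPUT_PATTERN = "results/{refGenome}/{BioSample}.vcf.gz"


def targets(sheet_text: str) -> set[str]:
    """Return the set of final output paths a sample sheet would request."""
    rows = csv.DictReader(io.StringIO(sheet_text))
    # str.format ignores the extra columns (refPath, Run, fq1, fq2).
    return {OUTPUT_PATTERN.format(**row) for row in rows}


def jobs_needed(sheet_text: str, existing: set[str]) -> set[str]:
    """Targets not yet present; an empty set corresponds to 'Nothing to be done'."""
    return targets(sheet_text) - existing


HEADER = "BioSample,refGenome,refPath,Run,fq1,fq2\n"
png1 = HEADER + "pangolin,GCF_030020395.1,ref.fna,PNG1,PNG1_1.fq.gz,PNG1_2.fq.gz\n"
png2 = HEADER + "pangolin,GCF_030020395.1,ref.fna,PNG2,PNG2_1.fq.gz,PNG2_2.fq.gz\n"

done = targets(png1)            # outputs that exist after the first run
print(jobs_needed(png2, done))  # empty set: same BioSample, same target
```

Because PNG2.csv differs from PNG1.csv only in the Run and FASTQ columns, it requests exactly the same target, which is already on disk.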

If you have multiple individuals you want to genotype, you should include them as separate rows in the sample csv, with different BioSample IDs.
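A minimal sketch of combining the two sheets into one, assuming PNG1 and PNG2 are reads from two different individuals (if they are two runs from the same animal, renaming them would split one individual into two, so this sketch would not apply). The renaming rule, which reuses each row's Run value as its BioSample ID, and the function name `merge_sheets` are hypothetical choices for illustration; the FASTQ and reference paths are shortened placeholders.

```python
import csv
import io

FIELDS = ["BioSample", "refGenome", "refPath", "Run", "fq1", "fq2"]


def merge_sheets(sheet_texts: list[str]) -> str:
    """Combine sample sheets into one, using each row's Run as its BioSample.

    Hypothetical rule: assumes every Run ID is unique and corresponds
    to a distinct individual.
    """
    rows = []
    for text in sheet_texts:
        for row in csv.DictReader(io.StringIO(text)):
            row["BioSample"] = row["Run"]  # give each individual its own ID
            rows.append(row)
    ids = [row["BioSample"] for row in rows]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate BioSample IDs after merging")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


HEADER = "BioSample,refGenome,refPath,Run,fq1,fq2\n"
png1 = HEADER + "pangolin,GCF_030020395.1,ref.fna,PNG1,PNG1_1.fq.gz,PNG1_2.fq.gz\n"
png2 = HEADER + "pangolin,GCF_030020395.1,ref.fna,PNG2,PNG2_1.fq.gz,PNG2_2.fq.gz\n"

print(merge_sheets([png1, png2]))
```

The result is a single sheet with one row per individual (BioSample PNG1 and PNG2); saving it to a file and pointing "samples" at that file would then request a single run that genotypes both individuals together.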