nf-core / rnavar

gatk4 RNA variant calling pipeline
https://nf-co.re/rnavar
MIT License

NFCORE_RNAVAR:RNAVAR:GATK4_MERGEVCFS fails with error "has sample entries that don't match the other files" #128

Closed. nschcolnicov closed this issue 7 months ago

nschcolnicov commented 7 months ago

Description of the bug

NFCORE_RNAVAR:RNAVAR:GATK4_MERGEVCFS fails with error "has sample entries that don't match the other files"

This happens when running with --annotate_tools merge, and also with --annotate_tools VEP. The issue only appears when the input samplesheet contains more than one sample.

For the first command, the full error was:

java.lang.IllegalArgumentException: Input path K562_REP2_13scattered_5scattered.vcf.gz has sample entries that don't match the other files.

Looking at the file, I saw that the header of K562_REP2_13scattered_5scattered.vcf.gz names the wrong sample:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  GM12878_REP1

It should match the other files for that sample, for example K562_REP2_10scattered.vcf:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  K562_REP2

It seems that files from different samples are getting mixed up somewhere in the pipeline before GATK4_MERGEVCFS runs.
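To find every affected file rather than inspecting headers one by one, a check along these lines might help. It is only a minimal sketch, not part of the pipeline: the function names `vcf_samples` and `find_mismatches` are hypothetical, and it assumes each scattered VCF file name starts with its sample ID followed by an underscore, as in the file names above.

```python
import gzip
from pathlib import Path


def vcf_samples(path):
    """Return the sample names from the #CHROM header line of a VCF file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # The first nine columns are fixed; sample names follow FORMAT.
                return line.rstrip("\n").split("\t")[9:]
            if not line.startswith("#"):
                break  # reached the records without seeing a #CHROM line
    return []


def find_mismatches(paths, known_samples):
    """Yield (path, expected, found) where the header disagrees with the file name."""
    for path in paths:
        name = Path(path).name
        # Assumption: the file name is prefixed with the sample ID plus "_".
        # The longest matching ID wins, in case one ID is a prefix of another.
        expected = max(
            (s for s in known_samples if name.startswith(s + "_")),
            key=len,
            default=None,
        )
        found = vcf_samples(path)
        if expected is not None and found != [expected]:
            yield str(path), expected, found
```

Calling `find_mismatches` with the VCF paths staged in the failing task's work directory and the sample IDs from the samplesheet would list each file whose header sample differs from the one implied by its name.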

Command used and terminal output

First command:
nextflow run nf-core/rnavar -r dev -latest  -profile bi,cluster --outdir . --annotate_tools merge --input https://raw.githubusercontent.com/nf-core/test-datasets/rnavar/samplesheet/v1.0/samplesheet_full.csv -c ../config.config -resume

Second command:
nextflow run nf-core/rnavar -r dev -latest -profile bi,cluster --input ../input_backup.csv --outdir . --annotate_tools merge -resume -c ../config.config

Relevant files

The samplesheet contains these samples:

sample,fastq_1,fastq_2,strandedness
GM12878_REP1,s3://nf-core-awsmegatests/rnaseq/input_data/SRX1603629_T1_1.fastq.gz,s3://nf-core-awsmegatests/rnaseq/input_data/SRX1603629_T1_2.fastq.gz,reverse
GM12878_REP2,s3://nf-core-awsmegatests/rnaseq/input_data/SRX1603630_T1_1.fastq.gz,s3://nf-core-awsmegatests/rnaseq/input_data/SRX1603630_T1_2.fastq.gz,reverse
K562_REP1,s3://nf-core-awsmegatests/rnaseq/input_data/SRX1603392_T1_1.fastq.gz,s3://nf-core-awsmegatests/rnaseq/input_data/SRX1603392_T1_2.fastq.gz,reverse
K562_REP2,s3://nf-core-awsmegatests/rnaseq/input_data/SRX1603393_T1_1.fastq.gz,s3://nf-core-awsmegatests/rnaseq/input_data/SRX1603393_T1_2.fastq.gz,reverse
MCF7_REP1,s3://nf-core-awsmegatests/rnaseq/input_data/SRX2370490_T1_1.fastq.gz,s3://nf-core-awsmegatests/rnaseq/input_data/SRX2370490_T1_2.fastq.gz,reverse
MCF7_REP2,s3://nf-core-awsmegatests/rnaseq/input_data/SRX2370491_T1_1.fastq.gz,s3://nf-core-awsmegatests/rnaseq/input_data/SRX2370491_T1_2.fastq.gz,reverse
H1_REP1,s3://nf-core-awsmegatests/rnaseq/input_data/SRX2370468_T1_1.fastq.gz,s3://nf-core-awsmegatests/rnaseq/input_data/SRX2370468_T1_2.fastq.gz,reverse
H1_REP2,s3://nf-core-awsmegatests/rnaseq/input_data/SRX2370469_T1_1.fastq.gz,s3://nf-core-awsmegatests/rnaseq/input_data/SRX2370469_T1_2.fastq.gz,re

The config file contains reference file information, and input_backup.csv contains a list of samples in the same format as samplesheet_full.csv.

System information

nf-core/rnavar -r dev

maxulysse commented 7 months ago

Oh, good catch, I'll put this on top of the pile

nschcolnicov commented 7 months ago

PR was merged, closing this issue