nf-cmgg / structural

A bioinformatics best-practice analysis pipeline for calling structural variants (SVs), copy number variants (CNVs) and repeat region expansions (RREs) from short DNA reads
https://nf-cmgg.github.io/structural/
MIT License

Handling duplicate rows in samplesheet with warnings #67

Closed by mvheetve 9 months ago

mvheetve commented 9 months ago

Warnings for duplicate sample sheet rows

When the samplesheet contains duplicate rows, the pipeline currently stops with this error:

Samplesheet errors:
    The samplesheet contains duplicate rows for entry 5 and entry 17 ([sample:sample1, cram:s3://cmgg-results/WGS/cram/.../sample1/sample1.cram, crai:s3://cmgg-results/WGS/cram/.../sample1/sample1cram.crai])

It might be better to issue a warning and continue with the pipeline, ignoring the duplicated rows.

Just a thought,
Mattias

nvnieuwk commented 9 months ago

Hi, this is fully intended behaviour. You don't want to process the exact same files more than once. Files coming from separate sources will not trigger this error, because their file paths should differ. You can just remove one of the duplicate rows and it will run fine.
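
For readers comparing the two behaviours discussed in this thread, here is a minimal Python sketch. It is not the pipeline's own validation code. The names `find_duplicate_rows`, `check_samplesheet` and the `strict` flag are hypothetical, and the 1-based entry numbering is an assumption based on the error message quoted above. With `strict=True` it fails on exact duplicate rows (the intended behaviour described here); with `strict=False` it warns and keeps only the first occurrence (the behaviour suggested in the original comment).

```python
import warnings
from typing import Dict, List, Tuple

Row = Dict[str, str]


def find_duplicate_rows(rows: List[Row]) -> List[Tuple[int, int]]:
    """Return (first_entry, duplicate_entry) pairs, numbered from 1 (assumed)."""
    seen: Dict[tuple, int] = {}
    duplicates: List[Tuple[int, int]] = []
    for entry, row in enumerate(rows, start=1):
        # Two rows are duplicates only if every field matches exactly.
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates.append((seen[key], entry))
        else:
            seen[key] = entry
    return duplicates


def check_samplesheet(rows: List[Row], strict: bool = True) -> List[Row]:
    """Fail on duplicate rows (strict) or warn and drop the repeats."""
    duplicates = find_duplicate_rows(rows)
    if not duplicates:
        return list(rows)

    messages = [
        f"The samplesheet contains duplicate rows for entry {first} "
        f"and entry {dup} ({rows[first - 1]})"
        for first, dup in duplicates
    ]
    if strict:
        raise ValueError("Samplesheet errors:\n    " + "\n    ".join(messages))

    for message in messages:
        warnings.warn(message)
    dropped = {dup for _, dup in duplicates}
    # Keep the first occurrence of each row, preserving the original order.
    return [row for entry, row in enumerate(rows, start=1) if entry not in dropped]
```

Rows that share a sample name but point to different file paths are not flagged by this sketch, which matches the point above that files coming from separate sources do not trigger the error.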