nf-core / hic

Analysis of Chromosome Conformation Capture data (Hi-C)
https://nf-co.re/hic
MIT License

Certain sample names produce a missing header error on INPUT_CHECK:SAMPLESHEET_CHECK step #163

Closed cnluzon closed 1 year ago

cnluzon commented 1 year ago

Description of the bug

Hi! First of all thanks for developing and maintaining the nf-core/hic pipeline!

I have run into a very strange issue that has been driving me crazy for a while. Maybe there is something very obvious I am not seeing, so if that is the case, please disregard all of this 😅

After a lot of trial and error I have worked out how to reproduce it, but I am still puzzled as to why exactly it happens. It seems to have something to do with similar sample names. I realise such names are not ideal, but I can think of many circumstances where they would occur, which is why I thought it would be useful to report. If there is a reason not to allow names like this in the downstream process, then I would hope for a more informative error message.

So my minimal reproducible design table example (design_error.csv in the attached zip) looks like this:

sample,fastq_1,fastq_2
group_1_mES,./fastq/group_1_1.fastq.gz,./fastq/group_1_2.fastq.gz
group_10_mES,./fastq/group_10_1.fastq.gz,./fastq/group_10_2.fastq.gz

And in my ./fastq directory I do in fact have those files:

➜ ls fastq/ -1
group_10_1.fastq.gz
group_10_2.fastq.gz
group_1_1.fastq.gz
group_1_2.fastq.gz

I run the pipeline and I get an error in the NFCORE_HIC:HIC:INPUT_CHECK:SAMPLESHEET_CHECK process that reads:

[CRITICAL] The given sample sheet does not appear to contain a header. 

Now the interesting part: if I change the sample names ever so slightly (design_success.csv in the attached zip):

sample,fastq_1,fastq_2
group_01_mES,./fastq/group_1_1.fastq.gz,./fastq/group_1_2.fastq.gz
group_10_mES,./fastq/group_10_1.fastq.gz,./fastq/group_10_2.fastq.gz

Note that I only added a leading zero in the first data row, so the sample name group_1_mES is now group_01_mES.

Success! It is running:

executor >  local (3)
[55/5cb224] process > NFCORE_HIC:HIC:INPUT_CHECK:SAMPLESHEET_CHECK (design.csv)  [100%] 1 of 1 ✔

I have reproduced this locally with Docker on my computer, and I got the exact same error on UPPMAX with the -profile uppmax option.
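The error text points at a header-detection heuristic rather than at a check on the sample names themselves. The following is a minimal sketch, assuming (this is not confirmed anywhere in the thread) that the sample sheet check relies on Python's standard csv.Sniffer.has_header. It feeds both sheets from the report to that heuristic; the function name looks_like_it_has_header is hypothetical.

```python
import csv

# Both sheets are copied verbatim from the report above.
FAILING_SHEET = """\
sample,fastq_1,fastq_2
group_1_mES,./fastq/group_1_1.fastq.gz,./fastq/group_1_2.fastq.gz
group_10_mES,./fastq/group_10_1.fastq.gz,./fastq/group_10_2.fastq.gz
"""

PASSING_SHEET = """\
sample,fastq_1,fastq_2
group_01_mES,./fastq/group_1_1.fastq.gz,./fastq/group_1_2.fastq.gz
group_10_mES,./fastq/group_10_1.fastq.gz,./fastq/group_10_2.fastq.gz
"""


def looks_like_it_has_header(text):
    """Return the verdict of the standard library's header heuristic.

    Assumption: the pipeline's check uses this same heuristic.
    """
    return csv.Sniffer().has_header(text)


if __name__ == "__main__":
    for label, sheet in (("design_error.csv", FAILING_SHEET),
                         ("design_success.csv", PASSING_SHEET)):
        print(label, looks_like_it_has_header(sheet))
```

Under that assumption the behaviour in the report is explainable. The heuristic only votes with columns whose values have the same type or string length in every data row. In the failing sheet no column qualifies (11 versus 12 characters in the sample column, 26 versus 27 in the path columns), so it has nothing to vote with and reports no header. With the leading zero, the sample column is uniformly 12 characters long, which differs from the 6-character header cell "sample", and the heuristic reports a header.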

Command used and terminal output

nextflow run nf-core/hic -profile docker --outdir ./mydata -r 2.0.0 --input design_error.csv --digestion mboi --genome mm10

ERROR ~ Error executing process > 'NFCORE_HIC:HIC:INPUT_CHECK:SAMPLESHEET_CHECK (design.csv)'

Caused by:
  Process `NFCORE_HIC:HIC:INPUT_CHECK:SAMPLESHEET_CHECK (design.csv)` terminated with an error exit status (1)

Command executed:

  check_samplesheet.py \
      design.csv \
      samplesheet.valid.csv

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_HIC:HIC:INPUT_CHECK:SAMPLESHEET_CHECK":
      python: $(python --version | sed 's/Python //g')
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  [CRITICAL] The given sample sheet does not appear to contain a header.

Work dir:                                                                                                                      
  /home/carmen/work/experiments/230509_reproduce_error/work/b7/09d1541d09cf52115ea54ccb0ff0f2

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

 -- Check '.nextflow.log' file for details

Relevant files

mre_data.zip nextflow.log

System information

Nextflow version: 23.04.1
Hardware: Desktop, Dell Precision 5820 Tower
Executor: local
Container engine: Docker
OS: Ubuntu 22.04.2 LTS (but also HPC - Uppmax + Singularity)
nf-core/hic version: 2.0.0

cnluzon commented 1 year ago

Sorry, I just realised this seems to be the same issue as #152. Feel free to close this one if it is redundant.

nservant commented 1 year ago

Yes, this is related to the nf-core template and has been fixed recently.
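The thread does not say how the template was changed. As an illustration of the kind of check the reporter asked for, one that fails with an informative message, here is a hedged sketch that compares the first row against the expected column names instead of guessing whether a header exists. The names validate_header and REQUIRED_COLUMNS are hypothetical and are not taken from the pipeline; the column names come from the sheets in the report.

```python
import csv
import io

# Column names as they appear in the sample sheets in this report.
REQUIRED_COLUMNS = ("sample", "fastq_1", "fastq_2")


def validate_header(text, required=REQUIRED_COLUMNS):
    """Return the header row, or raise ValueError naming what is missing.

    Hypothetical helper: it checks for explicit column names rather than
    inferring a header from the shape of the data rows.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header is None:
        raise ValueError("The sample sheet is empty.")
    header = [name.strip() for name in header]
    missing = [name for name in required if name not in header]
    if missing:
        raise ValueError(
            "The sample sheet header is missing required column(s): "
            + ", ".join(missing)
            + ". Found: "
            + ", ".join(header)
            + "."
        )
    return header


if __name__ == "__main__":
    sheet = (
        "sample,fastq_1,fastq_2\n"
        "group_1_mES,./fastq/group_1_1.fastq.gz,./fastq/group_1_2.fastq.gz\n"
        "group_10_mES,./fastq/group_10_1.fastq.gz,./fastq/group_10_2.fastq.gz\n"
    )
    print(validate_header(sheet))
```

Because the check depends only on the first row, the sample names in the data rows cannot affect its outcome.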