nservant / HiC-Pro

HiC-Pro: An optimized and flexible pipeline for Hi-C data processing

Possible bug for SRA data processing: "Number of $PAIR1_EXT files is different from $PAIR2_EXT [$r1files vs $r2files]..." #431

Open joreynajr opened 3 years ago

joreynajr commented 3 years ago

I managed to solve my issue, but I wanted to bring it up because it may affect anyone processing SRA data, which uses _1 and _2 to designate read 1 and read 2, respectively. In my case, I set PAIR1_EXT = "_1" and PAIR2_EXT = "_2" in the HiC-Pro configuration. However, if your samples happen to be named data_folder/sample_1/fastq_1.fastq.gz and data_folder/sample_1/fastq_2.fastq.gz, then HiC-Pro thinks there are two R1 files and one R2 file and fails to run, because the "_1" pattern is matched against the whole path and so also hits the sample_1 directory name. There are simple workarounds on my end, but I think this error is tricky and not easily Google-able. For reference, the issue arises on lines 329-330 of the HiC-Pro main script (HiC-Pro/bin/HiC-Pro); the code is pasted below:

r1files=$(find -L $RAW_DIR -mindepth 2 -maxdepth 2 -name "*.fastq" -o -name "*.fastq.gz" -o -name "*.fq.gz" -o -name "*.fq" | grep "$PAIR1_EXT" | wc -l) #!

r2files=$(find -L $RAW_DIR -mindepth 2 -maxdepth 2 -name "*.fastq" -o -name "*.fastq.gz" -o -name "*.fq.gz" -o -name "*.fq" | grep "$PAIR2_EXT" | wc -l) #!

Lines 331-332 then produce this error message, which is somewhat helpful: "Number of $PAIR1_EXT files is different from $PAIR2_EXT [$r1files vs $r2files]. Please, note that the paired-end files are detected using the PAIR1_EXT/PAIR2_EXT parameters. Be sure that there is no conflict with files/dir names." Overall, I opened this issue in case others run into similar problems; this may help them out.
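The miscount described above can be reproduced in a scratch directory. The sketch below is illustrative only: the `rawdata` directory, the `list_fastq` helper, and the `*_naive` / `*_base` variable names are hypothetical, and the basename-only count is one possible way to sidestep the directory-name conflict, not the logic HiC-Pro actually uses.

```shell
# Work in a throwaway directory so nothing real is touched.
cd "$(mktemp -d)" || exit 1
RAW_DIR=rawdata                      # hypothetical input folder
mkdir -p "$RAW_DIR/sample_1"
touch "$RAW_DIR/sample_1/fastq_1.fastq.gz" "$RAW_DIR/sample_1/fastq_2.fastq.gz"
PAIR1_EXT="_1"
PAIR2_EXT="_2"

# Hypothetical helper: list the FASTQ files one level below RAW_DIR.
list_fastq() {
    find -L "$RAW_DIR" -mindepth 2 -maxdepth 2 \
        \( -name "*.fastq" -o -name "*.fastq.gz" -o -name "*.fq.gz" -o -name "*.fq" \)
}

# Same idea as the quoted lines: grep sees the full path,
# so the "sample_1" directory also matches "_1".
r1_naive=$(list_fastq | grep -c -- "$PAIR1_EXT")
r2_naive=$(list_fastq | grep -c -- "$PAIR2_EXT")
echo "full path: R1=$r1_naive R2=$r2_naive"    # prints: full path: R1=2 R2=1

# Assumed alternative: keep only the file name before matching.
r1_base=$(list_fastq | awk -F/ '{print $NF}' | grep -c -- "$PAIR1_EXT")
r2_base=$(list_fastq | awk -F/ '{print $NF}' | grep -c -- "$PAIR2_EXT")
echo "file name: R1=$r1_base R2=$r2_base"      # prints: file name: R1=1 R2=1
```

Matching on the file name alone removes the conflict with directory names, but a file whose own name contains the pattern more than once or in another position (for example `lib_1_2.fastq.gz`) could still be counted on both sides, so renaming the sample folders remains the simplest workaround.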

Joaquin

nservant commented 3 years ago

Thanks @joreynajr. Indeed, this is a common issue. Which HiC-Pro version did you use? I thought I had improved that in the latest version... Best

joreynajr commented 3 years ago

Hi @nservant, I am using the HiC-Pro Singularity image, which uses HiC-Pro 2.11.4; maybe that is the problem? I wanted to avoid installing HiC-Pro, but I can get by with my workaround for now.

Thanks, Joaquin

nservant commented 3 years ago

This is a recent version, so even though v3.0.0 has been released, I'm not sure it will solve the issue. I'll keep this issue open to see if we can improve that in the future. Thanks