sanger-pathogens / seroba

k-mer based Pipeline to identify the Serotype from Illumina NGS reads
https://sanger-pathogens.github.io/seroba/

Names for forwards and reverse reads does not match. Cannot continue #42

Open tseemann opened 5 years ago

tseemann commented 5 years ago

I am getting this error:

Names for forwards and reverse reads does not match. Cannot continue

My R1 file:

@NB551233:47:H5WGVAFXY:1:11101:24878:1062 1:N:0:CGAGGCTG+NCTCTAGG

My R2 file:

@NB551233:47:H5WGVAFXY:1:11101:24878:1062 2:N:0:CGAGGCTG+NCTCTAGG

CC: @aunderwo

tseemann commented 5 years ago

The code is

https://github.com/sanger-pathogens/seroba/blob/37a57375f59d471d75da4afd4f0e6518520f1b79/seroba/tasks/sero_run.py#L10-L12

Is this operating on the read FILENAMES or the READ IDs?

Mine are NNNN-NNNNN_S33_R1_001.fastq.gz and NNNN-NNNNN_S33_R2_001.fastq.gz

antunderwood commented 5 years ago

This operates on FILENAMES.

Looking at the code as it is written, it assumes that read filenames are in the format

`_{R1,R2,1,2}.fastq.gz`

`NNNN-NNNNN_S33_R1_001.fastq.gz` and `NNNN-NNNNN_S33_R2_001.fastq.gz` will be split to `NNNN-NNNNN_S33_R1` and `NNNN-NNNNN_S33_R2`, which don't match.

A fudge solution would be to make softlinks that remove the `_001` until the sanger-pathogens team has bandwidth to correct the code.

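
To make the failure concrete, here is a minimal sketch of the kind of check described above. It is a model based only on the description in this comment, not a copy of seroba's code: the function names `pair_prefix` and `names_match` are hypothetical, and the assumption is that the check drops everything after the last underscore in the file name and compares what is left.

```python
import os


def pair_prefix(path):
    """Return the file name with everything after the last underscore dropped.

    Hypothetical model of the check described in this thread,
    not seroba's actual implementation.
    """
    return os.path.basename(path).rsplit("_", 1)[0]


def names_match(read1, read2):
    """True if both read files share the same prefix under this model."""
    return pair_prefix(read1) == pair_prefix(read2)


# Names ending in _1/_2 (or _R1/_R2) reduce to the same prefix.
print(names_match("sample_1.fastq.gz", "sample_2.fastq.gz"))  # True

# Illumina-style names keep the R1/R2 token in the prefix, so they differ.
print(names_match("NNNN-NNNNN_S33_R1_001.fastq.gz",
                  "NNNN-NNNNN_S33_R2_001.fastq.gz"))  # False
```

Under this model the trailing `_001` is what gets stripped, leaving `R1` and `R2` in the compared strings, which would explain the error message.
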
tseemann commented 5 years ago

Thanks once again @aunderwo

tseemann commented 5 years ago

Argh. Our other system is to use SAMPLE/{R1,R2}.fq.gz but that fails too. Names for forwards and reverse reads does not match. Cannot continue :(

cimendes commented 5 years ago

I was implementing a seroba component on flowcraft and getting this error when some particular components came before seroba. Thank you @tseemann and @aunderwo for your discussion! Without it I could not have figured it out! And @tseemann, I'm renaming all the input files to `${sampleid}_{1,2}.fq.gz` to make it work. :)

Thanks!!!
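
For anyone who wants to try the softlink route without touching the original files, here is a rough sketch. Everything in it is an assumption for illustration: the helper names `simplified_name` and `link_reads` are hypothetical, it only handles Illumina-style names ending in `_R1_001.fastq.gz` or `_R2_001.fastq.gz`, and it is not part of seroba. Whether the resulting names are enough to get past the error may depend on your setup.

```python
import os
import re

# Hypothetical pattern for Illumina-style names such as
# NNNN-NNNNN_S33_R1_001.fastq.gz (an assumption, not taken from seroba).
ILLUMINA_RE = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])_001\.fastq\.gz$")


def simplified_name(filename):
    """Map X_R1_001.fastq.gz to X_1.fastq.gz; return None if it does not match."""
    match = ILLUMINA_RE.match(filename)
    if match is None:
        return None
    return "{}_{}.fastq.gz".format(match.group("sample"), match.group("read"))


def link_reads(src_path, out_dir):
    """Create a symlink with the simplified name in out_dir and return its path."""
    name = simplified_name(os.path.basename(src_path))
    if name is None:
        raise ValueError("Unrecognised read file name: {}".format(src_path))
    os.makedirs(out_dir, exist_ok=True)
    dest = os.path.join(out_dir, name)
    # Leave an existing link or file alone rather than overwriting it.
    if not os.path.lexists(dest):
        os.symlink(os.path.abspath(src_path), dest)
    return dest
```

Calling `link_reads("NNNN-NNNNN_S33_R1_001.fastq.gz", "links")` and the same for the R2 file would leave `links/NNNN-NNNNN_S33_1.fastq.gz` and `links/NNNN-NNNNN_S33_2.fastq.gz` pointing at the originals, and those links can then be passed to the tool instead.
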

tseemann commented 5 years ago

I don't feel I should have to rename my files for a tool to work :(

sreerampeela commented 2 years ago

I have the same issue too. I am using the Docker image and renamed the files to read_1.fq.gz and read_2.fq.gz as suggested by @cimendes, but the issue isn't resolved. Any suggestions on how to resolve this?

arif-tanmoy commented 1 year ago

Renaming the files to $name_1.fastq.gz and $name_2.fastq.gz worked.

fgonzalez3 commented 2 months ago

I tried this, but still no luck. Has anybody come across another workaround?

arif-tanmoy commented 1 month ago

Hey @fgonzalez3 - I haven't used it in a while. I think I actually made changes in the Python code of Seroba to recognize the filenames we use. I will try to find it and share it here. Also, you can always try using other tools.