hiruna72 / squigualiser

Visualise and analyse nanopore (ONT) raw signals
https://hiruna72.github.io/squigualiser/
MIT License

Read_ID not found in fastq file #49

Closed. denisbeslic closed this issue 7 months ago

denisbeslic commented 7 months ago

Dear author,

I tried to run your tool and plot signal2read: https://github.com/hiruna72/squigualiser#option-1---f5c-resquiggle

When I run it without the -r argument, I get the following error:

Exception: Error: read_id 9b6cba7a-6849-450d-8e18-2c66982092aa is not found in ../fastq-Dorado-SUP/SAMPLE.pass.fq

However, if I look at my fastq file, I can find the specified read.

It only seems to work when I run it with the argument '-r @9b6cba7a-6849-450d-8e18-2c66982092a', and then only for this single read.

The fastq was generated by Dorado using the pod5 format. Could this have caused the bug?

Thank you, Denis

hiruna72 commented 7 months ago

Hello @denisbeslic,

The last character, a, is missing from the read ID you specified with -r. That might be the cause. Could you please upload the fastq file containing this specific read, along with the paf or bam file created using f5c resquiggle?

denisbeslic commented 7 months ago

Hello,

Thanks for your help. The missing 'a' was just a typo in my comment. I uploaded the fastq file together with the paf file: SAMPLE.pass_01.zip

hiruna72 commented 7 months ago

Thanks @denisbeslic.

How big is your pod5 file? Can you please upload the pod5 file and blow5 file? Also please share all the commands you used from basecalling to plotting.

denisbeslic commented 7 months ago

The pod5 and slow5 files are quite large. However, I can repeat the analysis with a subset and send you those files if I run into the same bug.

I used the following commands:

conda activate wf-basecalling-env
nextflow run epi2me-labs/wf-basecalling -profile singularity --input pod5/ --dorado_ext pod5 --out_dir fastq-Dorado-SUP/ --basecaller_cfg dna_r10.4.1_e8.2_400bps_sup@v4.2.0 --fastq_only
# Transform to blow5
conda activate blue-crab-env
blue-crab p2s pod5_dir -d blow5/
./slow5tools merge blow5/ -o merged.blow5

FASTQ=fastq-Dorado-SUP/SAMPLE.pass.fq
SIGNAL_FILE=merged.blow5
ALIGNMENT=resquiggle.paf
f5c resquiggle -c ${FASTQ} ${SIGNAL_FILE} -o ${ALIGNMENT}
OUTPUT_DIR=output_dir
squigualiser plot -f ${FASTQ} -s ${SIGNAL_FILE} -a ${ALIGNMENT} -o ${OUTPUT_DIR}

hasindu2008 commented 7 months ago

Could you extract the BLOW5 record for the corresponding read as follows and send it?

slow5tools get merged.blow5 readid -o readid.blow5

denisbeslic commented 7 months ago

Yes, here it is: 9b6cba7a-6849-450d-8e18-2c66982092aa.zip

hiruna72 commented 7 months ago

Okay, I could reproduce the bug. I am looking into it. Thanks!

hiruna72 commented 7 months ago

Hello @denisbeslic,

Thank you very much for finding this bug! It had to do with the comment part after the read id in the fastq record. I added a fix to the dev branch.
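To illustrate the kind of issue involved, here is a minimal sketch of taking the read id from a fastq header line while ignoring any comment that follows it. This is not the actual patch on the dev branch. The function names `read_id_from_header` and `collect_read_ids` are hypothetical, the sketch assumes four-line fastq records, and the comment text in the usage note is made up for illustration.

```python
def read_id_from_header(header_line):
    """Return the read id from a fastq header line.

    The read id is taken to be the first whitespace-delimited token
    after the leading '@'. Anything after it (separated by a space or
    a tab) is treated as a comment and ignored.
    """
    if not header_line.startswith("@"):
        raise ValueError("fastq header must start with '@': %r" % header_line)
    # split(None, 1) splits on any run of whitespace, so it handles both
    # space-separated and tab-separated comments, and drops the newline.
    fields = header_line[1:].split(None, 1)
    if not fields:
        raise ValueError("fastq header has no read id: %r" % header_line)
    return fields[0]


def collect_read_ids(lines):
    """Collect read ids from an iterable of fastq lines.

    Assumption: each record spans exactly four lines, so every fourth
    line (0, 4, 8, ...) is a header.
    """
    read_ids = set()
    for line_number, line in enumerate(lines):
        if line_number % 4 == 0:
            read_ids.add(read_id_from_header(line))
    return read_ids
```

For example, `read_id_from_header("@9b6cba7a-6849-450d-8e18-2c66982092aa\tsome:comment\n")` returns the bare read id, so a lookup keyed on the id succeeds whether or not the header carries a comment.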

Could you please follow the instructions to build from source code (don't forget to git clone --branch dev)? I will make a new pip package tomorrow.

denisbeslic commented 7 months ago

Great, thank you. I will test it later.

denisbeslic commented 7 months ago

Just tested it on the dev branch, works perfectly fine. Thank you!

hiruna72 commented 7 months ago

Hello @denisbeslic,

Thanks again for using squigualiser and finding this bug. I have created a new release.