Closed llaomer closed 4 weeks ago
Hey, could you convert the pod5 files to blow5 using blue-crab [https://github.com/Psy-Fer/blue-crab] and see if you still getting the same error?
Also Skipped 380450 reads with MAPQ
is concerning - how did you basecall the reads (raw.fastq.gz) and map the reads (align_sorted.bam)?
Sorry,my data is RNA data for R10.4, which is not supported by Blow5.
raw.fastq.gz(I merged the fastq-pass data generated by MinION) cat fastq_pass/*.fastq.gz > raw.fastq.gz
This is my command (align_sorted.bam): f5c index -d fast5 raw.fastq.gz minimap2 -ax map-ont -t 8 mpox.fasta raw.fastq.gz | samtools sort -o align_sorted.bam -T reads.tmp samtools index align_sorted.bam
BLOW5 supports any chemistry/flowcell combination.
Commands look allright, but could be something that happened during basecalling. If you can get the header metadata in the signals that could give us some clues.
If you have the data converted to blow5 format, you can use slow5tools skim --hdr reads.blow5
command to grab the header metadata.
Hi,This is the result of the command:slow5tools skim --hdr reads.blow5 (f5c) zzr@zzr-VirtualBox:~/shares$ cat blow5/*.blow5 > reads.blow5 (f5c) zzr@zzr-VirtualBox:~/shares$ slow5tools skim --hdr reads.blow5
@acquisition_id 5ead6516746fd7b2d57b13c49d6d13312cc018bc @acquisition_start_time 2024-07-01 22:06:32.425000+00:00 @adc_max 4095 @adc_min -4096 @asic_id 5097 @asic_id_eeprom 9473893 @asic_temp 546.896545 @asic_version IA02D @barcoding_enabled 0 @basecall_config_filename rna_rp4_130bps_hac_mk1c.cfg @configuration_version 5.9.18 @data_source real_device @device_id MC-110967 @device_type minion_mk1c @distribution_status stable @distribution_version 24.02.16 @exp_script_name sequencing/sequencing_MIN004RA_RNA:FLO-MIN004RA:SQK-RNA004:130 @exp_script_purpose sequencing_run @exp_start_time 2024-07-02T06:06:32.425856+08:00 @experiment_name 24-7-1 @experiment_type rna @flow_cell_id FAZ23462 @flow_cell_product_code FLO-MIN004RA @guppy_version 7.3.11+0112dde09 @heatsink_temp 33.988281 @host_product_code MIN-101C @host_product_serial_number MC-110967 @hostname MC-110967 @installation_type nc @is_simulated 0 @local_basecalling 1 @operating_system ubuntu 18.04 @package bream4 @package_version 7.9.8 @protocol_group_id 24-7-1 @protocol_name sequencing/sequencing_MIN004RA_RNA:FLO-MIN004RA:SQK-RNA004:130 @protocol_run_id aadb8975-9f13-48da-ac6d-3fe03446b6a4 @protocol_start_time 2024-07-02T06:05:52.213066+08:00 @protocols_version 7.9.8 @run_id 5ead6516746fd7b2d57b13c49d6d13312cc018bc @sample_frequency 4000 @sample_id no_sample @sample_rate 4000 @selected_speed_bases_per_second 130 @sequencer_position MC-110967 @sequencer_position_type MinION Mk1C @sequencing_kit sqk-rna004 @software MinKNOW 24.02.16 (Bream 7.9.8, Core 5.9.12, Dorado 7.3.11+0112dde09) @system_name MC-110967 @system_type MinION Mk1C @usb_config fx3_0.0.0#fpga_2.4.3#ctrl#USB0 @version 5.9.12
[main] cmd: slow5tools skim --hdr reads.blow5 [main] real time = 0.008 sec | CPU time = 0.022 sec | peak RAM = 0.007 GB
In addition, I will go through the process again using blow5. However, the following issues occurred during the indexing step (f5c) zzr@zzr-VirtualBox:~/shares$ f5c index --slow5 reads.blow5 raw.fastq.gz [slow5_idx_build::ERROR] Failed to allocate memory: Cannot allocate memory At src/slow5_idx.c:280 [sig_handler::ERROR] I regret to inform that a segmentation fault occurred. But at least it is better than a wrong answer.
Looking at the header, the basecalling metadata looks all right. Is this mpox.fasta
the reference genome or the transcriptome?
About the blow5 error - Can I know the command you used to do the conversion and can you send the full log messages? Also how much RAM does your virtual-box VM has got? I am trying to see if the conversion ended abruptly due to out of memory.
Can you check if the slow5 file is intact by running and pasting the output of the following:
slow5tools quickcheck reads.blow5
slow5tools stats reads.blow5
slow5tools index reads.blow5
Yes, mpox.fasta is the reference genome.
The command of conversion: blue-crab p2s pod5_dir -d blow5_dir (f5c) zzr@zzr-VirtualBox:~/shares$ blue-crab p2s pod5_dir -d blow5_dir 17-Oct-24 15:56:09 - blue-crab - [INFO]: Creating directory: blow5_dir 17-Oct-24 15:56:09 - blue-crab - [INFO]: many2many: 73 pod5 files detected as input. Writing 1:1 pod5->s/blow5 to dir: blow5_dir 17-Oct-24 15:56:09 - blue-crab - [INFO]: writing s/blow5 to dir: blow5_dir 17-Oct-24 16:13:24 - blue-crab - [INFO]: pod5 -> s/blow5 complete
Virtual memory:64GB Actual memory allocation:62.18GB
(f5c) zzr@zzr-VirtualBox:~/shares$ slow5tools quickcheck reads.blow5 [main] cmd: slow5tools quickcheck reads.blow5 [main] real time = 0.004 sec | CPU time = 0.014 sec | peak RAM = 0.007 GB
(f5c) zzr@zzr-VirtualBox:~/shares$ slow5tools stats reads.blow5 file path reads.blow5 file version 0.2.0 file format BLOW5 record compression method zlib signal compression method svb-zd number of read groups 1 number of auxiliary fields 13 auxiliary fields channel_number,median_before,read_number,start_mux,start_time,end_reason,tracked_scaling_shift,tracked_scaling_scale,predicted_scaling_shift,predicted_scaling_scale,num_reads_since_mux_change,time_since_mux_change,num_minknow_events [stats_main] counting number of slow5 records... [slow5_get_next_mem::ERROR] Failed to allocate memory: Cannot allocate memory At src/slow5.c:3265 [stats_main::ERROR] Error reading the file.
(f5c) zzr@zzr-VirtualBox:~/shares$ slow5tools index reads.blow5 [slow5_idx_build::ERROR] Failed to allocate memory: Cannot allocate memory At src/slow5_idx.c:280 [segv_handler::ERROR] I regret to inform that a segmentation fault occurred. But at least it is better than a wrong answer. [src/main.c(segv_handler:105)::DEBUG] Here is the backtrace: /lib/x86_64-linux-gnu/libc.so.6(+0x1a0777)[0x7fbab07a0777] /lib/x86_64-linux-gnu/libc.so.6(+0x8b3e6)[0x7fbab068b3e6] /lib/x86_64-linux-gnu/libc.so.6(fread+0x79)[0x7fbab067fba9] slow5tools(+0x4be02)[0x559259966e02] slow5tools(slow5_idx_to+0x1e)[0x5592599684ee] slow5tools(slow5_idx_create+0x31)[0x55925995cdb1] slow5tools(_Z10index_mainiPPcP12program_meta+0x331)[0x5592599305d1] slow5tools(main+0x89d)[0x55925992725d] /lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7fbab0629d90] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7fbab0629e40] slow5tools(+0xc4c5)[0x5592599274c5]
Thanks, how did you generate this reads.blow5
from the blow5_dir
- did you use slow5tools merge? Could you give me the command that you ran to merge and what log it printed?
Thank you very much!Based on your suggestion, I merged the files using slow5tools. Everything went smoothly. But I have a question. This command_(f5c eventalign -b [reads.sorted.bam] -g [ref.fa] -r [reads.fastq|fasta] --slow5 [slow5_file] > [events.tsv])_is for S/BLOW5 (R9.4 DNA/RNA or R10.4 DNA data), is it valid for R10.4 RNA data?
Glad to hear that it works with the merge
. How did you create the reads.blow5 before, I am curious to know what caused that corrupted blow5 before.
Yeh, the command you mentioned is valid for all those cases. When using BLOW5 input, f5c will autodetect the model based on the metadata in the BLOW5 (as long as the POD5 has correct metadata). You can double-check by looking at the log messages from f5c:
Detected RNA data. --rna was set automatically.
It should not print this for DNA data.Detected R10 data. --pore r10 was set automatically.
sqk-rna004
as the kit and FLO-MIN004RA
as the flow_cell_product_code, so they are RP4/RNA004 (not R10.4). For RP4/RNA004 you should see the message Detected RNA004 data. --pore rna004 was set automatically.
Do you still get this all 6763 out of 6763 reads failed. Check the input files.
when you use the S/BLOW5 as input? If so, highly likely that the BAM file / FASTQ file does not match the signal data, which I can tell you how to investigate.
I previously used this command (*cat blow5_dir/.blow5 > reads.blow5**) to generate reads.blow5. This time I used this command (slow5tools merge blow5_dir -o merged.blow5).
Yes, they are RP4/RNA004. Thank you very much for your detailed explanation.
When I used S/BLOW5 as input, the previous error did not occur. (f5c) zzr@zzr-VirtualBox:~/shares/m6anet/m6a$ f5c eventalign -b align_sorted.bam -g mpox.fasta -r raw.fastq.gz --slow5 merged.blow5 > eventalign.tsv [init_core::INFO] builtin RNA004 nucleotide model loaded [meth_main] skipped unmapped: 0, skipped secondary: 0, skipped low_mapq: 380450 [meth_main] total entries: 6763, qc fail: 1, could not calibrate: 750, no alignment: 63, bad reads: 0 [meth_main] total bases: 3.2 Mbases [meth_main] Data loading time: 47.400 sec [meth_main] - bam load time: 7.262 sec [meth_main] - fasta load time: 36.137 sec [meth_main] - slow5 load time: 3.959 sec [meth_main] Data processing time: 64.680 sec [meth_main] Data output time: 4.090 sec [meth_main::WARNING] Skipped 380450 reads with MAPQ < 20. Use --min-mapq to change the threshold. [main] Version: 1.5 [main] CMD: f5c eventalign -b align_sorted.bam -g mpox.fasta -r raw.fastq.gz --slow5 merged.blow5 [main] Real time: 68.790 sec; CPU time: 65.246 sec; Peak RAM: 0.334 GB
Thank you again. Thank you for your patience and assistance.
Oh right, you cannot simply cat
any type of files like that. You can cat plain text files or .gz files, but SAM,BAM,SLOW5,BLOW5,VCF,BCF, FAST5,POD5.... you should use the appropriate tools to combine them.
So it seems like POD5->FAST5 converter may have caused some issue, in anycase, using BLOW5 input is much efficient than FAST5, so simply convert POD5->BLOW5 rather than POD5>FAST5.
Skipped 380450 reads with MAPQ < 20. Use --min-mapq to change the threshold.
this is a message that you can pay attention to. This means that the alignment (output from minimap2) had a lot of reads that are of very low mapping quality. If this is anticipated due to the quality of the reference or data, you can simply provide --min-mapq 0
to include everything including 0 quality alignments in the f5c output (or a threshold that is relevant to your problem). If such low-quality alignments are unexpected, suggest looking at the minimap2 flags, your reference and may be visualise through IGV to see if the mappings make sense.
From minimap2 getting started at https://github.com/lh3/minimap2?tab=readme-ov-file#getting-started, the suggested flags for direct-RNA mapping for a genome:
./minimap2 -ax splice -uf -k14 ref.fa reads.fa > aln.sam # noisy Nanopore Direct RNA-seq
Thank you for your help and guidance!
Yes, the warning message is anticipated,because my sequencing data is poor.I think I can only use these data to practice.
Excuse me again, have you encountered the same issue while using m6anet? I couldn't find the answer from the m6anet developer.If you haven't encountered the same problem, you don't need to answer me. When I further used m6anet for modification analysis, I encountered the following problem: 1.(m6anet) zzr@zzr-VirtualBox:~/shares/m6anet/m6a$ m6anet dataprep --eventalign eventalign.tsv --out_dir output --n_processes 4 I used the eventalign.tsv to run the dataprep.The run finished without error, but the data.log and data.json are empty.
2.(m6anet) zzr@zzr-VirtualBox:~/shares/m6anet/m6a$ m6anet inference --input_dir output --out_dir output2 --n_processes 4 --num_iterations 1000 IndexError: single positional indexer is out-of-bounds
Best wishes, laomer
I ran m6anet on an RNA004 samples sometime ago using f5c v1.4 and the commands I used were (f5c v1.5 would behave similarly):
f5c index reads.fastq --slow5 reads.blow5
samtools sort minimap_out.sam -o minimap_out.bam
samtools index minimap_out.bam
f5c eventalign --reads reads.fastq --bam minimap_out.bam -g gencode.v40.transcripts.fa --slow5 reads.blow5 --scale-events --signal-index --summary f5c_summary.tsv --threads 10 -o f5c.tsv --rna --kmer-model f5c/test/rna004-models/rna004.nucleotide.5mer.model # download from https://raw.githubusercontent.com/hasindu2008/f5c/refs/heads/master/test/rna004-models/rna004.nucleotide.5mer.model
m6anet dataprep --eventalign f5c.tsv --out_dir m6anet/ --n_processes 8
m6anet inference --input_dir m6anet/ --out_dir m6anet/ --pretrained_model HEK293T_RNA004 --n_processes 8 --num_iterations 1000
That executed without a problem.
Thank you very much! I ran the result I wanted using rna004.nucolipide.5mer.model. At this moment, I really want to embrace you tightly and give you my kiss. You are my god!
Best wishes, laomer
Hi,
I am trying to run eventalign on reads from RNA004 and I get the following error: [meth_main::13.6310.41] 0 Entries (0.0M bases) loaded [pthread_processor::13.6310.41] 0 Entries (0.0M bases) processed [meth_main] skipped unmapped: 0, skipped secondary: 0, skipped low_mapq: 380450 [meth_main] total entries: 6763, qc fail: 0, could not calibrate: 0, no alignment: 0, bad reads: 6763 [meth_main] total bases: 0.0 Mbases [meth_main] Data loading time: 12.492 sec [meth_main] - bam load time: 1.736 sec [meth_main] - fasta load time: 4.313 sec [meth_main] - fast5 load time: 6.340 sec [meth_main] - fast5 open time: 6.340 sec [meth_main] - fast5 read time: 0.000 sec [meth_main] Data processing time: 0.000 sec [meth_main] Data output time: 0.000 sec [meth_main::INFO] Performance bounded by file I/O. File I/O took 12.491 sec than processing [meth_main::WARNING] Skipped 380450 reads with MAPQ < 20. Use --min-mapq to change the threshold. [meth_main::WARNING] 6763 out of 6763 reads missing in FAST5/SLOW5. [meth_main::ERROR] all 6763 out of 6763 reads failed. Check the input files.
This is the command : f5c eventalign -b align_sorted.bam -g mpox.fasta -r raw.fastq.gz --pore rna004 --rna > eventalign.tsv
My file is fast5 converted from pod5. The version of f5c is 1.5. Could you give me some advice? I am a novice in this field.
Best wishes, laomer