hasindu2008 / f5c

Ultra-fast methylation calling and event alignment tool for nanopore sequencing data (supports CUDA acceleration)
https://hasindu2008.github.io/f5c/docs/overview
MIT License

f5c about kernel:NMI watchdog: BUG: soft lockup #185

Open kir1to455 opened 1 day ago

kir1to455 commented 1 day ago

Hi, @hasindu2008

When I use f5c to run eventalign on RNA004 data, the system reports this kernel error and crashes. Here are a few screenshots.

image

Here is my code:

```
for i in input; do
    echo $i
    mkdir -p $blow5_dir/${i}
    mkdir -p $Event_dir/${i}
    blue-crab p2s ${Pod5Dir}/${i}/pod5_pass -d $blow5_dir/${i} -t 30 -p 10  ### pod5 to slow5
    slow5tools merge $blow5_dir/${i} -o ${blow5_dir}/${i}_sup.pass.blow5 -t 30
    Bamfiles=($(find $BamDir/${i} -name "${i}_merge_sup_chr*.sorted.bam"))
    for bfile in ${Bamfiles[@]}; do
        echo $bfile
        bbase=$(basename $bfile .sorted.bam)
        ${f5c_dir}/f5c index -t 40 ${FastqDir}/${i}/${bbase}.fastq --slow5 ${blow5_dir}/${i}_sup.merge.blow5
        ${f5c_dir}/f5c eventalign --reads ${FastqDir}/${i}/${bbase}.fastq --bam $bfile \
            --genome ${index_dir}/gencode.vM33.normal.transcripts.fa \
            --slow5 ${blow5_dir}/${i}_sup.merge.blow5 -t 30 \
            --kmer-model ${f5c_dir}/test/rna004-models/rna004.nucleotide.5mer.model \
            --min-mapq 0 --secondary=no --rna --signal-index --scale-events --collapse-events --samples \
            -B 14M -K 1024 --cuda-dev-id 0 \
            --summary ${Event_dir}/${i}/${bbase}_nanopolish.summary.txt \
            | pigz > ${Event_dir}/${i}/${bbase}.eventalign.tsv.gz
        ## --samples raw events --collapse-events
    done
done
```

I index each per-chromosome FASTQ file against the merged BLOW5 before every f5c eventalign run.

In my log file, I found that some reads were reported as not found in the file.

image

Is it not possible to index the merged BLOW5 file using the split FASTQ files (chr1, chr2, chr3, ...)?

Best wishes, Kirito

hasindu2008 commented 11 hours ago

Hello

What is the specification of your system? This kernel error message is unlikely to have anything to do with f5c. I have seen this error a few times on single-board computers running unstable operating systems when the system load is high.

Is there a reason you are using loops in your script? Are these multiple samples that you are iterating using i? If it is a single sample, what about

  1. merging all the BLOW5 files into one BLOW5
  2. combining all FASTQ into one file
  3. merging and sorting all BAM files into one file
  4. Then running a single f5c index followed by an f5c eventalign?

Again, is there a reason why you are going over chromosomes individually, rather than doing it all at once?

If you want to keep the loop approach, I suggest at least the following.

  1. create one single BLOW5
  2. create one single FASTQ file
  3. Do the f5c index once
  4. Then iterate through the BAM files while calling f5c eventalign
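
The loop approach above could be sketched roughly as follows. This is a minimal sketch, not a tested pipeline: the function name `eventalign_per_bam`, the `F5C` variable, the thread count, and all paths are hypothetical, and only options that already appear in the original command are used. The remaining options from the original command (for example `--kmer-model` and the GPU settings) are left out for brevity and would need to be added back.

```shell
# F5C is a hypothetical override so a specific f5c binary can be used.
F5C="${F5C:-f5c}"

# Usage: eventalign_per_bam <merged.blow5> <all.fastq> <ref.fa> <bam_dir> <out_dir>
eventalign_per_bam() {
    local blow5="$1" fastq="$2" ref="$3" bam_dir="$4" out_dir="$5"
    mkdir -p "$out_dir"

    # Index once: the single FASTQ against the single BLOW5.
    "$F5C" index -t 8 --slow5 "$blow5" "$fastq" || return 1

    # Reuse that one index for every per-chromosome BAM file.
    local bam base
    for bam in "$bam_dir"/*.sorted.bam; do
        [ -e "$bam" ] || continue   # skip if the glob matched nothing
        base=$(basename "$bam" .sorted.bam)
        "$F5C" eventalign --rna -t 8 \
            --reads "$fastq" --bam "$bam" --genome "$ref" --slow5 "$blow5" \
            > "$out_dir/$base.eventalign.tsv" || return 1
    done
}
```

Because the FASTQ and BLOW5 are the same for every BAM, the index is built a single time and each eventalign call only differs in its `--bam` input and output file.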

kir1to455 commented 11 hours ago

Hi, @hasindu2008

My system is CentOS Linux 7. I only have two samples: input and ip.

> 1. merging all the BLOW5 files into one BLOW5
> 2. combining all FASTQ into one file
> 3. merging and sorting all BAM files into one file
> 4. Then running a single f5c index followed by an f5c eventalign

In fact, I have always used this first method, and it has never triggered the bug. Excellent!

> Again, is there a reason why you are going over chromosomes individually, rather than doing it all at once?

I want to split by chromosome because the eventalign file for RNA004 is too large (nearly 1 TB). Having the output split by chromosome will help with my machine learning work.

Approach 1 uses the split per-chromosome files (e.g. chr1.fq with its matching chr1.bam): run f5c index against the merged BLOW5, then run f5c eventalign.

Approach 2 uses whole.fq with whole.bam to run f5c eventalign, then splits the whole eventalign file into per-chromosome files (chrom.eventalign.tsv).
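
For approach 2, the splitting step might look like the sketch below. It assumes the first column of the eventalign TSV is the contig name and that the first line is a header. Since the reference here is a transcriptome, it also assumes a hypothetical two-column, tab-separated mapping file (contig name, then chromosome) that must not be empty. The function name `split_eventalign` and the output naming are illustrative only.

```shell
# Split an eventalign TSV into one file per chromosome.
# Usage: split_eventalign <eventalign.tsv> <contig2chrom.tsv> <out_dir>
split_eventalign() {
    local tsv="$1" map="$2" out_dir="$3"
    mkdir -p "$out_dir"
    awk -F'\t' -v out_dir="$out_dir" '
        NR == FNR { chrom[$1] = $2; next }   # first file: contig -> chromosome map
        FNR == 1  { header = $0; next }      # second file: keep the header line
        {
            # Rows whose contig is missing from the map go to "unassigned".
            key = ($1 in chrom) ? chrom[$1] : "unassigned"
            file = out_dir "/" key ".eventalign.tsv"
            if (!(key in seen)) { print header > file; seen[key] = 1 }
            print > file                     # awk keeps the file open, so this appends
        }
    ' "$map" "$tsv"
}
```

Each output file starts with a copy of the header, so the per-chromosome files can be read independently. For a compressed input, the TSV could be decompressed to a pipe and passed as `/dev/stdin`.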

It looked like the first method might take less time, so I tried the first one. I'll try the second approach.

Best wishes, Kirito