Open kir1to455 opened 2 weeks ago
Hello
What is the specification of your system? This kernel error message is unlikely to have anything to do with f5c. I have seen this error a few times on single-board computers running unstable operating systems when the system load is high.
Is there a reason you are using loops in your script? Are these multiple samples that you are iterating using i
?
If it is a single sample, what about
Again, is there a reason why you are going over chromosomes individually, rather than doing it at one?
If you want to keep the loop approach, I suggest at least the following.
Hi, @hasindu2008
My system is CentOS Linux 7. I only have two samples : input and ip.
merging all the BLOW5 files into one BLOW5 combining all FASTQ into one file merging and sorting all BAM files into one file Then running a single f5c index followed by an f5c eventalign
In fact, I have always used the first method and no bug. Excellent!
Again, is there a reason why you are going over chromosomes individually, rather than doing it at one?
The reason why I want to split the chromosome is because the eventalign file of RNA004 is too large (nearly 1T) , and I want to split it according to the chromosome. To split the chromosome will help me in machine learning.
The approach 1 is use split chromosome (like chr1.fq) mapping to chr1.bam, then run index with merge BLOW5 and run f5c eventalign.
The approach 2 is use whole.fq mapping to whole.bam and run f5c eventalign, then split the whole eventalign file to chrom.eventalign.tsv.
It looked like the first method might take less time, so I tried the first one. I'll try the second approach.
Best wishes, Kirito
Hello,
Sorry, it wasn't very clear to me and I could have misunderstood what you told me.
-w
option in f5c to provide a region. You could use this -w
to loop through (let us call this approach 3)
for chr in chr1 chr2 chr3 chr 4
do
f5c eventalign --reads merged.fastq --bam merged.bam -genome gencode.vM33.normal.transcripts.fa --slow5 merged.blow5 -t 30 --kmer-model rna004.nucleotide.5mer.model --min-mapq 0 --secondary=no --rna --signal-index --scale-events --collapse-events --samples -B 14M -K 1024 --cuda-dev-id 0 --summary ${chr}_nanopolish.summary.txt -w ${chr} | pigz > ${chr}.eventalign.tsv.gz
done
Hi, @hasindu2008
Are you saying that the "NMI" system error comes only when you run the code with loops you shared and not when you run without loops on a merged file?
Yes, when I split chromosome to chr1...chrM, not all chromosomes will have eventalign errors. Sometimes may be Chr5 or Chr 7 and so on will popped up this bug and crashed. I think this may be because the f5c thread is not finding the read id when looking for it, causing the driver to soft lock.
Approach 3 is a nice approach. However, my fastq files were mapping to transcriptome alignments. I split the bam file of the transcriptome according to the correspondence between the gencode transcripts and chromosomes. -w
option should only be used for genome alignments.
Ah, I see. But how it would be possible that the read ID is in the BAM file, but not in the corresponding FASTQ file? There are cases when a read is in the FASTQ, but not in BAM. However, if you aligned a FASTQ file that created a BAM, there should be NO new readIDs in the BAM. How did you generate these BAM files?
Hi, @hasindu2008
I used minimap2 to align fastq mapping to the transcriptome bam file.
Then I split the transcriptome bam file into chr.sorted.bam.
Final, I used samtools fastq to convert the chr.sorted.bam files into chr.fastq files and run index with merge.blow5.
Here is my code :
${DoradoDir}/dorado aligner --mm2-opts "-k 15 -w 10 --secondary=no" -t 50 ${indexDir}/gencode.vM33.normal.transcripts.fa ${FastqDir}/input_merge_sup.fastq > ${BamDir}/input_merge.vM33.transcriptome.dorado.bam
samtools sort -@ 40 ${BamDir}/input_merge.vM33.transcriptome.dorado.bam > ${BamDir}/input_merge.vM33.transcriptome.dorado.sorted.bam
python /public2/hjliang/Nanopore/script/transbam_to_chr_bam.py ${bamdir}/input_merge.vM33.transcriptome.dorado.sorted.bam
Bamfiles=(find ${split_bamdir}/input -name "input_merge_sup_chr*.bam")
for bfile in ${Bamfiles[@]}; do
echo $bfile
chr=basename $bfile .sorted.bam
samtools sort -@ 30 $bfile -o ${chr}.sorted.bam
samtools index -@ 30 ${chr}.sorted.bam
samtools fastq -@ 40 ${chr}.sorted.bam > ${chr}.fastq
${f5c_dir}/f5c index -t 40 ${chr}.fastq --slow5 ${blow5_dir}/input_sup.merge.blow5
done
Does samtools fastq here generate a different read id from the bam file?
samtools fastq does not mess up the readIDs, but the use of samtools fastq here is not suitable as there could be supplementary alignments that can make duplicate entries in FASTQ, and also hard clipping-related issues. You can instead give the merged FASTQ file to f5c. Also, in the above commands, I do not see a minimap2
command. Could you try using actual minimap2 rather than dorado-aligner and f5c on the whole FASTQ file in the following fashion?
minimap2 -ax map-ont --secondary=no -t 50 -uf gencode.vM33.normal.transcripts.fa input_merge_sup.fastq | samtools sort -@ 40 - -o input_merge.vM33.transcriptome.bam
f5c index -t 40 input_merge_sup.fastq--slow5 input_sup.merge.blow5
python transbam_to_chr_bam.py input_merge.vM33.transcriptome.dorado.sorted.bam
Bamfiles=(find ${split_bamdir}/input -name "input_merge_sup_chr*.bam")
for bfile in ${Bamfiles[@]}; do
echo $bfile
chr=basename $bfile .sorted.bam
samtools sort -@ 30 $bfile -o ${chr}.sorted.bam
samtools index -@ 30 ${chr}.sorted.bam
f5c eventalign --reads input_merge_sup.fastq --bam ${chr}.sorted.bam --genome gencode.vM33.normal.transcripts.fa --slow5 input_sup.merge.blow5 -t 30 --kmer-model rna004.nucleotide.5mer.model --min-mapq 0 --secondary=no --rna --signal-index --scale-events --collapse-events --samples -B 14M -K 1024 --cuda-dev-id 0 --summary ${chr}_nanopolish.summary.txt | pigz > ${chr}.eventalign.tsv.gz
done
Hi, @hasindu2008
When I use f5c to eventalign RNA004 data, the system popped up this bug and crashed. Here are a few screenshots.
Here is my code:
for i in input; do
echo $i
mkdir -p $blow5_dir/${i}
mkdir -p $Event_dir/${i}
blue-crab p2s ${Pod5Dir}/${i}/pod5_pass -d $blow5_dir/${i} -t 30 -p 10 ### pod5 to slow5
slow5tools merge $blow5_dir/${i} -o ${blow5_dir}/${i}_sup.pass.blow5 -t 30
Bamfiles=(
find $BamDir/${i} -name "${i}_merge_sup_chr*.sorted.bam")
for bfile in ${Bamfiles[@]}; do
echo $bfile
bbase=basename $bfile .sorted.bam
${f5c_dir}/f5c index -t 40 ${FastqDir}/${i}/${bbase}.fastq --slow5 ${blow5_dir}/${i}_sup.merge.blow5
${f5c_dir}/f5c eventalign --reads ${FastqDir}/${i}/${bbase}.fastq --bam $bfile --genome
${index_dir}/gencode.vM33.normal.transcripts.fa --slow5 ${blow5_dir}/${i}_sup.merge.blow5 -t 30 --kmer-model ${f5c_dir}/test/rna004-models/rna004.nucleotide.5mer.model --min-mapq 0 --secondary=no --rna --signal-index --scale-events --collapse-events --samples -B 14M -K 1024 --cuda-dev-id 0 --summary ${Event_dir}/${i}/${bbase}_nanopolish.summary.txt | pigz > ${Event_dir}/${i}/${bbase}.eventalign.tsv.gz ## --samples raw events --collapse-events
done
done
` I index the fastq file of the chromosome with merge blow5 before running f5c eventalign each time.And my log file, I found the some reads were not found in file.
Is it not possible to use the split fastq files(chr1...chr2...chr3...) to index the integrated blow5 files?
Best wishes, Kirito