Lucas-Servi opened this issue 1 year ago (status: Open)
@Lucas-Servi Did you ever get this working? I don't think you are converting the fast5 files correctly. Tombo requires single-fast5 files (one read per file, spread over many files), so you must use the multi_to_single_fast5 command. However, you are using single_to_multi_fast5, and we don't want multi-fast5. This is what I have done so far.
Basecall with Guppy.
./guppy_basecaller --input_path /path/to/fast5_dir \
--save_path /path/to/fastq_dir \
--config rna_r9.4.1_70bps_hac.cfg \
--compress_fastq \
--records_per_fastq 0 \
--disable_pings \
--device cuda:all \
--recursive \
--chunks_per_runner 1024 \
--min_qscore 10
Run multi_to_single_fast5. Guppy output is currently in multi-fast5 format (default) and we want single-fast5.
multi_to_single_fast5 --input_path /path/to/fast5_dir \
--save_path /path/to/single_fast5_dir \
--threads 16 \
--recursive
# Merge FASTQ files
cd /path/to/fastq_dir/pass
cat fastq_runid_*fastq.gz > sample1_pass.fastq.gz
gunzip sample1_pass.fastq.gz
cd /path/to/fastq_dir/fail
cat fastq_runid_*fastq.gz > sample1_fail.fastq.gz
gunzip sample1_fail.fastq.gz
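For anyone who prefers to do the merge from Python, here is a minimal sketch of the same cat + gunzip step. The function name `merge_fastq_gz` and its default glob pattern are my own choices for illustration; they simply mirror the file names used in the shell commands above.

```python
import glob
import gzip
import os
import shutil


def merge_fastq_gz(fastq_dir, out_path, pattern="fastq_runid_*fastq.gz"):
    """Decompress every FASTQ chunk matching `pattern` into one plain-text file.

    `merge_fastq_gz` is a hypothetical helper, not part of Guppy or Tombo.
    Returns the number of chunk files that were merged.
    """
    chunks = sorted(glob.glob(os.path.join(fastq_dir, pattern)))
    with open(out_path, "wb") as out:
        for chunk in chunks:
            # Stream each gzipped chunk into the output without loading it fully
            with gzip.open(chunk, "rb") as handle:
                shutil.copyfileobj(handle, out)
    return len(chunks)
```

It would be called once per folder, e.g. `merge_fastq_gz("/path/to/fastq_dir/pass", "/path/to/fastq_dir/pass/sample1_pass.fastq")`. A return value of 0 means the pattern matched nothing, which is worth checking before moving on.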
Fix the sequencing_summary file so that its first column names each read's single-fast5 file (<read_id>.fast5).
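#!/usr/bin/python3
# Rewrite the first column (filename) of the Guppy summary so that it
# names the single-fast5 file of each read.
with open('../fastq/sequencing_summary.txt') as file, \
        open('../single_fast5_dir/sequencing_summary_fix.txt', 'w') as outfile:
    header = next(file)
    outfile.write(header)
    for line in file:
        fields = line.rstrip('\n').split('\t')
        # The second column holds the read ID; the single-fast5 file is <read_id>.fast5
        fields[0] = f"{fields[1]}.fast5"
        outfile.write('\t'.join(fields) + '\n')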
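Before handing the fixed summary to Tombo, it may be worth checking that every file it names actually exists. The sketch below is only an illustration: `missing_fast5_files` is a made-up helper, and it assumes (like the script above) that the first tab-separated column of the summary holds the fast5 file name and that the single-fast5 files sit somewhere under one directory, possibly in subfolders.

```python
import csv
import os


def missing_fast5_files(summary_path, fast5_dir):
    """Return first-column entries of the summary that have no file under `fast5_dir`.

    Hypothetical helper; assumes column 1 of the summary is the fast5 file name.
    """
    on_disk = set()
    # Collect the names of all .fast5 files, including those in subfolders
    for _root, _dirs, files in os.walk(fast5_dir):
        on_disk.update(name for name in files if name.endswith(".fast5"))
    with open(summary_path, newline="") as handle:
        rows = csv.reader(handle, delimiter="\t")
        next(rows, None)  # skip the header line
        return [row[0] for row in rows if row and row[0] not in on_disk]
```

An empty list means every summary entry has a matching file. A long list usually points at a wrong directory or at a summary whose column layout differs from the one assumed here.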
Run tombo preprocess to add basecalls from fastq file to single-fast5 files.
# Set variables
pass=/path/to/fastq_dir/pass/sample1_pass.fastq
fail=/path/to/fastq_dir/fail/sample1_fail.fastq
# Add basecalled sequence from FASTQs to single-FAST5s
tombo preprocess annotate_raw_with_fastqs --fast5-basedir /path/to/single_fast5_dir \
--fastq-filenames $pass $fail \
--overwrite \
--sequencing-summary-filenames /path/to/single_fast5_dir/sequencing_summary_fix.txt \
--processes 50
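If the annotate step reports that few or no reads were annotated, a quick way to narrow it down is to compare the read IDs in the FASTQ files against the single-fast5 file names. This is a rough sketch under two assumptions that match the steps above but are not guaranteed for every setup: each single-fast5 file is named <read_id>.fast5, and the FASTQ files are uncompressed with four lines per record. The helper names are hypothetical.

```python
import os


def fastq_read_ids(fastq_path):
    """Yield the read ID (first token after '@') of each 4-line FASTQ record."""
    with open(fastq_path) as handle:
        for index, line in enumerate(handle):
            if index % 4 == 0 and line.startswith("@"):
                tokens = line[1:].split()
                if tokens:
                    yield tokens[0]


def unmatched_reads(fastq_paths, fast5_dir):
    """Return (IDs only in the FASTQs, IDs only in the fast5 directory), both sorted."""
    fast5_ids = set()
    for _root, _dirs, files in os.walk(fast5_dir):
        for name in files:
            if name.endswith(".fast5"):
                # Assumes the file name is <read_id>.fast5
                fast5_ids.add(name[: -len(".fast5")])
    fastq_ids = set()
    for path in fastq_paths:
        fastq_ids.update(fastq_read_ids(path))
    return sorted(fastq_ids - fast5_ids), sorted(fast5_ids - fastq_ids)
```

Calling `unmatched_reads([pass_fastq, fail_fastq], "/path/to/single_fast5_dir")` and finding both lists empty suggests the inputs line up. If nearly every ID is unmatched, the fast5 files are probably still multi-fast5 or come from a different run.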
Run tombo resquiggle to align the raw signal to the reference.
tombo resquiggle --rna \
--q-score 10 \
--processes 5 \
--num-most-common-errors 5 \
--ignore-read-locks \
--overwrite \
--threads-per-process 50 \
/path/to/single_fast5_dir \
/references/mouse/Mus_musculus.GRCm39.cdna.all.fa
Hi, I'm trying to detect m6A on RNA using tombo (and some other tools). I'm having trouble with the resquiggle step. I've read many of the other open issues and I can't find a way to make it work.
Here is what I tried: I started with the fast5 files from a direct RNA sequencing run on the Mk1C nanopore device. These are multi-read files containing 4000 reads each (the default).
multi_to_single_fast5 --input_path /.../fast5_total/ --save_path /.../fast5_single_total --recursive -t 12
On the single fast5 files I basecalled using megalodon.
After the basecalling was done I used tombo preprocess annotate_raw_with_fastqs.
Then the single fast5 files were created (using --batch_size 900000).
Finally I ran the resquiggle, which throws the following message
I have tried using tombo preprocess annotate_raw_with_fastqs again on the big multi fast5 file, and it still doesn't work.
I really can't see where the mistake in my pipeline is. I even tried this using a genome (rather than a transcriptome) and using guppy (rather than megalodon).
Could the problem be in the fast5 files I'm starting with? I did a regular basecall and mapping, and the data looks perfect in IGV.
I would really appreciate your help.
Thank you