Raw data is not stored in Raw/Reads/Read_[read#] so new segments cannot be identified

haotianteng / Chiron

A basecaller for Oxford Nanopore Technologies' sequencers

Raw data is not stored in Raw/Reads/Read_[read#] so new segments cannot be identified #85

Closed: pdimens closed this issue 4 years ago

pdimens commented 5 years ago

When running

python chiron/utils/raw.py --input /mnt/tertiary/nanopore_YFT --output testxport --mode dna

on fast5 files that come straight from sequencing, the output is:

Raw data is not stored in Raw/Reads/Read_[read#] so new segments cannot be identified.
FAIL on /mnt/tertiary/nanopore_YFT/FAK62603_510c64886451cfe03d269b454b4e20c60cf8f4b2_661.fast5 file.

Is something wrong with my files? They are completely unmodified since MinION/MinKNOW created them.

haotianteng commented 5 years ago

Okay, so it seems they changed the location of the raw signal. Could you upload a sample fast5 file so I can have a look? Thanks.

pdimens commented 5 years ago

Here is a sample file. Hopefully the link works: https://mega.nz/#!iJRBDISD!vUxCX7ljiHt8KPqKPcXjQZsLhuOg7APqjLnItEVk42s

haotianteng commented 5 years ago

Okay, I got the file. It looks like multiple reads are being merged into one fast5 file. I will update the input pipeline.

Are you going to train your own basecaller, or do you just want to basecall the reads? For basecalling, use utils/extract_sig_ref.py. For training, the reads have to be basecalled and labelled before calling raw.py; you can label the reads using Tombo or chiron_label.py.
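
For anyone hitting the same message, here is a rough sketch of how to tell the two layouts apart before running raw.py. It relies on the usual fast5 conventions: single-read files keep the signal under Raw/Reads/Read_[read#]/Signal, while multi-read files have one top-level group per read named read_<read id>, with the signal under Raw/Signal inside it. The function names classify_fast5_layout and inspect_fast5 are made up for this example and are not part of Chiron, and h5py is assumed to be installed.

```python
from typing import Iterable, List, Tuple


def classify_fast5_layout(top_level_names: Iterable[str]) -> str:
    """Guess the fast5 layout from the names of the top-level HDF5 groups."""
    names = list(top_level_names)
    # Multi-read files hold one top-level group per read, named "read_<read id>".
    if any(name.startswith("read_") for name in names):
        return "multi-read"
    # Single-read files keep the signal below a top-level "Raw" group.
    if "Raw" in names:
        return "single-read"
    return "unknown"


def inspect_fast5(path: str) -> Tuple[str, List[str]]:
    """Return the guessed layout and the HDF5 paths where raw signal is expected.

    Hypothetical helper, not part of Chiron.
    """
    import h5py  # third-party; imported here so the classifier works without it

    with h5py.File(path, "r") as handle:
        names = list(handle.keys())
        layout = classify_fast5_layout(names)
        if layout == "multi-read":
            signal_paths = [
                f"{name}/Raw/Signal" for name in names if name.startswith("read_")
            ]
        elif layout == "single-read":
            reads = handle.get("Raw/Reads")  # None if the group is missing
            signal_paths = (
                [f"Raw/Reads/{read}/Signal" for read in reads.keys()]
                if reads is not None
                else []
            )
        else:
            signal_paths = []
    return layout, signal_paths
```

Calling inspect_fast5 on one of the failing files should report "multi-read" if the reads were merged as described above. If it reports "single-read" with an empty path list, the file is probably missing its raw signal altogether.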

haotianteng commented 4 years ago

Hi, the input pipeline has been updated to handle multi-read files. If you encounter any problems, please let me know. Thank you.