hasindu2008 / slow5tools

Slow5tools is a toolkit for converting (FAST5 <-> SLOW5), compressing, viewing, indexing and manipulating data in SLOW5 format.
https://hasindu2008.github.io/slow5tools
MIT License

Basecall with Guppy #82

Closed SziKayLeung closed 2 years ago

SziKayLeung commented 2 years ago

Hello @hasindu2008,

Thank you for developing slow5tools - it's been really useful to compress and store ONT data!

Apologies if this is a naïve question, but I thought it was possible to basecall blow5 files using Guppy. I have successfully converted my fast5 files to slow5 and am trying to basecall them, but Guppy finds no files to process:

2022-08-01 18:11:09.698375 [guppy/message] ONT Guppy basecalling software version 6.2.1+6588110, minimap2 version 2.22-r1101
config file:        /gpfs/mrc0/projects/Research_Project-MRC148213/sl693/softwares/ont-guppy-cpu/data/dna_r9.4.1_450bps_modbases_5mc_cg_hac.cfg
model file:         /gpfs/mrc0/projects/Research_Project-MRC148213/sl693/softwares/ont-guppy-cpu/data/template_r9.4.1_450bps_hac.jsn
input path:         /gpfs/mrc0/projects/Research_Project-MRC148213/sl693/ONT/Mouse_Whole_Genome/1_raw/20200807_1632_MC-110214_0_add313_506ffc5b/barcode10/blow5
save path:          /gpfs/mrc0/projects/Research_Project-MRC148213/sl693/ONT/Mouse_Whole_Genome/2_basecalled/20200807_1632_MC-110214_0_add313_506ffc5b/barcode10
chunk size:         2000
chunks per runner:  256
minimum qscore:     9
records per file:   4000
num basecallers:    1
cpu mode:           ON
threads per caller: 4

alignment file:     /gpfs/mrc0/projects/Research_Project-MRC148213/sl693/reference_2019/mm10.fa
alignment type:     auto

Use of this software is permitted solely under the terms of the end user license agreement (EULA).By running, copying or accessing this software, you are demonstrating your acceptance of the EULA.
The EULA may be found in /gpfs/mrc0/projects/Research_Project-MRC148213/sl693/softwares/ont-guppy-cpu/bin
2022-08-01 18:11:09.698970 [guppy/info] crashpad_handler not supported on this platform.
2022-08-01 18:13:19.554000 [guppy/message] Full alignment will be performed.
2022-08-01 18:13:19.595740 [guppy/message] Found 0 fast5 files to process.
2022-08-01 18:13:19.596981 [guppy/message] Init time: 129876 ms
2022-08-01 18:13:19.697374 [guppy/message] Caller time: 100 ms, Samples called: 0, samples/s: 0
2022-08-01 18:13:19.697410 [guppy/message] Finishing up any open output files.
2022-08-01 18:13:19.779375 [guppy/message] Basecalling completed successfully.

With the commands (slow5tools 0.5.1):

slow5tools f2s <input_dir> -d <output_dir> -p 8
guppy_basecaller -i <output_dir> -s <output_basecalled_dir> -c dna_r9.4.1_450bps_modbases_5mc_cg_hac.cfg --bam_out --recursive --align_ref mm10.fasta

Am I missing an argument, or is basecalling only possible with the original raw fast5 files?

Thank you, Szi Kay

Psy-Fer commented 2 years ago

Hello Szi,

We have modified Guppy to take slow5 as input; however, ONT doesn't allow us to make this version publicly available.

We have both a bonito and dorado version which will basecall slow5. You can find them as closed pull requests on the ont repos of those 2 tools, where ONT also refused to integrate slow5.

Let me know if you need help finding those pull requests if you want to go that way. Otherwise, I think we can only share the Guppy builds if you sign the ONT developer agreement (I can't really advise on whether that is worth it for you).

Kind regards, James

hasindu2008 commented 2 years ago

To add to what @Psy-Fer said, we have created a binary build of the slow5 version of Dorado https://github.com/hiruna72/dorado/releases/tag/v0.0.1. You could try that. Instructions are there.

Alternatively, you could use s2f to convert back to FAST5 into a temporary directory and use Guppy on it.
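A minimal sketch of that round trip, assuming `slow5tools` and `guppy_basecaller` are on your `PATH`. The function names (`run`, `basecall_via_fast5`) and the `DRY_RUN` switch are hypothetical helpers for illustration only; the Guppy flags are the ones already used in the command above.

```shell
#!/usr/bin/env bash
# Sketch: basecall BLOW5 data with stock Guppy by converting back to FAST5
# in a temporary directory. Function names and DRY_RUN are illustrative
# assumptions, not part of slow5tools or Guppy.
set -eu

# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

basecall_via_fast5() {
    blow5_dir="$1"   # directory of BLOW5 files produced by f2s
    out_dir="$2"     # where Guppy writes its output
    config="$3"      # Guppy config file name

    tmp_root="$(mktemp -d)"        # scratch space for the FAST5 copy
    tmp_fast5="$tmp_root/fast5"    # not created yet; s2f creates it

    # s2f converts S/BLOW5 back to FAST5; -d names the output directory.
    run slow5tools s2f "$blow5_dir" -d "$tmp_fast5"

    # Guppy then reads the regenerated FAST5 files as usual.
    run guppy_basecaller -i "$tmp_fast5" -s "$out_dir" -c "$config" --recursive

    # The FAST5 copy is disposable; the BLOW5 files remain the archive.
    rm -rf "$tmp_root"
}
```

For example, `basecall_via_fast5 blow5/ basecalled/ dna_r9.4.1_450bps_modbases_5mc_cg_hac.cfg` would convert and then basecall; prefix it with `DRY_RUN=1` to only print the two commands. Note that if either command fails, `set -e` stops the script before the clean-up, so the temporary directory may need removing by hand.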

The bonito pull request is here: https://github.com/nanoporetech/bonito/pull/252

The Dorado pull request is here: https://github.com/nanoporetech/dorado/pull/19

SziKayLeung commented 2 years ago

Thank you James and Hasindu. That's really helpful to know, and I will try your Bonito/Dorado versions.

Psy-Fer commented 2 years ago

@hasindu2008 also had a crazy idea last night thinking about this.

So stay tuned for updates (if it works)

lacoak21 commented 2 years ago

I am also currently dealing with this same situation! The slow5 format looks excellent and I'm very excited to try it in our workflow. I am setting it up on a GridION and hoping to use blow5 with Guppy.

Wouldn't converting blow5 back to fast5 defeat the purpose computationally (aside from using less space)?

Thanks for this awesome tool @Psy-Fer, @hasindu2008, @SziKayLeung, and team

hasindu2008 commented 2 years ago

@lacoak21 You are right. If we basecall directly from S/BLOW5, it is much faster. Unfortunately, Guppy is closed source, and although we have a slow5 version of Guppy through the developer agreement, the terms of the agreement do not allow us to release it.
@Psy-Fer is working on a workaround for this and will let you know the outcome soon. In the long run, however, given that ONT has announced that their open-source basecaller Dorado is going to replace Guppy as the mid-term plan, we will release our own forked version of Dorado with S/BLOW5 support.

However, even if you convert back to FAST5 for basecalling, having S/BLOW5 would still be beneficial, not just in terms of space but also because you can run other community-developed tools such as nanopolish and f5c an order of magnitude faster than with FAST5. If there are any community-developed tools that you use and would like us to look into supporting S/BLOW5 for, let us know.

Also, community developers could focus on the actual research problem rather than wasting two-thirds of their effort and time on understanding and dealing with the complexity, idiosyncrasies and ad-hocness of FAST5. S/BLOW5 is also about human efficiency, not just compute efficiency. Here is a post I wrote about the design philosophy of S/BLOW5: https://hasindu2008.github.io/slow5specs/design.html

lacoak21 commented 2 years ago

Thanks so much for your response @hasindu2008!! I had not realized that ONT is moving to Dorado.

After reading the design philosophy, this will clearly help us a lot in the long run.

Psy-Fer commented 2 years ago

Dorado is still in "preview" release and isn't yet feature complete. It probably won't be production ready until next year, so Guppy is still very much the way to go.

In some good news, I have a prototype that works around both problems: slow5 not working with Guppy, and us not being able to share our slow5-compatible version of Guppy.

It's still a little rough and needs a little testing and benchmarking, but it should be ready for an alpha release next week for you to try if you are keen.

Stay tuned! James

Psy-Fer commented 2 years ago

Hey,

give this a try

https://github.com/Psy-Fer/buttery-eel

Basecalling with guppy.

Thanks for this issue, it helped us think about other ways to go about this.

Have fun James

lacoak21 commented 2 years ago

Wow thanks James and Team! This is an exciting update.

Luisa

SziKayLeung commented 2 years ago

Thank you so much James et al. Will try this out and let you know how I get on!