nf-core / taxprofiler

Highly parallelised multi-taxonomic profiling of shotgun short- and long-read metagenomic data
https://nf-co.re/taxprofiler
MIT License

Add alternative long-read (nanopore) preprocessing tools #145

Closed: sofstam closed this issue 1 month ago

sofstam commented 1 year ago

Description of feature

Since Porechop is no longer supported, it may be useful to investigate Porechop_ABI as an alternative.

Or https://github.com/epi2me-labs/pychopper

jfy133 commented 1 year ago

Would need to investigate putting it on conda...

sofstam commented 1 year ago

https://anaconda.org/bioconda/porechop_abi

There is a conda package. I think this could be added in the next release; what do you think?

jfy133 commented 1 year ago

Oh my bad, I misunderstood their installation instructions 🤦‍♀️

This is fine for inclusion in the first release while we wait for the final Bracken/KrakenUniq!

sofstam commented 1 year ago

https://github.com/bonsai-team/Porechop_ABI/issues/6

sofstam commented 1 year ago

I was wondering whether we should support only porechop_abi and drop porechop altogether.

jfy133 commented 1 year ago

Is it equivalent? I don't have any feeling either way as I don't use this kind of data, so I'm happy to let you make the call :)

sofstam commented 1 year ago

According to their GitHub page: "Note that Porechop_ABI is not designed to handle barcoded sequences adapters. Demultiplexing should be done using standard Porechop commands or other appropriate tools." So it is not equivalent, as it does not perform demultiplexing. However, demultiplexing is supported by Guppy, which is currently the preferred option. @Midnighter, have you worked with Nanopore data?

jfy133 commented 1 year ago

That's fine, we don't support demultiplexing either.

qbonenfant commented 1 year ago

Hi, I'm the main developer of porechop_abi. If you have any questions about the project, feel free to ask me directly; I will be glad to answer.

Regarding the "equivalent or not" part:

We "only" added a step between the creation of the adapter database object and the adapter search in the reads.

The porechop code base was kept as close to the original as possible, and all commands that ran on porechop work unchanged. The two tools behave identically as long as you don't use the -abi or -go options: whatever porechop can do, porechop_abi can do too. The part we can't handle (at least for now) is inferring barcoded adapter sequences from the reads alone (hence the sentence quoted in your previous comment).

What could make a difference is the name of the executable. We had to change it to avoid installation conflicts, so it may need to be changed in pipelines if you want to use our version.

TL;DR: It can work the same, but the name is different.

On the demultiplexing part: using porechop for demultiplexing is pretty much obsolete. Even the dedicated tool developed by Ryan Wick (the original author of porechop), Deepbinner, is now considered outdated. Guppy (the Nanopore basecaller) seems to be the "current standard" for demultiplexing.

Midnighter commented 1 year ago

> However, demultiplexing is supported by Guppy and it is currently preferred. @Midnighter have you worked with Nanopore data?

I've only worked with FASTQ files that are the result of running guppy so far. I agree with @jfy133 that we should not include demultiplexing in this pipeline. We don't do it for short reads either.

sofstam commented 1 year ago

> I've only worked with FASTQ files that are the result of running guppy so far. I agree with @jfy133 that we should not include demultiplexing in this pipeline. We don't do it for short reads either.

I agree; I was just trying to list the differences between porechop and porechop_abi, not to add demultiplexing steps.

sofstam commented 1 year ago

@qbonenfant Thank you for the detailed description 👍

jfy133 commented 1 year ago

Or https://github.com/epi2me-labs/pychopper

jfy133 commented 1 year ago

Or https://github.com/wdecoster/chopper

LilyAnderssonLee commented 1 year ago

It seems that Pychopper supports only ONT long reads, whereas chopper is compatible with both PacBio and ONT reads. Therefore, it might be more advantageous to use chopper in this context.

sofstam commented 9 months ago

Pychopper is for cDNA reads so not useful for us.

I will have to test it first, but to my understanding we can use porechop_abi as an optional step for adapter trimming and drop porechop. Chopper could be added as an alternative to filtlong.

What do you think?

LilyAnderssonLee commented 1 month ago

So I am now going to add porechop_abi to taxprofiler as an alternative tool for long-read adapter trimming.