nanoporetech / isONclust2

A tool for de novo clustering of long transcriptomic reads

Error in Clustering mode: Invalid clustering mode: 3 #8

Open CWYuan08 opened 1 year ago

CWYuan08 commented 1 year ago

Hi, I am trying to run isONclust2 first for isONcorrect, but I got this error for all the batches. One example:

Loaded input batch from batches/isONbatch_9.cer:
    Batch number: 9
    Batch range: [244492,273799]
    Depth: -1
    Nr sequences: 29308
    Nr bases: 50001212
    Nr clusters: 29308
    Nr nontrivial clusters: 0
    Minimizers in database: 0
Created pseudo-batch for single clustering:
    Batch number: -9
    Batch range: [244492,273799]
    Depth: -1
    Nr sequences: 29308
    Nr bases: 0
    Nr clusters: 29308
    Nr nontrivial clusters: 0
    Minimizers in database: 0
Resetting input clusters.
Clustering mode: Invalid clustering mode: 3

from running:

for f in batches/isONbatch_*.cer; do
    filename=$(basename "$f")
    output="clustered/${filename%.*}.cer"
    isONclust2 cluster -v -l "$f" -o "$output"
done

Could you please advise what I need to fix?

Many thanks!!

Best, CW

ksahlin commented 1 year ago

Hi @CWYuan08,

Since I was the one referring you here.. This seems to be the way to run isONclust2: https://github.com/epi2me-labs/wf-transcriptomes (and this section in particular: https://github.com/epi2me-labs/wf-isoforms#de-novo-based-approach-experimental)

You can then run isONcorrect on the clustered output, and isONform for consensus. I have not tried the approach they listed here, but they say it is experimental, which typically means no substantial benchmarks have been done.

Best, K

Johnsonzcode commented 1 year ago

Hi @ksahlin, but the de-novo-based-approach-experimental route cannot be run from the command line. You can see here. So maybe the isONclust-isONcorrect-isONform pipeline is the only way?

cjw85 commented 1 year ago

(Thanks @ksahlin for adding some comments here).

@Johnsonzcode the de-novo based approaches are indeed still largely experimental and so the code is not well-maintained. This project is not currently maintained and there is no one at Oxford Nanopore Technologies currently studying de-novo approaches. I dare say that @ksahlin is far more of an expert in the space than we are.

Johnsonzcode commented 1 year ago

So how could I get non-redundant isoforms from ONT full-length transcripts? Is there a suggested pipeline?

Johnsonzcode commented 1 year ago

> (Thanks @ksahlin for adding some comments here).
>
> @Johnsonzcode the de-novo based approaches are indeed still largely experimental and so the code is not well-maintained. This project is not currently maintained and there is no one at Oxford Nanopore Technologies currently studying de-novo approaches. I dare say that @ksahlin is far more of an expert in the space than we are.

But how could I solve the error mentioned above? Or is there a suggested pipeline for getting non-redundant isoforms from ONT full-length transcriptome sequencing?

ksahlin commented 1 year ago

> Or is there some pipeline suggested to get non-redundant isoform from ONT full-length transcriptome sequencing ?

I can suggest running pychopper-isONclust-isONcorrect-isONform for this. The problem is that isONclust does not scale to very large datasets. This is what @CWYuan08 noticed and, hence, we ended up here looking at isONclust2 as a replacement for isONclust. Another way is to manually batch (i.e. split) your large dataset into independent instances that isONclust can run on.
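
As a rough illustration of the manual batching idea, here is a minimal Python sketch that splits a FASTQ file into consecutive batches of a fixed number of reads. It is not part of isONclust or isONclust2; the function name `split_fastq`, the `batch_N.fastq` naming, and the assumption of exactly four lines per record are all hypothetical choices for this example. It also does nothing to make the batches biologically independent (reads from the same gene can land in different batches), so how best to choose the split is left open.

```python
from itertools import islice
from pathlib import Path


def split_fastq(fastq_path, out_dir, reads_per_batch):
    """Split a FASTQ file into consecutive batch files.

    Hypothetical helper: assumes every record spans exactly 4 lines.
    Returns the list of batch file paths that were written.
    """
    if reads_per_batch < 1:
        raise ValueError("reads_per_batch must be at least 1")

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    written = []
    with open(fastq_path) as handle:
        while True:
            # Read the next block of records (4 lines per read).
            chunk = list(islice(handle, 4 * reads_per_batch))
            if not chunk:
                break
            # Illustrative naming scheme: batch_0.fastq, batch_1.fastq, ...
            batch_path = out_dir / f"batch_{len(written)}.fastq"
            batch_path.write_text("".join(chunk))
            written.append(batch_path)
    return written
```

For example, `split_fastq("reads.fastq", "batches", 50000)` would write files of up to 50,000 reads each, and each file could then be given to a separate isONclust run; the file names and batch size here are placeholders.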

Johnsonzcode commented 1 year ago

> Or is there some pipeline suggested to get non-redundant isoform from ONT full-length transcriptome sequencing ?
>
> I can suggest running pychopper-isONclust-isONcorrect-isONform for this. The problem is that isONclust does not scale to very large datasets. This is what @CWYuan08 noticed and, hence, we ended up here looking for isONclust2 to replace isONclust as a solution. Another way is to manually batch (i.e. split) your large dataset to independent instances that isONclust can run on.

This pipeline may work.

cjw85 commented 1 year ago

@Johnsonzcode

The https://github.com/epi2me-labs/wf-isoforms pipeline is deprecated and its functionality has been folded into wf-transcriptomes. If you wish to use the de-novo route through wf-transcriptomes, let's work to uncover the bug you are seeing with its use on the issue you have already started over there. I feel we've gone a bit off topic from @CWYuan08's original post here.

CWYuan08 commented 1 year ago

Dear @Johnsonzcode, @cjw85, @ksahlin, thank you all for the useful discussions here, this is what I would like to ask and follow too! Best, CW