isovic / racon

Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads. http://genome.cshlp.org/content/early/2017/01/18/gr.214270.116 Note: This was the original repository which will no longer be officially maintained. Please use the new official repository here:
https://github.com/lbcb-sci/racon
MIT License

layer begin and end positions are invalid #125

Open utritala opened 5 years ago

utritala commented 5 years ago

Hello, I am getting exactly the same error as in issue #118, that is:

[racon::Window::add_layer] error: layer begin and end positions are invalid!

I am using racon 1.3.3 and here is my command: racon -t 20 all_pass_reads_without_lambda.fasta minimap2_to_flye_sorted.sam assembly.fasta > assembly_racon_r1.fasta

I have checked, and unlike in #118 there are no duplicate contig names in the assembly. Could someone please advise on how I can fix this error?

Many thanks, Urmi

rvaser commented 5 years ago

Hi Urmi, please verify that each contig name is unique up to the first white space.

Best regards, Robert
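
The uniqueness check suggested above can be scripted. The following is a minimal sketch, not part of racon; the function names `fasta_names` and `duplicate_names` are made up for illustration. It assumes the name of a record is the header text after `>` up to the first whitespace, as described in the comment above.

```python
from collections import Counter


def fasta_names(lines):
    """Yield each record name: header text after '>' up to the first whitespace."""
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            yield fields[0] if fields else ""


def duplicate_names(lines):
    """Return the sorted list of names that occur more than once."""
    counts = Counter(fasta_names(lines))
    return sorted(name for name, n in counts.items() if n > 1)


# Two headers that differ only after the first whitespace count as duplicates.
example = [">contig_1 len=10", "ACGT", ">contig_1 len=20", "ACGT", ">contig_2"]
print(duplicate_names(example))  # ['contig_1']
```

For a real assembly, an open file handle can be passed in place of the list, for example `duplicate_names(open("assembly.fasta"))`.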

utritala commented 5 years ago

Hi Robert, Thanks for your prompt response. The contig names don't have any white spaces. For example:

>contig_1
>contig_10
>contig_100
>contig_10001
>contig_10002
>contig_10005
>contig_10007
>contig_10009
>contig_1001
>contig_10011

Can this be the issue?

Best regards, Urmi

rvaser commented 5 years ago

Nope, that looks alright. No idea where the problem is at this point. Could you share your data perhaps?

utritala commented 5 years ago

Sure. I can share a subset of it shortly.

rvaser commented 5 years ago

It would be marvelous if you could find a subset which exits with the same error :)

utritala commented 5 years ago

OK. I have tested the same command with a subset of the data (100 contigs, 100,000 reads for polishing) and, interestingly, it no longer produces this error. The original data set is huge: 24 million PromethION reads. Could this be the problem?

rvaser commented 5 years ago

That amount of reads should not be a problem. Can you extract all contig headers and send this file to me via email?

utritala commented 5 years ago

Thanks Robert. OK will do.

utritala commented 5 years ago

Hi Robert, I have emailed you the links to some sample files as well as contig headers. Thanks, Urmi

rvaser commented 5 years ago

Thanks Urmi, I am currently looking at them, sorry for the delay.

utritala commented 5 years ago

No worries, Robert. Thank you so much for looking into it.

On Mon, Jul 8, 2019 at 9:22 AM Robert Vaser notifications@github.com wrote:

Thanks Urmi, I am currently looking at them, sorry for the delay.


rvaser commented 5 years ago

The contig headers are unique and completely different from those of the reads, which should rule out the error that occurred in #118. I saw that you have some really short contigs (shorter than the window length) and thought that might be the problem, but a test run for that case finished successfully. I am now running racon with your assembly and the subset of reads you provided.

How long did it take for racon to reach the error? Which mapper did you use to generate the SAM file?
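
The short-contig hypothesis mentioned above can be checked by listing contigs shorter than the window length. This is a minimal sketch with hypothetical helper names (`contig_lengths`, `shorter_than_window`); the default of 500 is an assumption taken from the `-w 500` value used in a command later in this thread, so adjust it to whatever `-w` value was actually used.

```python
def contig_lengths(lines):
    """Return {name: sequence length} for FASTA text given as an iterable of lines."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            lengths[name] = 0
        elif name is not None:
            # Sequence may be wrapped over several lines, so accumulate.
            lengths[name] += len(line)
    return lengths


def shorter_than_window(lines, window_length=500):
    """Return {name: length} for contigs shorter than window_length (assumed -w value)."""
    return {
        name: length
        for name, length in contig_lengths(lines).items()
        if length < window_length
    }


example = [">contig_1", "ACGT" * 100, ">contig_2 note", "ACGT" * 125, "AC"]
print(shorter_than_window(example))  # {'contig_1': 400}
```

As reported above, short contigs turned out not to be the cause in this case, so the listing is only a way to rule that hypothesis in or out.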

utritala commented 5 years ago

Good point about the short contigs. It took about 45 minutes to reach the error. I used minimap2 to generate the SAM file.

rvaser commented 5 years ago

Can you try generating a PAF file with minimap2 and rerun the polishing?
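
For readers following along, here is a sketch of how the minimap2 call could be assembled for either output format. minimap2 writes PAF by default and SAM when `-a` is given. The `map-ont` preset is an assumption based on the Nanopore (PromethION) reads mentioned above, the helper name `minimap2_command` is made up, and the options actually used in this thread were not posted.

```python
import shlex


def minimap2_command(target_fasta, reads, threads=20, sam=False, preset="map-ont"):
    """Build a minimap2 argument list; the target (assembly) comes before the reads."""
    cmd = ["minimap2", "-x", preset, "-t", str(threads)]
    if sam:
        cmd.append("-a")  # -a switches the output from PAF (default) to SAM
    cmd += [target_fasta, reads]
    return cmd


# Print the PAF-producing command line; redirect its stdout to a .paf file when running it.
print(shlex.join(minimap2_command("assembly.fasta", "all_pass_reads_without_lambda.fasta")))
```

The order of the two positional arguments matters: the assembly to be polished is the target and the reads are the query.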

utritala commented 5 years ago

OK I can try that. I will let you know how I get on. Thanks again for your help.

rvaser commented 5 years ago

The subsample of reads you sent me finished successfully. If the run with the PAF file also exits with the same error, would you be up for modifying the racon code a bit so we can locate which contig/read is the culprit? You would just have to copy/paste a few lines and rerun the polishing.

utritala commented 5 years ago

Running the full dataset with PAF files worked like a charm. Thank you so much once again, Robert. :)

michieitel commented 4 years ago

Hi Robert!

Same issue for me even with the PAF file running on the GPU accelerated racon v1.4.3.

With this command ...

racon -t 20 -m 8 -x -6 -g -8 -w 500 -c 100 -b --cudaaligner-batches 100 \
reads.porechop.5kb.fasta.gz \
reads.porechop.5kb_minimap2.paf.gz \
FLYE_assembly-1_1kb.fasta \
> FLYE_assembly-1_1kb_racon_GPU.fasta \
2> FLYE_assembly-1_1kb_racon_GPU.log

(In case you are wondering: I used reads >1kb for the assembly but only reads >5kb for the polishing, since the 1kb reads maxed out the RAM, and the ~40x coverage from the 5kb reads might be enough for polishing anyhow.)

... I got the following error:

Using 2 GPU(s) to perform polishing
Initialize device 0
Initialize device 1
[CUDAPolisher] Constructed.
[racon::Polisher::initialize] loaded target sequences 0.451082 s
[racon::Polisher::initialize] loaded sequences 168.910471 s
[racon::Polisher::initialize] loaded overlaps 803.143136 s
[racon::CUDAPolisher::initialize] aligning overlaps [=> ] 0.212609 s
[racon::CUDAPolisher::initialize] aligning overlaps [==> ] 309.648201 s
[racon::CUDAPolisher::initialize] aligning overlaps [===> ] 864.936096 s
[racon::CUDAPolisher::initialize] aligning overlaps [====> ] 1343.068638 s
[racon::CUDAPolisher::initialize] aligning overlaps [=====> ] 1820.772252 s
[racon::CUDAPolisher::initialize] aligning overlaps [======> ] 2272.082065 s
[racon::CUDAPolisher::initialize] aligning overlaps [=======> ] 2650.677926 s
[racon::CUDAPolisher::initialize] aligning overlaps [========> ] 2972.167988 s
[racon::CUDAPolisher::initialize] aligning overlaps [=========> ] 3301.861032 s
[racon::CUDAPolisher::initialize] aligning overlaps [==========> ] 3630.955212 s
[racon::CUDAPolisher::initialize] aligning overlaps [===========> ] 3966.076676 s
[racon::CUDAPolisher::initialize] aligning overlaps [============> ] 4382.849504 s
[racon::CUDAPolisher::initialize] aligning overlaps [=============> ] 5179.348250 s
[racon::CUDAPolisher::initialize] aligning overlaps [==============> ] 5555.710522 s
[racon::CUDAPolisher::initialize] aligning overlaps [===============> ] 5871.245550 s
[racon::CUDAPolisher::initialize] aligning overlaps [================> ] 6191.455460 s
[racon::CUDAPolisher::initialize] aligning overlaps [=================> ] 6523.840056 s
[racon::CUDAPolisher::initialize] aligning overlaps [==================> ] 6865.439302 s
[racon::CUDAPolisher::initialize] aligning overlaps [===================>] 7218.304939 s
[racon::CUDAPolisher::initialize] aligning overlaps [====================] 7600.838001 s

[racon::Window::add_layer] error: layer begin and end positions are invalid!

Contig names are unique. The shortest read is 5kb. I guess you want to play with the (sub)data? How would you like me to share it with you?

Best, Michael

rvaser commented 4 years ago

Hi Michael, could you first try v1.4.9 (latest commit) from https://github.com/lbcb-sci/racon? I had a similar bug in issue #140, which should be fixed there. If that still does not work, let me know.

Best regards, Robert

michieitel commented 4 years ago

roger that. thanks

michieitel commented 4 years ago

same with v. 1.4.9:

Using 2 GPU(s) to perform polishing
Initialize device 0
Initialize device 1
[CUDAPolisher] Constructed.
[racon::Polisher::initialize] loaded target sequences 0.350697 s
[racon::Polisher::initialize] loaded sequences 56.951875 s
[racon::Polisher::initialize] loaded overlaps 174.475810 s
[racon::CUDAPolisher::initialize] allocated memory on GPUs for alignment 0.136236 s
[racon::CUDAPolisher::initialize] aligning overlaps [====================] 9220.406169 s
[racon::Window::add_layer] error: layer begin and end positions are invalid!

rvaser commented 4 years ago

Can you maybe share your data? If not, please add before line https://github.com/lbcb-sci/racon/blob/master/src/window.cpp#L55 the following:

fprintf(stderr, "%u %u %u %u\n", begin, end, sequence_length, quality_length);

Recompile and let me know what it says. Thanks!

rvaser commented 4 years ago

I got an email with values 0 499 512 0 but don't see this reply here. Nevertheless, the numbers look fine and I am not sure what is wrong. Can you replace line https://github.com/lbcb-sci/racon/blob/master/src/cuda/cudapolisher.cpp#L88 with if (true), compile and run again? Not sure if the GPU alignment is the problem or not.
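
One way to read the printed values: the debug line shows `begin`, `end`, `sequence_length` and `quality_length`, but not the length of the window the layer is being added to. The sketch below is a hypothetical bounds check written for illustration only; it is not copied from racon's `window.cpp`, and the name `backbone_length` is an assumption. It shows how the same `begin` and `end` can be acceptable or not depending on a value the debug line does not print.

```python
def layer_is_valid(begin, end, backbone_length):
    """Hypothetical check: the layer must be a non-empty interval inside the backbone.

    Illustration only; racon's real condition may differ.
    """
    return begin < end and end <= backbone_length


# The reported begin/end values against two assumed backbone lengths.
print(layer_is_valid(0, 499, 500))  # True
print(layer_is_valid(0, 499, 300))  # False
```

Under that assumption, printing the backbone length next to the four reported values would show whether the interval actually fits the window.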

michieitel commented 4 years ago

Yes, I deleted that comment again since I was still experimenting with something. Let me finish the current test. If that fails, I will replace the line as suggested...

michieitel commented 4 years ago

Well. How to say.... super embarrassing mistake on my side.

The alignment file was the problem. Instead of mapping the reads to the assembly, I mapped them against themselves. Such a stupid copy/paste error.

Now it finished without errors and super fast!

Thanks for your help Robert!

cheers Michael
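
A mix-up like the one described above can be caught before a long polishing run by comparing the target names and lengths in the PAF file with the assembly. The sketch below relies on the standard PAF layout (column 6 is the target name, column 7 the target length); the helper names `paf_targets` and `unknown_targets` and the `assembly_lengths` dictionary are hypothetical.

```python
def paf_targets(paf_lines):
    """Return {target name: target length} from PAF columns 6 and 7."""
    targets = {}
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip lines that are not complete PAF records
        targets[fields[5]] = int(fields[6])
    return targets


def unknown_targets(paf_lines, assembly_lengths):
    """Return targets missing from the assembly or listed with a different length."""
    return sorted(
        name
        for name, length in paf_targets(paf_lines).items()
        if assembly_lengths.get(name) != length
    )


paf = [
    "read1\t6000\t0\t5900\t+\tcontig_1\t10000\t100\t6000\t5500\t5900\t60",
    "read2\t7000\t0\t7000\t+\tread1\t6000\t0\t6000\t5000\t6000\t0",
]
# read1 appears as a target, so this PAF was not made against the assembly alone.
print(unknown_targets(paf, {"contig_1": 10000}))  # ['read1']
```

An empty result means every target in the overlap file is a contig of the expected length; any read name in the output suggests the reads were mapped against something other than the assembly.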

rvaser commented 4 years ago

Haha, happens :)

Best regards Robert