isovic / racon

Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads. http://genome.cshlp.org/content/early/2017/01/18/gr.214270.116 Note: This was the original repository which will no longer be officially maintained. Please use the new official repository here:
https://github.com/lbcb-sci/racon
MIT License

Racon doesn't work for me anymore: various error messages #66

Closed: nataliering closed this issue 5 years ago

nataliering commented 6 years ago

Hi,

I've used Racon a lot in the past, and even used it fairly recently (last couple of weeks) to test the new Illumina polishing ability. However, something seems to have changed and now it keeps failing.

First I kept getting the error message "empty target sequence set", even though I definitely hadn't passed an empty file to be polished. I tried stripping the first (header) line of the target file down to just ">", which seemed to clear up that issue, but Racon still didn't work.

Next, I got an error message saying "empty overlap set". Again, the .paf I used definitely was not empty; however, I had not regenerated it after changing the ">" line on the target file, so I re-ran minimap and created a new .paf.

Finally, I'm now getting another new error message, which says "[bioparser::PafParser] error: invalid file format". I'm at a loss. I haven't changed anything on my system, as far as I'm aware, yet I'm getting error after error with Racon now :(

I reinstalled from GitHub and ran the racon_test script, which passed fine. But when I run Racon commands that worked two weeks ago, they don't work now. I'm also trying to run Unicycler, which uses Racon, so that keeps failing too. Could you please help me figure out what's gone wrong?

Thanks for your time, Natalie

rvaser commented 6 years ago

Hello Natalie, please paste the command you are using here. As we changed the API a bit, I am not sure whether Unicycler has been adapted to these changes.

Best regards, Robert

nataliering commented 6 years ago

Hi Robert,

Thanks for the speedy reply, and sorry I should have included the command in my original question!

Here's an example of the command I've been using to run Racon alone: racon ../../UK36/fastq/UK36.all.porechopped.fastq UK36.abruijn.paf ../BP_UK36_abruijn.fasta > UK36.abruijn.racon1.fasta

And to generate the .paf I used minimap: minimap ../BP_UK36_abruijn.fasta ../../UK36/fastq/UK36.all.porechopped.fastq > UK36.abruijn.paf

rvaser commented 6 years ago

The commands look fine to me. The error indicates that something is wrong with the overlaps file. Check its size, its first 10 lines (head -n 10) and its last 10 lines (tail -n 10). You can paste them here as well!

nataliering commented 6 years ago

The size of the paf looks fine (35 MB), as do the first and last 10 lines. I've shown one of them below; they all look the same, apart from the numbers of course. I tried using minimap2 instead, in case minimap was the problem, but I'm getting the same (invalid file format) error with the minimap2 paf as well. Could it be a problem with the parser?

7e61923c-5751-4545-bba8-08987aaa9774 769 28 759 + Bordetella 4114060 3326160 3326928 157 768 255 cm:i:19
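For reference, a PAF line has 12 mandatory tab-separated columns (query name, length, start, end, strand, target name, length, start, end, residue matches, alignment block length, mapping quality), optionally followed by SAM-style tags such as cm:i:19. The spaces in the pasted line above presumably stand in for tabs. Below is a minimal sketch of a strict per-line check, assuming that any line with fewer than 12 fields or a non-integer numeric field counts as invalid; it is not bioparser's actual code, and PafRecord and parse_paf_line are hypothetical names.

```python
from typing import NamedTuple


class PafRecord(NamedTuple):
    """The 12 mandatory PAF columns; optional SAM-style tags are kept as raw strings."""
    query_name: str
    query_len: int
    query_start: int
    query_end: int
    strand: str
    target_name: str
    target_len: int
    target_start: int
    target_end: int
    matches: int
    block_len: int
    mapq: int
    tags: tuple


def parse_paf_line(line: str) -> PafRecord:
    """Parse one tab-separated PAF line, raising ValueError if it is malformed."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        # A blank line ends up here too: "".split("\t") yields a single empty field.
        raise ValueError(f"expected at least 12 tab-separated fields, got {len(fields)}")
    if fields[4] not in ("+", "-"):
        raise ValueError(f"strand must be '+' or '-', got {fields[4]!r}")
    # int() raises ValueError on any non-numeric value in a numeric column.
    ints = [int(fields[i]) for i in (1, 2, 3, 6, 7, 8, 9, 10, 11)]
    return PafRecord(
        fields[0], ints[0], ints[1], ints[2], fields[4],
        fields[5], ints[3], ints[4], ints[5], ints[6], ints[7], ints[8],
        tuple(fields[12:]),
    )


# The line from the comment above, with its columns joined by tabs.
line = "\t".join([
    "7e61923c-5751-4545-bba8-08987aaa9774", "769", "28", "759", "+",
    "Bordetella", "4114060", "3326160", "3326928", "157", "768", "255", "cm:i:19",
])
rec = parse_paf_line(line)
print(rec.target_name, rec.query_end - rec.query_start)  # Bordetella 731
```

Running every line of a PAF file through a check like this would point to the first offending line number, which head and tail alone may not reveal.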

rvaser commented 6 years ago

Is there an empty line at the end? The new parser in racon (bioparser) is quite strict.
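A stray empty line is easy to miss with tail, because tail prints it as nothing. Here is a small sketch for locating and removing such lines, assuming (as suggested above) that blank or whitespace-only lines are what a strict parser rejects; find_blank_lines and strip_blank_lines are hypothetical helpers, not part of racon.

```python
def find_blank_lines(path: str) -> list[int]:
    """Return the 1-based numbers of lines that are empty or whitespace-only."""
    blank = []
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            if not line.strip():
                blank.append(number)
    return blank


def strip_blank_lines(src: str, dst: str) -> int:
    """Copy src to dst without blank lines and return how many were dropped."""
    dropped = 0
    with open(src) as inp, open(dst, "w") as out:
        for line in inp:
            if line.strip():
                out.write(line)
            else:
                dropped += 1
    return dropped
```

For example, find_blank_lines("UK36.abruijn.paf") returning [N] where N is the file's last line number would correspond to the extra trailing line found in nano; writing a cleaned copy to a new file keeps the original paf untouched.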

nataliering commented 6 years ago

I'm not sure. There isn't a blank line at the end when I use tail -n 10, but when I opened the paf in nano there was potentially an extra line. I deleted it, and I'm now getting a different error message: [racon::Polisher::initialize] error: empty overlap set!

Some progress, I guess?!

rvaser commented 6 years ago

The new error means either that you swapped the target and query files (but the commands you pasted here look fine), or, more likely, that for each overlap at least one sequence of the pair is missing from the query/target file. I am not sure how that could happen, though, since both commands seem fine.
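One way to test the "missing sequence" explanation is to count how many overlaps have both names present in the input files. The sketch below assumes that names are compared on the first whitespace-delimited token of each header (the way minimap and minimap2 write names into PAF), that FASTQ records are unwrapped four-line records, and that PAF column 1 holds the read and column 6 the contig, as the minimap command above produces. read_names and count_usable_overlaps are hypothetical helpers, and racon's own matching logic may differ.

```python
def header_name(header: str) -> str:
    """Return the first whitespace-delimited token after '>' or '@' ('' if none)."""
    tokens = header[1:].split()
    return tokens[0] if tokens else ""


def read_names(path: str) -> set[str]:
    """Collect sequence names from a FASTA or (unwrapped, 4-line) FASTQ file."""
    names: set[str] = set()
    with open(path) as handle:
        first = handle.readline()
        if first.startswith(">"):
            # FASTA: every line starting with '>' is a header.
            names.add(header_name(first))
            names.update(header_name(l) for l in handle if l.startswith(">"))
        elif first.startswith("@"):
            # FASTQ: headers are lines 1, 5, 9, ...; '@' may also start a quality line.
            names.add(header_name(first))
            for number, line in enumerate(handle, start=2):
                if number % 4 == 1:
                    names.add(header_name(line))
        elif first:
            raise ValueError(f"{path}: neither FASTA nor FASTQ")
    return names


def count_usable_overlaps(paf_path: str, reads_path: str, target_path: str) -> tuple[int, int]:
    """Return (usable, total): overlaps whose read and contig names both exist."""
    reads = read_names(reads_path)
    contigs = read_names(target_path)
    usable = total = 0
    with open(paf_path) as handle:
        for line in handle:
            fields = line.split("\t")
            if len(fields) < 12:
                continue  # skip blank or malformed lines
            total += 1
            if fields[0] in reads and fields[5] in contigs:
                usable += 1
    return usable, total
```

With the files from this thread, count_usable_overlaps("UK36.abruijn.paf", "../../UK36/fastq/UK36.all.porechopped.fastq", "../BP_UK36_abruijn.fasta") returning (0, N) would match the "empty overlap set" symptom. Note that stripping a target header down to ">" turns that contig's name into the empty string, so a PAF generated before such an edit would presumably no longer match the target file.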

rvaser commented 6 years ago

If you feel comfortable sharing your data, you can send me your paf file and the first 10 sequence names (without the actual sequences) from both the query and target files via email, and I'll check what is wrong.

nataliering commented 6 years ago

Thank you - I've tried Racon with a few of my other assemblies and it has worked with most of them, so I think it is definitely a sporadic problem, most likely originating from something I'm doing during processing (i.e. not a Racon-specific issue). For now, I can work around the problematic assemblies, and save us both the trouble of figuring out what's wrong! If I come across the problem again in the future, I'll let you know. Thanks for your help :)

Kirk3gaard commented 5 years ago

I have managed to provoke racon into generating the same error, "error: empty target sequences set!". Adding a line ending at the end of the FASTA sequence seemed to fix it for me.
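For racon 1.3.1, the workaround reported here is to make sure the target FASTA ends with a newline (the thread later reports a fix in 1.3.2). A minimal sketch of that workaround follows; ensure_trailing_newline is a hypothetical helper, and it only appends a single "\n" to a non-empty file whose last byte is not already a newline.

```python
import os


def ensure_trailing_newline(path: str) -> bool:
    """Append a newline if the file is non-empty and does not already end with one.

    Returns True if the file was modified.
    """
    with open(path, "rb+") as handle:
        size = handle.seek(0, os.SEEK_END)  # seek() returns the new absolute position
        if size == 0:
            return False  # empty file: nothing to terminate
        handle.seek(-1, os.SEEK_END)
        if handle.read(1) == b"\n":
            return False  # already newline-terminated
        handle.seek(0, os.SEEK_END)
        handle.write(b"\n")
        return True
```

Opening the file in binary mode keeps the check byte-exact and allows seeking relative to the end; the same function could be applied to the reads and PAF files if they were produced by tools that omit the final newline.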

I have made a zip folder with some E. coli data that can reproduce the problem with racon 1.3.1. https://www.dropbox.com/s/99vrjtyte1qzo7b/racon1.3.1_error.zip?dl=0

rvaser commented 5 years ago

Thank you a lot Rasmus, I was able to locate the bug! Will update racon today.

Best regards, Robert

rvaser commented 5 years ago

Should be fixed in version 1.3.2 :)

Kirk3gaard commented 5 years ago

That was fast! Thanks for a great tool :)