isovic / racon

Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads. http://genome.cshlp.org/content/early/2017/01/18/gr.214270.116 Note: This was the original repository which will no longer be officially maintained. Please use the new official repository here:
https://github.com/lbcb-sci/racon
MIT License

stuck at [racon::Polisher::polish] with half of consensus windows generated #58

Closed: kfergy closed this issue 6 years ago

kfergy commented 6 years ago

Hey there,

Long story short, I have a hybrid assembly built with the first two steps of DBG2OLC, but couldn't install blasr, POS. As suggested here, I turned to Racon for consensus calling. I first used minimap2 to create a PAF (483 MB) from the backbone.fasta (222 MB) and the input PacBio.fasta (13.5 GB). Note that these are Sequel reads, so I'm using FASTA files.

Now I'm at the racon stage, using the following command:

    racon -t 8 "/usr/path/pbTricho.fasta" "/usr/path/TrichoHybrid-approx-mapping.paf" "/usr/path/backbone_raw.fasta"

This works for the first 30 minutes:

    [racon::Polisher::initialize] loaded target sequences
    [racon::Polisher::initialize] loaded sequences
    [racon::Polisher::initialize] loaded batch of overlaps
    [racon::Polisher::initialize] aligned overlap 2060216/2060216
    [racon::Polisher::initialize] transformed data into windows

but then the program hangs at this step:

    [racon::Polisher::polish] generated consensus for window 118876/467058

At this point, top shows racon as sleeping while still using CPU and memory, but it makes no progress. I've tried relaxing the quality threshold with -q 0 just to rule that out, but it still gets stuck. So my question is: how do I get it to continue? Is this a FASTA issue, or is something else going on?

thx

rvaser commented 6 years ago

Hello, I guess one thread entered an infinite loop due to a previously unseen bug (on window 118876). Can you please modify src/window.cpp as described below, check whether the program hangs again, and send me the output? Alternatively, you can send me your whole data set and tell me which version of Racon you are using.

Best regards, Robert

src/window.cpp modification at line 67:

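    // Debug only: skip every window except the suspected one (id 118876)
    if (id_ != 118876) return false;
    // For each sequence in this window, print its bases, its quality string,
    // and the begin/end pair stored in positions_
    for (uint32_t i = 0; i < sequences_.size(); ++i) {
        printf("%s\n%s\n%u %u\n",
            std::string(sequences_[i].first, sequences_[i].second).c_str(),
            std::string(qualities_[i].first, qualities_[i].second).c_str(),
            positions_[i].first, positions_[i].second);
    }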

Don't forget to recompile before running again!

kfergy commented 6 years ago

I'm using Racon v1.0.1. To clarify: with each rerun, racon gets stuck at a different window at this stage. Here is my latest run:

    [racon::Polisher::initialize] loaded target sequences
    [racon::Polisher::initialize] loaded sequences
    [racon::Polisher::initialize] loaded batch of overlaps
    [racon::Polisher::initialize] aligned overlap 2060216/2060216
    [racon::Polisher::initialize] transformed data into windows
    [racon::Polisher::polish] generated consensus for window 6691/467058

With this in mind, does the change to src/window.cpp still make sense? In the meantime, I'll also send you my files via FTP, just in case.

Thanks for the quick reply!

rvaser commented 6 years ago

Hmm, that is quite odd. The file change only makes sense if one specific window is the culprit rather than random ones. Please do send me your files and I'll check locally what is wrong.

rvaser commented 6 years ago

I have obtained the files and am running racon. Will get back to you hopefully soon.

rvaser commented 6 years ago

I found the bug! One of your reads had a 5 kbp insertion within a 500 bp window of the reference (maybe something is misassembled, or the sequencer is at fault). That exceeded my assumption about the maximal length of a sequence added to a window: for FASTA inputs I prepare a dummy quality string of fixed size, and this sequence was longer than it. Will fix it soon.

rvaser commented 6 years ago

Fixed in release 1.0.2!