Closed by LeeBergstrand 4 months ago
@jmtsuji In what situation would end repair fail? Would it fail if it's linear? Can a contig be circularized without end repair?
Good point -- the user should be notified of the option to change the flag. Will work on this when I get the chance.
> In what situation would end repair fail? Would it fail if it's linear? Can a contig be circularized without end repair?
Flye already reports whether its assembled contigs are circular or linear. However, Flye can make indel errors around the ends of circular contigs (e.g., the two ends might have a gap of 50 bp or something between them). To try to fix those errors, the end repair script tries to assemble a short "stitch contig" that spans the two ends of each supposedly circular input contig. It then uses a module of `circlator` to try to match the "stitch contig" onto the two ends of the input contig, and if a match is found, the `circlator` module replaces the ends of the input contig with the "stitch contig".
If Flye says the contig is circular but end repair cannot build a stitch contig that spans the two ends of the contig, then the end repair script fails by default. (If Flye says the contig is linear, then the end repair script skips that contig, i.e., the contig just passes through without any error message.) Setting `keep_unrepaired_contigs` to True means that a contig Flye says is circular will be passed through the end repair script as-is (rather than the script failing) even if end repair can't manage to stitch the contig ends. That contig might be in good enough shape to get circularized properly during polishing downstream, but if it has a large indel, in my experience this cannot get fixed by polishing.
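To make the three cases above concrete, here is a minimal sketch of the decision logic as described in this comment. It is not the actual end repair code: the function name, the `Outcome` enum, and `EndRepairError` are hypothetical, and only the `keep_unrepaired_contigs` flag name comes from the config.

```python
from enum import Enum


class Outcome(Enum):
    """Possible results for one contig (hypothetical names)."""
    PASSED_THROUGH_LINEAR = "linear contig skipped by end repair"
    REPAIRED = "circular contig ends replaced with the stitch contig"
    KEPT_UNREPAIRED = "circular contig passed through as-is"


class EndRepairError(RuntimeError):
    """Raised when a circular contig cannot be stitched (hypothetical name)."""


def end_repair_outcome(is_circular: bool,
                       stitch_succeeded: bool,
                       keep_unrepaired_contigs: bool) -> Outcome:
    """Return what happens to a contig under the behaviour described above."""
    if not is_circular:
        # Flye says the contig is linear: it passes through with no error.
        return Outcome.PASSED_THROUGH_LINEAR
    if stitch_succeeded:
        # A stitch contig spanning both ends was built and matched.
        return Outcome.REPAIRED
    if keep_unrepaired_contigs:
        # Stitching failed, but the flag lets the contig through unchanged.
        return Outcome.KEPT_UNREPAIRED
    # Default behaviour: stitching failed and the flag is off, so fail.
    raise EndRepairError("could not stitch the ends of a circular contig")
```

Only the last branch stops the pipeline, and it is the only one that `keep_unrepaired_contigs` changes.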
I think the current version of the end repair script is too picky about matching up the contig ends with the stitch contig, and I want to address this in the custom code I write (`stitch.py`) that will eventually replace the circlator code as per rotary-genomics/rotary-utils#8 and rotary-genomics/rotary-utils#10. For now, tuning the end repair params in the config might also help.
`keep_unrepaired_contigs` (previously `'False'`) is now set to True by default.
Problem
My current genome fails end repair because the contigs cannot be circularized.
In the config, there is a `keep_unrepaired_contigs` parameter which allows the pipeline to proceed. However, this parameter is not mentioned in the error message, which might make the user think the pipeline fails permanently.
Proposed Solution
Mention the flag in the error message.
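As an illustration of the proposal, an error message that names the flag could look something like the sketch below. The function name and the exact wording are hypothetical; only the `keep_unrepaired_contigs` parameter name is taken from the config.

```python
def build_end_repair_error(contig_ids, flag_name="keep_unrepaired_contigs"):
    """Build an error message that points the user at the config flag.

    Hypothetical helper: the wording is only a suggestion.
    """
    ids = ", ".join(contig_ids)
    return (
        f"End repair failed: could not stitch the ends of circular contig(s): {ids}. "
        f"To pass these contigs through unrepaired instead of stopping, "
        f"set '{flag_name}: True' in the config and re-run."
    )


if __name__ == "__main__":
    # Example contig names are made up for illustration.
    print(build_end_repair_error(["contig_1", "contig_4"]))
```

Naming the affected contigs next to the flag would also help the user decide whether passing them through unrepaired is acceptable.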