The recently added patch that creates SAM-compatible text header output when neither 'text' nor a pre-built header is passed has unfortunate quadratic runtime behavior, caused by repeated string concatenation. This makes it pretty much unusable in situations with a lot of references: Salmo salar currently has 550k contigs, and I canceled after 20 minutes, whereas with the patch below it finishes in less than 10 seconds.
Here's a patch that replaces the concatenation with the old 'build a list and join it' approach.
--- a/pysam/csamtools.pyx Tue Jan 17 15:19:03 2012 +0100
+++ b/pysam/csamtools.pyx Tue Mar 20 12:40:34 2012 +0100
@@ -567,9 +567,10 @@
                 # Optionally, if there is no text, add a SAM compatible header to output
                 # file.
                 if text is None and add_sq_text:
-                    text = ''
+                    text = []
                     for x from 0 <= x < header_to_write.n_targets:
-                        text += "@SQ\tSN:%s\tLN:%s\n" % (referencenames[x],
-                                                          referencelengths[x] )
+                        text.append("@SQ\tSN:%s\tLN:%s\n" %
+                                    (referencenames[x], referencelengths[x] ))
+                    text = ''.join(text)
                 if text != None:
                     # copy without \0
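For illustration, here is the same list-and-join idea as a standalone Python sketch, outside of pysam. `build_sq_header` is a hypothetical helper, not part of pysam's API; it assumes plain Python sequences of reference names and lengths and emits the same `@SQ\tSN:...\tLN:...` lines as the patch.

```python
from typing import Sequence


def build_sq_header(reference_names: Sequence[str],
                    reference_lengths: Sequence[int]) -> str:
    """Return SAM @SQ header lines, one per reference sequence.

    Hypothetical helper for illustration (not part of pysam). The lines are
    produced by a generator and joined once, so the result is assembled in a
    single pass instead of growing a string with repeated ``+=``.
    """
    if len(reference_names) != len(reference_lengths):
        raise ValueError("unequal names and lengths of reference sequences")
    return "".join(
        "@SQ\tSN:%s\tLN:%s\n" % (name, length)
        for name, length in zip(reference_names, reference_lengths)
    )


if __name__ == "__main__":
    # Illustrative contig names and lengths, not real assembly data.
    print(build_sq_header(["contig1", "contig2"], [1000, 2500]), end="")
```

The generator expression inside `str.join` is equivalent to the explicit `append` loop in the patch: in both cases the header is assembled once at the end rather than potentially being re-copied on every `+=`.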
Original issue reported on code.google.com by finkerna...@mathematik.uni-marburg.de on 20 Mar 2012 at 11:41