Closed mmoisse closed 4 years ago
I noticed a 10x performance drop compared to my older transvar version (https://bitbucket.org/wanding/transvar/commits/8a7a774618174bd591e8821b9c7c7fd5c03ce8c4) for some variants. I traced the drop to the addition of decode() in the fetch_sequence function, which converts seq from str to unicode. Apparently, concatenating unicode is much slower than concatenating str: https://github.com/zwdzwd/transvar/blob/28a725dfb30acbd5c5cde7a7c8015ffdcbb1826b/transvar/faidx.py#L81-L83
I suggest either decoding to unicode only once, at the end of the loop, or removing the decode() call altogether.
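To illustrate the first option, here is a minimal sketch of the idea. It is not the actual code from faidx.py: the function names, the `chunks` input, and the ASCII encoding are assumptions made for the example. One variant decodes every chunk and concatenates inside the loop, the other collects the raw chunks and decodes once after the loop.

```python
def fetch_sequence_slow(chunks):
    """Hypothetical variant: decode each chunk and concatenate text in the loop."""
    seq = ""
    for chunk in chunks:
        seq += chunk.strip().decode("ascii")  # one decode and one concat per chunk
    return seq


def fetch_sequence_fast(chunks):
    """Hypothetical variant: collect raw byte chunks, join and decode once."""
    parts = []
    for chunk in chunks:
        parts.append(chunk.strip())  # no decoding inside the loop
    return b"".join(parts).decode("ascii")  # single join, single decode


# Made-up input standing in for lines read from a FASTA file.
chunks = [b"ACGTACGT\n", b"TTGACCA\n", b"GG\n"]
assert fetch_sequence_slow(chunks) == fetch_sequence_fast(chunks)
print(fetch_sequence_fast(chunks))
```

Both variants return the same sequence; only the number of decode and concatenation steps differs. How much time this saves depends on the Python version and the input, so it should be measured on real data.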
test.vcf.gz
transvar ganno --vcf test.vcf.gz --refversion hg19 --ccds
Current version: 46.5366 s
Version without decode(): 5.30291 s
Version without one join(): 9.85117 s
I confirm that this patch significantly improves performance of transvar. Thanks!
Thanks for the suggestion and confirmation. Sorry for having missed this. Will merge and integrate soon.