Open rhpvorderman opened 1 year ago
Interesting idea. I’ll have to see whether this can be applied. I did some experiments converting all of pipeline.py to Cython a while ago, but that didn’t really help, so I don’t know how much of an improvement this can be.
However, what I’ve often seen when profiling is that creating a ModificationInfo instance is relatively slow, so I took this opportunity to move that over to Cython. It’s not a huge improvement, but it helps a bit. See #655.
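For anyone who wants to check the instance-creation cost themselves, here is a minimal sketch using `timeit`. The classes `PlainInfo` and `SlotsInfo` are hypothetical stand-ins and not cutadapt's actual ModificationInfo. The `__slots__` variant is only a pure-Python point of comparison; the change mentioned above uses Cython instead.

```python
import timeit


class PlainInfo:
    """Hypothetical stand-in for a small per-read info object."""

    def __init__(self, read):
        self.read = read
        self.matches = []


class SlotsInfo:
    """Same fields, but __slots__ avoids allocating a per-instance __dict__."""

    __slots__ = ("read", "matches")

    def __init__(self, read):
        self.read = read
        self.matches = []


def time_creation(cls, number=200_000):
    """Return the total seconds needed to create `number` instances of cls."""
    return timeit.timeit(lambda: cls("ACGT"), number=number)


if __name__ == "__main__":
    for cls in (PlainInfo, SlotsInfo):
        print(f"{cls.__name__}: {time_creation(cls):.3f} s")
```

The absolute numbers depend on the machine and Python version, so only the relative difference between the two classes is meaningful.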
Hi,
I profiled cutadapt's performance:
What stands out to me is the time spent in the process_reads function.
What happens here is that:
This Python code is expensive. An alternative is to:
This would remove quite a lot of Python overhead from the pipeline. I applied this in my fastq-filter program. Admittedly, that one is a bit over-optimized (every filter is written in C), but it works quite well. See https://github.com/LUMC/fastq-filter/blob/ac6173fadd0d802deecc60cbcf848d810d1f025d/src/fastq_filter/__init__.py#L128
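To make this kind of per-read Python overhead visible, a minimal `cProfile` sketch on a toy loop may help. Everything here (`toy_modifier`, `toy_filter`, `toy_process_reads`, `profile_report`) is a hypothetical stand-in and not cutadapt's or fastq-filter's code; it only shows how the per-call cost of small Python functions shows up in a profile.

```python
import cProfile
import io
import pstats


def toy_modifier(read):
    """Hypothetical per-read step: strip a fixed 3' suffix if present."""
    return read[:-4] if read.endswith("ACGT") else read


def toy_filter(read):
    """Hypothetical per-read filter: keep reads of at least 10 bases."""
    return len(read) >= 10


def toy_process_reads(reads):
    """Toy per-read loop; every read costs several Python-level calls."""
    kept = 0
    for read in reads:
        read = toy_modifier(read)
        if toy_filter(read):
            kept += 1
    return kept


def profile_report(reads, top=5):
    """Profile toy_process_reads and return (kept, text report)."""
    profiler = cProfile.Profile()
    kept = profiler.runcall(toy_process_reads, reads)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return kept, stream.getvalue()


if __name__ == "__main__":
    reads = ["TTTTGGGGCCCCACGT", "ACGTACGT", "GATTACAGATTACA"] * 50_000
    kept, report = profile_report(reads)
    print(f"kept {kept} reads")
    print(report)
```

In the report, the call counts for the small helper functions scale with the number of reads, which is the overhead that moving the loop body into compiled code would avoid.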