Open mirax87 opened 5 years ago
Hi @mirax87!
Are you using --segmentation input? If so, this is the main reason that iCount xlsites is taking so long. Please run it without segmentation (AFAIK, this is how most users run it). We should speed up the algorithm for the case where segmentation is given, but we never found the time to do it properly.
Regarding other factors that could affect runtime:
- group_by should have zero effect on runtime.
- mapq_th will take fewer (poorly mapped) reads into account, so this should speed things up a bit. But if the mapping quality is sufficient, the effect should not be very significant.
- The max_barcodes parameter can speed things up significantly, but this should be used only in such cases.
Hi @JureZmrzlikar,
you are right, I am using iCount xlsites --segmentation. I'll try without.
Thanks for the quick feedback. Cheers
Hi,
in order to process our D. melanogaster iCLIP library, I used snakemake to put the iCount steps together and integrated benchmarking, specifically for iCount xlsites with quantification based on cDNA and reads.
Here, I am observing runtimes of ~1-4 days on our cluster system for iCount xlsites. The number of reads per multiplexing barcode is quite variable, which correlates with runtime.
In terms of parameters, I use
using the output gtf from iCount segment
I wonder what, next to the total number of mapped reads, determines the runtime of iCount xlsites and whether there are some useful pre-filtering strategies for the BAM files to speed up the process without losing (too much) sensitivity.
Cheers
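As a rough illustration of the kind of pre-filtering asked about here, the sketch below drops poorly mapped reads from SAM-formatted text by applying a minimum MAPQ (column 5 of a SAM record). The function name `filter_sam_by_mapq`, the threshold of 10, and the example records are all made up for illustration; this is not part of iCount, and it may not give exactly the same result as setting mapq_th in iCount xlsites. On real BAM files, `samtools view -q` applies the same kind of cutoff directly.

```python
def filter_sam_by_mapq(lines, min_mapq):
    """Yield SAM header lines unchanged and alignments with MAPQ >= min_mapq.

    Hypothetical helper for illustration only; not an iCount function.
    """
    for line in lines:
        if line.startswith("@"):
            # Header lines are passed through untouched.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            # A SAM alignment has 11 mandatory columns; skip anything shorter.
            continue
        # Column 5 (index 4) is the mapping quality.
        if int(fields[4]) >= min_mapq:
            yield line


# Made-up example records (reference name and length are placeholders).
SAM_EXAMPLE = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:chrA\tLN:1000\n"
    "read1\t0\tchrA\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n"
    "read2\t0\tchrA\t200\t3\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n"
    "read3\t16\tchrA\t300\t20\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n"
)

kept = list(filter_sam_by_mapq(SAM_EXAMPLE.splitlines(keepends=True), 10))
kept_names = [l.split("\t")[0] for l in kept if not l.startswith("@")]
print(kept_names)
```

Whether such a filter noticeably shortens the xlsites runtime depends on how many reads fall below the cutoff; as noted in the reply, the gain is likely small when mapping quality is already good.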