Closed Romain-B closed 1 year ago
Hi,
Sorry for the slow reply, but thank you for hunting down the issue so systematically! I agree, this is probably just a very unlucky case I have not encountered before because I typically test with ~10M reads at least. I don't see any issues with the suggested change and will push an update shortly!
Again, many thanks, Christoph
Hello & thanks for the great tool!
I have encountered a cryptic error when testing the pipeline on a 100k-read subset of my Smart-seq3 data. Subread runs properly and finds both (ex) and (in) counts, but then the zUMIs follow-up crashes, as shown below.
The readcount_internal object is specific to Ss3 processing by zUMIs, and after digging into the zUMIs functions I figured out that the issue lies in the umiCollapseID function. By printing the number of non-UMI reads and the total number of reads just before the detection of internal reads for Ss3, it appears that zUMIs split the data into 6 chunks, which resulted in 2 small chunks in which the number of non-UMI reads equals the total number of reads.
Because this line only checks that the number of non-UMI reads is strictly lower than the total number of reads, the internal_reads object is never created for those chunks, which results in the error. After changing the strict comparison to a non-strict one, the pipeline runs fine, which confirms this is the failing step.
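To make the failure mode concrete, here is a minimal sketch in Python (not zUMIs' actual R code; all names and data structures are illustrative) of how a strict comparison skips chunks made up entirely of non-UMI reads, so that the downstream lookup fails:

```python
def find_internal_reads(chunks, strict=True):
    """Collect internal (non-UMI) reads per chunk.

    With a strict '<' comparison, a chunk in which every read is a
    non-UMI read (n_non_umi == n_total) never gets an entry, and any
    downstream code that expects the key crashes. Relaxing '<' to
    '<=' covers that edge case.
    """
    results = {}
    for i, chunk in enumerate(chunks):
        n_total = len(chunk)
        n_non_umi = sum(1 for read in chunk if read["umi"] is None)
        keep = (n_non_umi < n_total) if strict else (n_non_umi <= n_total)
        if keep:
            results[i] = [read for read in chunk if read["umi"] is None]
    return results

# Two chunks: one mixed, one consisting only of non-UMI reads.
chunks = [
    [{"umi": "AAAC"}, {"umi": None}],
    [{"umi": None}, {"umi": None}],
]

print(sorted(find_internal_reads(chunks, strict=True)))   # [0] -- chunk 1 skipped
print(sorted(find_internal_reads(chunks, strict=False)))  # [0, 1]
```

With the strict check, chunk 1 silently gets no entry, which mirrors the missing internal_reads object described above.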
I don't know how many reads one should use for testing, but given that the ratio of UMI to internal reads can vary in Ss3 data, I'm probably not in a unique situation.
I don't think there is any side effect to changing the evaluation above, but another fix could perhaps be to create the readcount_internal structure for all Ss3 chunks, regardless of their composition. What do you think?
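The alternative fix could look like this sketch (again illustrative Python, not zUMIs' R code): build an entry for every chunk up front so that downstream lookups never fail, whatever the chunk's UMI composition:

```python
def internal_reads_for_all_chunks(chunks):
    # Build an entry for every chunk unconditionally, so downstream
    # code can rely on the key existing: all-UMI chunks get an empty
    # list, all-non-UMI chunks get the whole chunk.
    return {
        i: [read for read in chunk if read["umi"] is None]
        for i, chunk in enumerate(chunks)
    }

# One chunk of only UMI reads, one chunk of only non-UMI reads.
chunks = [
    [{"umi": "AAAC"}, {"umi": "GGTT"}],
    [{"umi": None}, {"umi": None}],
]

result = internal_reads_for_all_chunks(chunks)
print(result[0])       # [] -- entry exists even with no internal reads
print(len(result[1]))  # 2
```

This trades the comparison fix for an invariant (every chunk has an entry), which may be more robust against other unusual read compositions.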