encoding_dataframes memory

kaplans1 commented 1 month ago

I think there might be a memory leak when running "encoding_dataframes.encode_crispresso_allele_table": memory pressure gradually increases during encoding. I suspect this is due to the high number of alleles in my samples. I've tried different numbers of cores and see the same effect each time. Memory usage initially spikes to near the maximum available system memory, then grows slowly during encoding until the machine crashes. This typically does not happen with the first allele batch, which is smaller, and memory is released when moving from one allele batch to the next. With larger allele batches, however, the leak pushes memory usage past the total available memory.
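For anyone trying to reproduce this, here is a minimal sketch of how per-batch peak memory could be recorded with the standard-library `tracemalloc` module (Python 3.9+ for `reset_peak`). The `encode_batch` function and the toy batches are hypothetical stand-ins, not the CRISPR-Millipede encoding code, and `tracemalloc` only sees allocations made through Python's memory allocators, so the numbers may understate total process memory.

```python
import gc
import tracemalloc


def encode_batch(alleles):
    # Hypothetical stand-in for the real per-batch encoding step;
    # it is NOT the library's implementation.
    return [[ord(base) for base in allele] for allele in alleles]


def peak_memory_per_batch(batches):
    """Return the peak traced allocation (in bytes) observed for each batch."""
    peaks = []
    tracemalloc.start()
    try:
        for batch in batches:
            # Reset the peak so each batch is measured on its own.
            tracemalloc.reset_peak()
            encoded = encode_batch(batch)
            _, peak = tracemalloc.get_traced_memory()
            peaks.append(peak)
            # Drop the result and collect so memory can be released
            # before the next batch starts.
            del encoded
            gc.collect()
    finally:
        tracemalloc.stop()
    return peaks


# Toy data (assumption): a small batch followed by a much larger one.
batches = [["ACGT"] * 100, ["ACGT"] * 10_000]
peaks = peak_memory_per_batch(batches)
for index, peak in enumerate(peaks):
    print(f"batch {index}: peak {peak / 1024:.1f} KiB")
```

If the peak for a batch keeps climbing while that batch is being encoded, rather than plateauing, that would support a leak inside the encoding step rather than simply a large working set.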

doczmp commented 1 month ago

Hi,

Thanks for letting us know. CRISPResso2 is not the right tool for PacBio data. We also haven't tested CRISPR-Millipede on long reads, so I will have to take a look at the memory issue. We do have a tool to quantify edits in long-read sequencing, though it's unpublished as of right now. Would be happy to collaborate/discuss this over email!

Best, Zain

kaplans1 commented 1 month ago

Hi Zain - got it!

I think we have generated a unique dataset that may be of interest to you. I'd be happy to tell you about it; please feel free to reach out to me at kaplans1@mskcc.org.

Best, Sam

pinellolab / CRISPR-millipede-target
