Open talipzengin opened 3 months ago
Hi Talip, after talking to the other core devs, it seems the final table is simply too large for the Colab environment. You can try reducing the table size, e.g. by taking one or a few chromosomes (for the purposes of the tutorials, that is more than enough).
Another option is to pass the argument return_input=False, so you only get the indices of the overlapping intervals. That should have a much lower memory footprint.
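A minimal sketch of the chromosome subsetting suggested here, in plain pandas. The helper `subset_chroms`, the toy tables, and the column name `chrom` are assumptions for illustration; the real tables may name the chromosome column differently.

```python
import pandas as pd


def subset_chroms(df, chroms, chrom_col="chrom"):
    """Keep only rows whose chromosome is in `chroms` (hypothetical helper)."""
    return df[df[chrom_col].isin(chroms)].reset_index(drop=True)


# Toy stand-ins for the CNV and gene tables (column names are assumed).
cnv = pd.DataFrame({
    "chrom": ["chr1", "chr1", "chr2"],
    "start": [100, 500, 100],
    "end": [200, 900, 300],
})
genes = pd.DataFrame({
    "chrom": ["chr1", "chr2"],
    "start": [150, 50],
    "end": [600, 120],
})

# Restrict both tables to the same chromosome(s) before overlapping them.
cnv_small = subset_chroms(cnv, ["chr1"])
genes_small = subset_chroms(genes, ["chr1"])
print(len(cnv_small), len(genes_small))
```

The reduced tables can then be passed to bioframe.overlap in place of the full ones. Looping over chromosomes one at a time and keeping only a small summary per chromosome would be one way to cover the whole genome without ever holding the full result in memory.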
Hi. I have a gene table (genomic locations of genes, 63086 rows × 4 columns) and a CNV table with the genomic locations of copy number variations (799505 rows × 5 columns; after segment mean filtering: 507673 rows × 6 columns). I want to determine which genes have variable copy number. bioframe.overlap fails with a RAM error ("Your session crashed after using all available RAM.") for large tables in a Colab notebook (Colab RAM is 12.7 GB).
I have tried different table dimensions, with these results:
CNV table: 799505 rows × 5 columns; gene table (gene_coord): 63086 rows × 4 columns => RAM error
CNV table: 507673 rows × 6 columns; gene table (gene_coord): 63086 rows × 4 columns => 23018739 rows × 10 columns
CNV table: 507673 rows × 6 columns; gene table (gene_coord_expanded): 63086 rows × 6 columns => RAM error (it should have returned 23018739 rows × 12 columns)
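For a rough sense of scale, here is a back-of-the-envelope lower bound on the size of the result tables above. It assumes every cell is an 8-byte number, which is optimistic: string columns such as chromosome or gene names are stored as Python objects and cost far more, and the join itself needs temporary copies. The helper name `dense_table_bytes` is made up for this sketch.

```python
def dense_table_bytes(n_rows, n_cols, bytes_per_cell=8):
    """Lower-bound size of a table whose cells are all fixed-width values."""
    return n_rows * n_cols * bytes_per_cell


n_rows = 23_018_739  # row count reported for the run that succeeded
for n_cols in (10, 12):
    gb = dense_table_bytes(n_rows, n_cols) / 1e9
    print(f"{n_cols} columns: at least {gb:.2f} GB")
```

Under these assumptions the 10-column result needs roughly 1.8 GB and the 12-column one roughly 2.2 GB before counting string overhead and intermediates, so it is plausible that two extra columns are enough to exceed 12.7 GB in practice. On a table that does fit, pandas' `DataFrame.memory_usage(deep=True)` reports the actual figure.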
My code: