jian-shu-lab / Bering


CPU RAM usage #14

Open canergen opened 1 year ago

canergen commented 1 year ago

Hi, I'm trying to run

br.tl.node_classification(
    bg, bg.spots_all.copy(), 
    n_neighbors = 30, 
    num_chunks = 500,
)

on a Xenium dataset released by 10x (brain) with 40,658,081 transcripts, which I would call medium-sized. The kernel crashes after requesting >300 GB of RAM. What settings would you recommend? How much RAM is needed for datasets of this size (so I can decide which compute node is adequate)?
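
As a rough scale check (a back-of-the-envelope sketch using only the numbers quoted in this report; it makes no Bering calls and the edge count is only an approximation of the kNN graph size), the per-chunk load at these settings can be estimated directly:

n_spots = 40_658_081          # transcripts reported above
n_neighbors = 30
num_chunks = 500

n_edges = n_spots * n_neighbors           # ~1.22 billion directed kNN edges
spots_per_chunk = n_spots / num_chunks    # ~81,000 spots handled per chunk
print(f"{n_spots:,} spots, ~{n_edges:,} edges, ~{spots_per_chunk:,.0f} spots per chunk")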

KANG-BIOINFO commented 1 year ago

Hi canergen,

This sounds like a large-sized dataset to me. I am wondering how many transcripts you typically see in large-sized datasets, so that I can better understand the memory requirements for future improvements.

One strategy I usually use is to set num_chunks to a larger value. In your case, could you try setting it to, for example, 10,000? This is similar to performing segmentation tile by tile. Please let me know how it goes and share your feedback. I am also planning to test segmentation on the public Xenium breast cancer dataset to optimize memory usage.
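
For reference, the suggested change applied to the snippet from the original report would look roughly like this (a sketch, not a tested configuration; it assumes the same bg object and the same br.tl.node_classification call shown above):

# Same call as in the report, but with a much larger num_chunks so each
# chunk (tile) holds fewer spots and peak RAM stays lower.
br.tl.node_classification(
    bg, bg.spots_all.copy(),
    n_neighbors = 30,
    num_chunks = 10_000,   # suggested value; was 500 in the original run
)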