navinlabcode / copykat

Memory usage for big scale data #45

Closed heshidian closed 2 years ago

heshidian commented 2 years ago

Hi, thank you so much for developing this helpful tool! I recently used it to infer CNVs for nearly 100,000 cells, but I've run into a problem I'd like to ask about. The run has been stuck at the fourth step for more than 5 days, and memory usage has exceeded 90 GB. I don't know whether it is simply taking a long time because the data set is large, or whether something else is going on.

gaobio commented 2 years ago

Hmmm, it's not a good idea to run 100,000 cells all at once. I bet you combined many samples. CopyKAT relies on within-sample variation, so running combined samples won't work: it picks up inter-sample batch effects instead.
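
For anyone landing here with a merged matrix, a minimal sketch of the "one sample at a time" idea: group the cell names by sample so each group can be subset and run separately. This is only an illustration, not part of CopyKAT. It assumes cell names look like `<sample>_<barcode>`, which may not match your data, and `split_cells_by_sample` is a hypothetical helper name.

```python
from collections import defaultdict


def split_cells_by_sample(cell_names, sep="_"):
    """Group column indices of cells by sample ID.

    Assumption (not from the thread): each cell name is '<sample><sep><barcode>',
    so the sample ID is everything before the first separator.
    """
    groups = defaultdict(list)
    for index, name in enumerate(cell_names):
        sample, _, _ = name.partition(sep)
        groups[sample].append(index)
    return dict(groups)


# Toy cell names standing in for the columns of a merged count matrix.
cells = ["S1_AAAC", "S2_AAAG", "S1_AACT", "S3_CCGT"]
groups = split_cells_by_sample(cells)

for sample, indices in groups.items():
    # Each index list selects one sample's columns; that per-sample
    # submatrix is what would be passed to a separate run.
    print(sample, indices)
```

Each index list can then be used to pull one sample's columns out of the merged matrix, which also keeps the memory footprint of each run much smaller than a single 100,000-cell run.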