welch-lab / PerturbNet

PerturbNet is a deep generative model that predicts the distribution of cell states induced by chemical or genetic perturbations.
GNU General Public License v3.0

RAM usage exceeds capacity when loading sciplex chemical dataset. #4

Open RapidsAIpk opened 6 months ago

RapidsAIpk commented 6 months ago

When attempting to load the SCIPLEX chemical dataset using the perturbnet_sciplex_example_notebook.ipynb file, my system's RAM becomes fully utilized and the process is killed. My system has 64 GB of RAM, but it appears that loading this dataset exceeds its capacity.

This issue occurs during the execution of the notebook, specifically when loading the SCIPLEX chemical dataset. Even with 64 GB of RAM, the process cannot complete because of excessive memory consumption. The failure occurs at the following lines of code:

```python
# (2) load models

# generation scVI
adata_train = adata[idx_to_train, :].copy()
adata_train = adata_train[kept_indices, :].copy()

scvi.data.setup_anndata(adata_train, layer="counts")
scvi_model_cinn = scvi.model.SCVI.load(path_scvi_model_cinn, adata_train, use_cuda=False)
scvi_model_de = scvi_predictive_z(scvi_model_cinn)

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# ChemicalVAE
model_chemvae = ChemicalVAE(n_char=data_chem_onehot.shape[2], max_len=data_chem_onehot.shape[1]).to(device)
model_chemvae.load_state_dict(torch.load(path_chemvae_model, map_location=device))
model_chemvae.eval()
```
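A back-of-envelope estimate suggests why 64 GB can be exhausted here. The cell and gene counts below are assumptions (the sciPlex screen is on the order of hundreds of thousands of cells), not figures from the repo, but they show that a single densified float32 counts matrix can approach the machine's total RAM before any `.copy()` duplicates it:

```python
import numpy as np

# Assumed sizes, roughly the scale of the sciPlex chemical screen
n_cells, n_genes = 650_000, 20_000

# Bytes for a dense float32 matrix, converted to GiB
dense_gb = n_cells * n_genes * np.dtype(np.float32).itemsize / 1024**3
print(f"dense float32 counts matrix: ~{dense_gb:.0f} GiB")
```

With a matrix of this size, holding the full dataset plus even one subset copy in memory at the same time would exceed 64 GB, which matches the observed kill.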

I would like to request assistance in understanding the system requirements for running PerturbNet and resolving this issue to successfully load the SCIPLEX chemical dataset without exhausting the available RAM.
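One way to lower the peak in the snippet above is to compose the two row selections before materializing anything, so only one copy is ever held instead of two successive ones. A minimal numpy sketch with toy sizes (the index names mirror the notebook, but the array is a stand-in, not the real AnnData):

```python
import numpy as np

# Toy stand-in for adata.X; the real SciPlex matrix is far larger
X = np.random.rand(1000, 50).astype(np.float32)
idx_to_train = np.arange(800)      # first subset, as in the notebook
kept_indices = np.arange(600)      # second subset, indexes rows of the first

# Original pattern copies twice: X[idx_to_train].copy()[kept_indices].copy().
# Composing the indices first materializes only the final subset.
combined = idx_to_train[kept_indices]
X_train = X[combined, :].copy()
```

The same composition works for AnnData row indexing, since `kept_indices` in the notebook indexes the already-subset `adata_train`.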

cyclopenta commented 1 week ago

In my test, 64 GB of RAM was enough to run the entire process. You could try the newly uploaded dataset. The old version of the code may keep some redundant variables in memory; consider deleting them after you have initialized the model.
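The suggestion above can be sketched as follows. The variable is a hypothetical stand-in for whatever large intermediate (e.g. the full `adata` or the one-hot array) is no longer needed once the models are initialized:

```python
import gc

# Hypothetical stand-in for a large intermediate such as the full adata
big_intermediate = bytearray(200 * 1024 * 1024)  # ~200 MiB placeholder

# ... initialize scVI / ChemicalVAE models here ...

del big_intermediate  # drop the last reference to the redundant variable
gc.collect()          # ask the collector to release the memory promptly
```

Note that `del` only removes one reference; the memory is freed only when no other name still points at the object, so check for lingering views or slices as well.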