scverse / scanpy_usage

Scanpy use cases.

memory #13

Open · ShobiStassen opened this issue 5 years ago

ShobiStassen commented 5 years ago

Hi, I am trying to run the full 1.3M-cell 10X mouse dataset (using the 1M_neurons_filtered_gene_bc_matrices_h5.h5 file from the 10X website). I have 126 GB RAM and an Intel Xeon W-2123 CPU @ 3.60 GHz × 8, which is above the requirements you mention are needed to run the full cluster.py method without subsampling. I get a MemoryError at the filter_genes_dispersion stage. Should I modify the code in any way (without subsampling)? Thanks, Shobi

```python
adata = sc.read_10x_h5(filename)
adata.var_names_make_unique()
sc.pp.recipe_zheng17(adata)
```

```
running recipe zheng17
filtered out 3983 genes that are detected in less than 1 counts
Traceback (most recent call last):
  File "/home/shobi/PycharmProjects/my_first_conda_project/10X_mousebrain.py", line 61, in <module>
    main()
  File "/home/shobi/PycharmProjects/my_first_conda_project/10X_mousebrain.py", line 58, in main
    basic_analysis(DIR + '1M_neurons_filtered_gene_bc_matrices_h5.h5')
  File "/home/shobi/PycharmProjects/my_first_conda_project/10X_mousebrain.py", line 24, in basic_analysis
    sc.pp.recipe_zheng17(adata)
  File "/home/shobi/anaconda3/envs/my_first_conda_project/lib/python3.6/site-packages/scanpy/preprocessing/_recipes.py", line 108, in recipe_zheng17
    adata.X, flavor='cell_ranger', n_top_genes=n_top_genes, log=False)
  File "/home/shobi/anaconda3/envs/my_first_conda_project/lib/python3.6/site-packages/scanpy/preprocessing/_deprecated/highly_variable_genes.py", line 109, in filter_genes_dispersion
    mean, var = materialize_as_ndarray(_get_mean_var(X))
  File "/home/shobi/anaconda3/envs/my_first_conda_project/lib/python3.6/site-packages/scanpy/preprocessing/_utils.py", line 10, in _get_mean_var
    mean = X.mean(axis=0)
  File "/home/shobi/anaconda3/envs/my_first_conda_project/lib/python3.6/site-packages/scipy/sparse/base.py", line 1077, in mean
    inter_self = self.astype(inter_dtype)
  File "/home/shobi/anaconda3/envs/my_first_conda_project/lib/python3.6/site-packages/scipy/sparse/data.py", line 74, in astype
    return self.copy()
  File "/home/shobi/anaconda3/envs/my_first_conda_project/lib/python3.6/site-packages/scipy/sparse/data.py", line 91, in copy
    return self._with_data(self.data.copy(), copy=True)
  File "/home/shobi/anaconda3/envs/my_first_conda_project/lib/python3.6/site-packages/scipy/sparse/compressed.py", line 1124, in _with_data
    return self.__class__((data, self.indices.copy(), self.indptr.copy()),
MemoryError
```
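The failing frame is scipy's sparse `.mean()`: `astype(inter_dtype)` makes a full upcast copy of `X` before reducing it, so the mean/variance step briefly needs roughly twice the matrix's footprint on 1.3M cells. As a minimal sketch (not scanpy's API; `chunked_mean_var` and the chunk size are my own), the same statistics can be accumulated chunk by chunk so that only a slice of rows is ever upcast at once:

```python
import numpy as np

def chunked_mean_var(X, chunk_size=100_000):
    """Per-gene mean/variance of a sparse matrix, one row-chunk at a time.

    Only `chunk_size` rows are upcast to float64 at once, so the peak
    memory overhead is a small fraction of a full-matrix copy.
    """
    n_obs, n_vars = X.shape
    s = np.zeros(n_vars, dtype=np.float64)   # running sum of x
    ss = np.zeros(n_vars, dtype=np.float64)  # running sum of x**2
    for start in range(0, n_obs, chunk_size):
        chunk = X[start:start + chunk_size].astype(np.float64)
        s += np.asarray(chunk.sum(axis=0)).ravel()
        ss += np.asarray(chunk.multiply(chunk).sum(axis=0)).ravel()
    mean = s / n_obs
    var = ss / n_obs - mean ** 2
    return mean, var
```

On a CSR matrix, `X[start:stop]` copies only those rows, so the peak overhead scales with `chunk_size` rather than with the dataset.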

ShobiStassen commented 5 years ago

I also wanted to add that the initial filtering and normalization steps in recipe_zheng17() already used around 70 GB of RAM. Is this expected? The README says around 30 GB should be sufficient.

```python
adata = sc.read_10x_h5(filename)
sc.pp.filter_genes(adata, min_counts=1)
sc.pp.normalize_per_cell(adata, key_n_counts='n_counts_all')
```
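One thing that might be worth trying here (an assumption on my part, not a confirmed fix): the `astype(inter_dtype)` frame in the traceback above suggests `X` is held in a dtype that scipy upcasts to float64 when averaging. Casting once to float32 straight after loading should roughly halve the size of that copy and of later intermediates:

```python
import numpy as np
import scanpy as sc

adata = sc.read_10x_h5(filename)  # `filename` as in the snippets above
adata.var_names_make_unique()

# Assumption: depending on the scanpy version, X may come back with an
# integer dtype; a one-time float32 cast shrinks every later copy.
if adata.X.dtype != np.float32:
    adata.X = adata.X.astype(np.float32)

sc.pp.recipe_zheng17(adata)
```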

davisidarta commented 4 years ago

I have also replicated these findings on a 128 GB RAM six-core Xeon P52 workstation and on an HPCC. Baseline memory usage is around 30 GB, it peaks at ~140 GB during PCA and scaling, and it sits around ~60 GB for the other computations. The results were identical whether computed on the workstation or on the HPCC.
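For the PCA/scaling peak specifically, scanpy exposes options that trade speed for memory; a sketch of what I would try (the chunk size and component count are guesses, not tuned recommendations):

```python
import scanpy as sc

# zero_center=False keeps X sparse; zero-centering densifies the whole
# matrix, which is the usual cause of the scaling peak.
sc.pp.scale(adata, zero_center=False)

# chunked=True runs PCA incrementally on chunk_size cells at a time
# instead of decomposing the full matrix in memory.
sc.pp.pca(adata, n_comps=50, chunked=True, chunk_size=50_000)
```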