tanaylab / metacells

Metacells - Single-cell RNA Sequencing Analysis
MIT License

divide_and_conquer_pipeline Error #69

Open simonekats opened 2 months ago

simonekats commented 2 months ago

The divide and conquer pipeline fails with an empty AssertionError. cells.X is float32.

%%time
with mc.ut.progress_bar():
    mc.pl.divide_and_conquer_pipeline(cells, random_seed=123456)

Detect rare gene modules...
0%| [00:00]
python(7641) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(7642) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(7643) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
33%|██████████████████████▍ [01:39]
python(7721) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(7722) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(7723) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(7724) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(7725) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(7726) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(7727) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(7728) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(7729) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(7730) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(7731) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(7732) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(7733) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(7734) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(7735) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(7736) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
47%|███████████████████████████████▉ [01:47]

RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
  File "/Users/simone/anaconda3/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/Users/simone/anaconda3/lib/python3.10/site-packages/metacells/utilities/parallel.py", line 260, in _invocation
    result = PARALLEL_FUNCTION(index)
  File "/Users/simone/anaconda3/lib/python3.10/site-packages/metacells/pipeline/collect.py", line 221, in _collect_metacell
    assert np.min(fraction_per_gene_of_metacell) >= 0
AssertionError
"""

The above exception was the direct cause of the following exception:

AssertionError Traceback (most recent call last) File :2

File ~/anaconda3/lib/python3.10/site-packages/metacells/utilities/logging.py:384, in logged.<locals>.wrap.<locals>.wrapper(*args, **kwargs)
    379 if log_value is not None:
    380     logger().log(
    381         param_level, "%swith %s: %s", INDENT_SPACES[: 2 * INDENT_LEVEL], name, log_value
    382     )
--> 384 return function(*args, **kwargs)
    386 finally:
    387     if logger().isEnabledFor(step_level):

File ~/anaconda3/lib/python3.10/site-packages/metacells/pipeline/divide_and_conquer.py:633, in divide_and_conquer_pipeline(adata, what, rare_max_genes, rare_max_gene_cell_fraction, rare_min_gene_maximum, rare_genes_similarity_method, rare_genes_cluster_method, rare_min_genes_of_modules, rare_min_cells_of_modules, rare_min_module_correlation, rare_min_related_gene_fold_factor, rare_max_related_gene_increase_factor, rare_min_cell_module_total, rare_max_cells_factor_of_random_pile, rare_deviants_max_cell_fraction, rare_dissolve_min_robust_size_factor, rare_dissolve_min_convincing_gene_fold_factor, quick_and_dirty, select_downsample_min_samples, select_downsample_min_cell_quantile, select_downsample_max_cell_quantile, select_min_gene_total, select_min_gene_top3, select_min_gene_relative_variance, select_min_genes, cells_similarity_value_regularization, cells_similarity_log_data, cells_similarity_method, groups_similarity_log_data, groups_similarity_method, target_metacell_umis, cell_umis, target_metacell_size, min_metacell_size, target_metacells_in_pile, min_target_pile_size, max_target_pile_size, piles_knn_k_size_factor, piles_min_split_size_factor, piles_min_robust_size_factor, piles_max_merge_size_factor, knn_k, knn_k_umis_quantile, min_knn_k, knn_balanced_ranks_factor, knn_incoming_degree_factor, knn_outgoing_degree_factor, knn_min_outgoing_degree, min_seed_size_quantile, max_seed_size_quantile, candidates_knn_k_size_factor, candidates_cooldown_pass, candidates_cooldown_node, candidates_cooldown_phase, candidates_min_split_size_factor, candidates_max_merge_size_factor, candidates_max_split_min_cut_strength, candidates_min_cut_seed_cells, must_complete_cover, deviants_policy, deviants_gap_skip_cells, deviants_min_gene_fold_factor, deviants_min_noisy_gene_fold_factor, deviants_max_gene_fraction, deviants_max_cell_fraction, deviants_max_gap_cells_count, deviants_max_gap_cells_fraction, dissolve_min_robust_size_factor, dissolve_min_convincing_gene_fold_factor, 
random_seed)
    631 with ut.timed_step(".common"):
    632     with ut.progress_bar_slice(common_cells_fraction):
--> 633         _compute_divide_and_conquer_subset(
    634             adata,
    635             what,
    636             prefix="common" if name is None else name + ".common",
    637             metacells_level=0,
    638             subset_mask=common_cells_mask,
    639             collected_mask=collected_mask,
    640             counts=counts,
    641             dac_parameters=dac_parameters,
    642             random_seed=random_seed,
    643         )
    645 if rare_cells_count > 0:
    646     selected_genes = ut.get_v_numpy(adata, "selected_gene")

File ~/anaconda3/lib/python3.10/site-packages/metacells/utilities/logging.py:384, in logged.<locals>.wrap.<locals>.wrapper(*args, **kwargs)
    379 if log_value is not None:
    380     logger().log(
    381         param_level, "%swith %s: %s", INDENT_SPACES[: 2 * INDENT_LEVEL], name, log_value
    382     )
--> 384 return function(*args, **kwargs)
    386 finally:
    387     if logger().isEnabledFor(step_level):

File ~/anaconda3/lib/python3.10/site-packages/metacells/pipeline/divide_and_conquer.py:1016, in _compute_divide_and_conquer_subset(adata, what, prefix, subset_mask, collected_mask, metacells_level, counts, dac_parameters, random_seed)
   1013 groups_time /= total_time
   1015 with ut.progress_bar_slice(total_time):
-> 1016     final_pile_of_cells = _compute_metacell_groups(
   1017         adata,
   1018         what,
   1019         collect_time=collect_time,
   1020         groups_time=groups_time,
   1021         prefix=prefix + ".groups",
   1022         subset_mask=subset_mask,
   1023         dac_parameters=must_cover_dac_parameters,
   1024         random_seed=random_seed,
   1025     )
   1027 if dac_parameters.quick_and_dirty:
   1028     ut.log_calc(f"# {prefix}.final")

File ~/anaconda3/lib/python3.10/site-packages/metacells/utilities/logging.py:384, in logged.<locals>.wrap.<locals>.wrapper(*args, **kwargs)
    379 if log_value is not None:
    380     logger().log(
    381         param_level, "%swith %s: %s", INDENT_SPACES[: 2 * INDENT_LEVEL], name, log_value
    382     )
--> 384 return function(*args, **kwargs)
    386 finally:
    387     if logger().isEnabledFor(step_level):

File ~/anaconda3/lib/python3.10/site-packages/metacells/pipeline/divide_and_conquer.py:1176, in _compute_metacell_groups(adata, what, collect_time, groups_time, prefix, subset_mask, dac_parameters, random_seed)
   1167 with ut.progress_bar_slice(collect_time):
   1168     sdata = ut.slice(
   1169         adata,
   1170         name=f"{prefix}.grouped",
   (...)
   1173         track_obs="full_cell_index",
   1174     )
-> 1176 mdata = collect_metacells(
   1177     sdata,
   1178     what,
   1179     groups=metacell_of_cells[subset_mask],
   1180     name=prefix,
   1181     _metacell_groups=True,
   1182     top_level=False,
   1183     random_seed=random_seed,
   1184 )
   1186 with ut.progress_bar_slice(groups_time):
   1187     metacell_sizes = ut.get_o_numpy(mdata, "grouped").astype("float32")

File ~/anaconda3/lib/python3.10/site-packages/metacells/utilities/logging.py:384, in logged.<locals>.wrap.<locals>.wrapper(*args, **kwargs)
    379 if log_value is not None:
    380     logger().log(
    381         param_level, "%swith %s: %s", INDENT_SPACES[: 2 * INDENT_LEVEL], name, log_value
    382     )
--> 384 return function(*args, **kwargs)
    386 finally:
    387     if logger().isEnabledFor(step_level):

File ~/anaconda3/lib/python3.10/site-packages/metacells/pipeline/collect.py:244, in collect_metacells(adata, what, metacell_geo_mean, metacell_umis_regularization, zeros_cell_size_quantile, groups, name, prefix, top_level, _metacell_groups, random_seed)
    231 ut.log_calc(
    232     "fraction_per_gene_of_metacell", fraction_per_gene_of_metacell, formatter=ut.sizes_description
    233 )
    235 return {
    236     "grouped": grouped_of_metacell,
    237     "total_umis": total_umis_of_metacell,
    (...)
    241     "zeros_per_gene": zeros_per_gene,
    242 }
--> 244 results = ut.parallel_map(_collect_metacell, metacells_count)
    246 fraction_per_gene_per_metacell = sp.csr_matrix(np.vstack([result["fraction_per_gene"] for result in results]))
    247 assert str(fraction_per_gene_per_metacell.dtype) == "float32"

File ~/anaconda3/lib/python3.10/site-packages/metacells/utilities/parallel.py:221, in parallel_map(function, invocations, max_processors, hide_from_progress_bar)
    219 utm.timed_parameters(index=MAP_INDEX, processes=PROCESSES_COUNT)
    220 with get_context("fork").Pool(PROCESSES_COUNT) as pool:
--> 221     for index, result in pool.imap_unordered(_invocation, range(invocations)):
    222         if utp.has_progress_bar() and not hide_from_progress_bar:
    223             utp.did_progress(1 / invocations)

File ~/anaconda3/lib/python3.10/multiprocessing/pool.py:873, in IMapIterator.next(self, timeout)
    871 if success:
    872     return value
--> 873 raise value

File ~/anaconda3/lib/python3.10/multiprocessing/pool.py:125, in worker()
    123 job, i, func, args, kwds = task
    124 try:
--> 125     result = (True, func(*args, **kwds))
    126 except Exception as e:
    127     if wrap_exception and func is not _helper_reraises_exception:

File ~/anaconda3/lib/python3.10/site-packages/metacells/utilities/parallel.py:260, in _invocation()
    257 os.environ["MKL_NUM_THREADS"] = str(PROCESSORS_COUNT)
    259 assert PARALLEL_FUNCTION is not None
--> 260 result = PARALLEL_FUNCTION(index)
    261 return index, result

File ~/anaconda3/lib/python3.10/site-packages/metacells/pipeline/collect.py:221, in _collect_metacell()
    219 fraction_per_gene_of_metacell[umis_per_gene_of_metacell == 0] = 0
    220 fraction_per_gene_of_metacell[fraction_per_gene_of_metacell < 0] = 0
--> 221 assert np.min(fraction_per_gene_of_metacell) >= 0
    222 fraction_per_gene_of_metacell /= np.sum(fraction_per_gene_of_metacell)
    224 fraction_per_gene_of_metacell = fraction_per_gene_of_metacell.astype("float32")

AssertionError:

orenbenkiki commented 2 months ago

Yikes. This is the sort of assert that is there as a last-resort, belt-and-suspenders defence. I mean, the code reads:

220> fraction_per_gene_of_metacell[fraction_per_gene_of_metacell < 0] = 0
221> assert np.min(fraction_per_gene_of_metacell) >= 0

In what universe can this fail???
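One way these two lines can disagree in NumPy, offered only as a possibility and not confirmed for this dataset: if the array contains NaN, the comparison `x < 0` is False for the NaN entries, so they are not replaced, and `np.min` then returns NaN, for which `>= 0` is also False. A minimal sketch with made-up values (`clamp_negatives` is a hypothetical helper that mirrors line 220, not a metacells function):

```python
import numpy as np


def clamp_negatives(values):
    """Mirror line 220: replace negative entries with zero (NaN is left untouched)."""
    array = np.array(values, dtype="float32")
    array[array < 0] = 0  # NaN < 0 is False, so NaN entries are not selected
    return array


clean = clamp_negatives([0.2, -0.1, 0.0])
poisoned = clamp_negatives([0.2, -0.1, np.nan])

# Same check as the assert on line 221.
print(bool(np.min(clean) >= 0))     # the check holds for ordinary values
print(bool(np.min(poisoned) >= 0))  # np.min propagates NaN, and NaN >= 0 is False
print(int(np.isnan(poisoned).sum()), "NaN entries survived the clamp")
```

If this is what is happening, counting NaN entries in the array (and in cells.X) before line 220 would show it directly.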

At any rate:

  1. What's with these Python Malloc logging complaints? I'd be uncomfortable seeing this. I haven't seen these warnings before anywhere.

  2. Is this reproducible? Even if you disable parallel processing (maximal parallel processors of 1)?

  3. I'm assuming this is reproducible. Can you edit the file /Users/simone/anaconda3/lib/python3.10/site-packages/metacells/pipeline/collect.py and print the values in the array before line 220 and before line 221...?

simonekats commented 1 month ago

I edited the file and saved it, but I do not see it printing anything:

Detect rare gene modules...
0%| [00:00]
python(35744) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(35745) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(35746) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
33%|██████████████████████▍ [01:40]
python(36659) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(36660) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(36661) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(36662) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(36663) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(36664) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(36665) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(36666) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(36667) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(36668) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(36669) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(36670) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(36671) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(36672) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(36673) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(36674) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
47%|███████████████████████████████▉ [01:48]

RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
  File "/Users/simone/anaconda3/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/Users/simone/anaconda3/lib/python3.10/site-packages/metacells/utilities/parallel.py", line 260, in _invocation
    result = PARALLEL_FUNCTION(index)
  File "/Users/simone/anaconda3/lib/python3.10/site-packages/metacells/pipeline/collect.py", line 221, in _collect_metacell
    print(fraction_per_gene_of_metacell)
AssertionError
"""

The above exception was the direct cause of the following exception:

AssertionError Traceback (most recent call last) File :2

File ~/anaconda3/lib/python3.10/site-packages/metacells/utilities/logging.py:384, in logged..wrap..wrapper(*args, kwargs) 379 if log_value is not None: 380 logger().log( 381 param_level, "%swith %s: %s", INDENT_SPACES[: 2 INDENT_LEVEL], name, log_value 382 ) --> 384 return function(args, kwargs) 386 finally: 387 if logger().isEnabledFor(step_level):

File ~/anaconda3/lib/python3.10/site-packages/metacells/pipeline/divide_and_conquer.py:633, in divide_and_conquer_pipeline(adata, what, rare_max_genes, rare_max_gene_cell_fraction, rare_min_gene_maximum, rare_genes_similarity_method, rare_genes_cluster_method, rare_min_genes_of_modules, rare_min_cells_of_modules, rare_min_module_correlation, rare_min_related_gene_fold_factor, rare_max_related_gene_increase_factor, rare_min_cell_module_total, rare_max_cells_factor_of_random_pile, rare_deviants_max_cell_fraction, rare_dissolve_min_robust_size_factor, rare_dissolve_min_convincing_gene_fold_factor, quick_and_dirty, select_downsample_min_samples, select_downsample_min_cell_quantile, select_downsample_max_cell_quantile, select_min_gene_total, select_min_gene_top3, select_min_gene_relative_variance, select_min_genes, cells_similarity_value_regularization, cells_similarity_log_data, cells_similarity_method, groups_similarity_log_data, groups_similarity_method, target_metacell_umis, cell_umis, target_metacell_size, min_metacell_size, target_metacells_in_pile, min_target_pile_size, max_target_pile_size, piles_knn_k_size_factor, piles_min_split_size_factor, piles_min_robust_size_factor, piles_max_merge_size_factor, knn_k, knn_k_umis_quantile, min_knn_k, knn_balanced_ranks_factor, knn_incoming_degree_factor, knn_outgoing_degree_factor, knn_min_outgoing_degree, min_seed_size_quantile, max_seed_size_quantile, candidates_knn_k_size_factor, candidates_cooldown_pass, candidates_cooldown_node, candidates_cooldown_phase, candidates_min_split_size_factor, candidates_max_merge_size_factor, candidates_max_split_min_cut_strength, candidates_min_cut_seed_cells, must_complete_cover, deviants_policy, deviants_gap_skip_cells, deviants_min_gene_fold_factor, deviants_min_noisy_gene_fold_factor, deviants_max_gene_fraction, deviants_max_cell_fraction, deviants_max_gap_cells_count, deviants_max_gap_cells_fraction, dissolve_min_robust_size_factor, dissolve_min_convincing_gene_fold_factor, 
random_seed) 631 with ut.timed_step(".common"): 632 with ut.progress_bar_slice(common_cells_fraction): --> 633 _compute_divide_and_conquer_subset( 634 adata, 635 what, 636 prefix="common" if name is None else name + ".common", 637 metacells_level=0, 638 subset_mask=common_cells_mask, 639 collected_mask=collected_mask, 640 counts=counts, 641 dac_parameters=dac_parameters, 642 random_seed=random_seed, 643 ) 645 if rare_cells_count > 0: 646 selected_genes = ut.get_v_numpy(adata, "selected_gene")

File ~/anaconda3/lib/python3.10/site-packages/metacells/utilities/logging.py:384, in logged..wrap..wrapper(*args, kwargs) 379 if log_value is not None: 380 logger().log( 381 param_level, "%swith %s: %s", INDENT_SPACES[: 2 INDENT_LEVEL], name, log_value 382 ) --> 384 return function(args, kwargs) 386 finally: 387 if logger().isEnabledFor(step_level):

File ~/anaconda3/lib/python3.10/site-packages/metacells/pipeline/divide_and_conquer.py:1016, in _compute_divide_and_conquer_subset(adata, what, prefix, subset_mask, collected_mask, metacells_level, counts, dac_parameters, random_seed) 1013 groups_time /= total_time 1015 with ut.progress_bar_slice(total_time): -> 1016 final_pile_of_cells = _compute_metacell_groups( 1017 adata, 1018 what, 1019 collect_time=collect_time, 1020 groups_time=groups_time, 1021 prefix=prefix + ".groups", 1022 subset_mask=subset_mask, 1023 dac_parameters=must_cover_dac_parameters, 1024 random_seed=random_seed, 1025 ) 1027 if dac_parameters.quick_and_dirty: 1028 ut.log_calc(f"# {prefix}.final")

File ~/anaconda3/lib/python3.10/site-packages/metacells/utilities/logging.py:384, in logged..wrap..wrapper(*args, kwargs) 379 if log_value is not None: 380 logger().log( 381 param_level, "%swith %s: %s", INDENT_SPACES[: 2 INDENT_LEVEL], name, log_value 382 ) --> 384 return function(args, kwargs) 386 finally: 387 if logger().isEnabledFor(step_level):

File ~/anaconda3/lib/python3.10/site-packages/metacells/pipeline/divide_and_conquer.py:1176, in _compute_metacell_groups(adata, what, collect_time, groups_time, prefix, subset_mask, dac_parameters, random_seed) 1167 with ut.progress_bar_slice(collect_time): 1168 sdata = ut.slice( 1169 adata, 1170 name=f"{prefix}.grouped", (...) 1173 track_obs="full_cell_index", 1174 ) -> 1176 mdata = collect_metacells( 1177 sdata, 1178 what, 1179 groups=metacell_of_cells[subset_mask], 1180 name=prefix, 1181 _metacell_groups=True, 1182 top_level=False, 1183 random_seed=random_seed, 1184 ) 1186 with ut.progress_bar_slice(groups_time): 1187 metacell_sizes = ut.get_o_numpy(mdata, "grouped").astype("float32")

File ~/anaconda3/lib/python3.10/site-packages/metacells/utilities/logging.py:384, in logged..wrap..wrapper(*args, kwargs) 379 if log_value is not None: 380 logger().log( 381 param_level, "%swith %s: %s", INDENT_SPACES[: 2 INDENT_LEVEL], name, log_value 382 ) --> 384 return function(args, kwargs) 386 finally: 387 if logger().isEnabledFor(step_level):

File ~/anaconda3/lib/python3.10/site-packages/metacells/pipeline/collect.py:244, in collect_metacells(adata, what, metacell_geo_mean, metacell_umis_regularization, zeros_cell_size_quantile, groups, name, prefix, top_level, _metacell_groups, random_seed) 231 ut.log_calc( 232 "fraction_per_gene_of_metacell", fraction_per_gene_of_metacell, formatter=ut.sizes_description 233 ) 235 return { 236 "grouped": grouped_of_metacell, 237 "total_umis": total_umis_of_metacell, (...) 241 "zeros_per_gene": zeros_per_gene, 242 } --> 244 results = ut.parallel_map(_collect_metacell, metacells_count) 246 fraction_per_gene_per_metacell = sp.csr_matrix(np.vstack([result["fraction_per_gene"] for result in results])) 247 assert str(fraction_per_gene_per_metacell.dtype) == "float32"

File ~/anaconda3/lib/python3.10/site-packages/metacells/utilities/parallel.py:221, in parallel_map(function, invocations, max_processors, hide_from_progress_bar)
    219 utm.timed_parameters(index=MAP_INDEX, processes=PROCESSES_COUNT)
    220 with get_context("fork").Pool(PROCESSES_COUNT) as pool:
--> 221     for index, result in pool.imap_unordered(_invocation, range(invocations)):
    222         if utp.has_progress_bar() and not hide_from_progress_bar:
    223             utp.did_progress(1 / invocations)

File ~/anaconda3/lib/python3.10/multiprocessing/pool.py:873, in IMapIterator.next(self, timeout)
    871 if success:
    872     return value
--> 873 raise value

File ~/anaconda3/lib/python3.10/multiprocessing/pool.py:125, in worker()
    123 job, i, func, args, kwds = task
    124 try:
--> 125     result = (True, func(*args, **kwds))
    126 except Exception as e:
    127     if wrap_exception and func is not _helper_reraises_exception:

File ~/anaconda3/lib/python3.10/site-packages/metacells/utilities/parallel.py:260, in _invocation()
    257 os.environ["MKL_NUM_THREADS"] = str(PROCESSORS_COUNT)
    259 assert PARALLEL_FUNCTION is not None
--> 260 result = PARALLEL_FUNCTION(index)
    261 return index, result

File ~/anaconda3/lib/python3.10/site-packages/metacells/pipeline/collect.py:221, in _collect_metacell()
    219 fraction_per_gene_of_metacell[umis_per_gene_of_metacell == 0] = 0
    220 fraction_per_gene_of_metacell[fraction_per_gene_of_metacell < 0] = 0
--> 221 print(fraction_per_gene_of_metacell)
    222 assert np.min(fraction_per_gene_of_metacell) >= 0
    223 fraction_per_gene_of_metacell /= np.sum(fraction_per_gene_of_metacell)

orenbenkiki commented 1 month ago

The reported line number is still 221 so it seems the file wasn't modified (adding lines to print the data should have moved the assertion down to a higher line number).
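A possible reading of that traceback, not confirmed in this thread: the source text shown at line 221 is already the new `print(...)` statement, which suggests the file on disk did change but the running kernel was still executing the module as it was imported before the edit. Python takes the line number from the loaded code and the displayed source text from the file on disk, so the two can drift apart until the kernel is restarted. The stdlib-only sketch below reproduces that mismatch; `stale_demo` and `check` are hypothetical names used only for illustration.

```python
import importlib
import sys
import tempfile
import traceback
from pathlib import Path

ORIGINAL = "def check(x):\n    x = abs(x)\n    assert x < 0\n    return x\n"
EDITED = "def check(x):\n    x = abs(x)\n    print(x)\n    assert x < 0\n    return x\n"


def demo_stale_source():
    """Import a module, edit its file on disk, then fail inside the old code."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "stale_demo.py"
        path.write_text(ORIGINAL)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            module = importlib.import_module("stale_demo")
            path.write_text(EDITED)  # edit the file, but do NOT re-import it
            try:
                module.check(1)  # still runs the original code: assert on line 3
            except AssertionError as error:
                frame = traceback.extract_tb(error.__traceback__)[-1]
                # Line number comes from the loaded code, text from the edited file.
                return frame.lineno, frame.line
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("stale_demo", None)
    return None


lineno, line = demo_stale_source()
print(f"failing line {lineno} is displayed as: {line}")
```

Restarting the Jupyter kernel (or the Python process) after editing collect.py should make the edit take effect; the assertion would then be reported at a later line number.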

simonekats commented 1 month ago

Detect rare gene modules...
0%| [00:00]
python(38843) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(38844) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(38845) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
33%|██████████████████████▍ [01:39]
python(38931) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(38932) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(38933) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(38934) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(38935) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(38936) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(38937) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(38938) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(38939) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(38940) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(38941) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(38942) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(38943) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(38944) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(38945) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
python(38946) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
47%|███████████████████████████████▉ [01:47]

RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
  File "/Users/simone/anaconda3/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/Users/simone/anaconda3/lib/python3.10/site-packages/metacells/utilities/parallel.py", line 260, in _invocation
    result = PARALLEL_FUNCTION(index)
  File "/Users/simone/anaconda3/lib/python3.10/site-packages/metacells/pipeline/collect.py", line 221, in _collect_metacell
    print(fraction_per_gene_of_metacell)
AssertionError
"""

The above exception was the direct cause of the following exception:

AssertionError Traceback (most recent call last) File :2

File ~/anaconda3/lib/python3.10/site-packages/metacells/utilities/logging.py:384, in logged..wrap..wrapper(*args, kwargs) 379 if log_value is not None: 380 logger().log( 381 param_level, "%swith %s: %s", INDENT_SPACES[: 2 INDENT_LEVEL], name, log_value 382 ) --> 384 return function(args, kwargs) 386 finally: 387 if logger().isEnabledFor(step_level):

File ~/anaconda3/lib/python3.10/site-packages/metacells/pipeline/divide_and_conquer.py:633, in divide_and_conquer_pipeline(adata, what, rare_max_genes, rare_max_gene_cell_fraction, rare_min_gene_maximum, rare_genes_similarity_method, rare_genes_cluster_method, rare_min_genes_of_modules, rare_min_cells_of_modules, rare_min_module_correlation, rare_min_related_gene_fold_factor, rare_max_related_gene_increase_factor, rare_min_cell_module_total, rare_max_cells_factor_of_random_pile, rare_deviants_max_cell_fraction, rare_dissolve_min_robust_size_factor, rare_dissolve_min_convincing_gene_fold_factor, quick_and_dirty, select_downsample_min_samples, select_downsample_min_cell_quantile, select_downsample_max_cell_quantile, select_min_gene_total, select_min_gene_top3, select_min_gene_relative_variance, select_min_genes, cells_similarity_value_regularization, cells_similarity_log_data, cells_similarity_method, groups_similarity_log_data, groups_similarity_method, target_metacell_umis, cell_umis, target_metacell_size, min_metacell_size, target_metacells_in_pile, min_target_pile_size, max_target_pile_size, piles_knn_k_size_factor, piles_min_split_size_factor, piles_min_robust_size_factor, piles_max_merge_size_factor, knn_k, knn_k_umis_quantile, min_knn_k, knn_balanced_ranks_factor, knn_incoming_degree_factor, knn_outgoing_degree_factor, knn_min_outgoing_degree, min_seed_size_quantile, max_seed_size_quantile, candidates_knn_k_size_factor, candidates_cooldown_pass, candidates_cooldown_node, candidates_cooldown_phase, candidates_min_split_size_factor, candidates_max_merge_size_factor, candidates_max_split_min_cut_strength, candidates_min_cut_seed_cells, must_complete_cover, deviants_policy, deviants_gap_skip_cells, deviants_min_gene_fold_factor, deviants_min_noisy_gene_fold_factor, deviants_max_gene_fraction, deviants_max_cell_fraction, deviants_max_gap_cells_count, deviants_max_gap_cells_fraction, dissolve_min_robust_size_factor, dissolve_min_convincing_gene_fold_factor, 
random_seed) 631 with ut.timed_step(".common"): 632 with ut.progress_bar_slice(common_cells_fraction): --> 633 _compute_divide_and_conquer_subset( 634 adata, 635 what, 636 prefix="common" if name is None else name + ".common", 637 metacells_level=0, 638 subset_mask=common_cells_mask, 639 collected_mask=collected_mask, 640 counts=counts, 641 dac_parameters=dac_parameters, 642 random_seed=random_seed, 643 ) 645 if rare_cells_count > 0: 646 selected_genes = ut.get_v_numpy(adata, "selected_gene")

File ~/anaconda3/lib/python3.10/site-packages/metacells/utilities/logging.py:384, in logged..wrap..wrapper(*args, kwargs) 379 if log_value is not None: 380 logger().log( 381 param_level, "%swith %s: %s", INDENT_SPACES[: 2 INDENT_LEVEL], name, log_value 382 ) --> 384 return function(args, kwargs) 386 finally: 387 if logger().isEnabledFor(step_level):

File ~/anaconda3/lib/python3.10/site-packages/metacells/pipeline/divide_and_conquer.py:1016, in _compute_divide_and_conquer_subset(adata, what, prefix, subset_mask, collected_mask, metacells_level, counts, dac_parameters, random_seed) 1013 groups_time /= total_time 1015 with ut.progress_bar_slice(total_time): -> 1016 final_pile_of_cells = _compute_metacell_groups( 1017 adata, 1018 what, 1019 collect_time=collect_time, 1020 groups_time=groups_time, 1021 prefix=prefix + ".groups", 1022 subset_mask=subset_mask, 1023 dac_parameters=must_cover_dac_parameters, 1024 random_seed=random_seed, 1025 ) 1027 if dac_parameters.quick_and_dirty: 1028 ut.log_calc(f"# {prefix}.final")

File ~/anaconda3/lib/python3.10/site-packages/metacells/utilities/logging.py:384, in logged..wrap..wrapper(*args, kwargs) 379 if log_value is not None: 380 logger().log( 381 param_level, "%swith %s: %s", INDENT_SPACES[: 2 INDENT_LEVEL], name, log_value 382 ) --> 384 return function(args, kwargs) 386 finally: 387 if logger().isEnabledFor(step_level):

File ~/anaconda3/lib/python3.10/site-packages/metacells/pipeline/divide_and_conquer.py:1176, in _compute_metacell_groups(adata, what, collect_time, groups_time, prefix, subset_mask, dac_parameters, random_seed) 1167 with ut.progress_bar_slice(collect_time): 1168 sdata = ut.slice( 1169 adata, 1170 name=f"{prefix}.grouped", (...) 1173 track_obs="full_cell_index", 1174 ) -> 1176 mdata = collect_metacells( 1177 sdata, 1178 what, 1179 groups=metacell_of_cells[subset_mask], 1180 name=prefix, 1181 _metacell_groups=True, 1182 top_level=False, 1183 random_seed=random_seed, 1184 ) 1186 with ut.progress_bar_slice(groups_time): 1187 metacell_sizes = ut.get_o_numpy(mdata, "grouped").astype("float32")

File ~/anaconda3/lib/python3.10/site-packages/metacells/utilities/logging.py:384, in logged..wrap..wrapper(*args, kwargs) 379 if log_value is not None: 380 logger().log( 381 param_level, "%swith %s: %s", INDENT_SPACES[: 2 INDENT_LEVEL], name, log_value 382 ) --> 384 return function(args, kwargs) 386 finally: 387 if logger().isEnabledFor(step_level):

File ~/anaconda3/lib/python3.10/site-packages/metacells/pipeline/collect.py:244, in collect_metacells(adata, what, metacell_geo_mean, metacell_umis_regularization, zeros_cell_size_quantile, groups, name, prefix, top_level, _metacell_groups, random_seed) 231 ut.log_calc( 232 "fraction_per_gene_of_metacell", fraction_per_gene_of_metacell, formatter=ut.sizes_description 233 ) 235 return { 236 "grouped": grouped_of_metacell, 237 "total_umis": total_umis_of_metacell, (...) 241 "zeros_per_gene": zeros_per_gene, 242 } --> 244 results = ut.parallel_map(_collect_metacell, metacells_count) 246 fraction_per_gene_per_metacell = sp.csr_matrix(np.vstack([result["fraction_per_gene"] for result in results])) 247 assert str(fraction_per_gene_per_metacell.dtype) == "float32"

File ~/anaconda3/lib/python3.10/site-packages/metacells/utilities/parallel.py:221, in parallel_map(function, invocations, max_processors, hide_from_progress_bar) 219 utm.timed_parameters(index=MAP_INDEX, processes=PROCESSES_COUNT) 220 with get_context("fork").Pool(PROCESSES_COUNT) as pool: --> 221 for index, result in pool.imap_unordered(_invocation, range(invocations)): 222 if utp.has_progress_bar() and not hide_from_progress_bar: 223 utp.did_progress(1 / invocations)

File ~/anaconda3/lib/python3.10/multiprocessing/pool.py:873, in IMapIterator.next(self, timeout) 871 if success: 872 return value --> 873 raise value

File ~/anaconda3/lib/python3.10/multiprocessing/pool.py:125, in worker() 123 job, i, func, args, kwds = task 124 try: --> 125 result = (True, func(*args, **kwds)) 126 except Exception as e: 127 if wrap_exception and func is not _helper_reraises_exception:

File ~/anaconda3/lib/python3.10/site-packages/metacells/utilities/parallel.py:260, in _invocation() 257 os.environ["MKL_NUM_THREADS"] = str(PROCESSORS_COUNT) 259 assert PARALLEL_FUNCTION is not None --> 260 result = PARALLEL_FUNCTION(index) 261 return index, result

File ~/anaconda3/lib/python3.10/site-packages/metacells/pipeline/collect.py:221, in _collect_metacell()
    219 fraction_per_gene_of_metacell[umis_per_gene_of_metacell == 0] = 0
    220 fraction_per_gene_of_metacell[fraction_per_gene_of_metacell < 0] = 0
--> 221 print(fraction_per_gene_of_metacell)
    222 assert np.min(fraction_per_gene_of_metacell) >= 0
    223 fraction_per_gene_of_metacell /= np.sum(fraction_per_gene_of_metacell)

AssertionError:

orenbenkiki commented 1 month ago

Hmmm, this is even weirder: it seems that merely accessing the fraction_per_gene_of_metacell array, just to print it, causes an assertion error. This makes no sense.

We are in voodoo territory now, so... I'd start by printing the array in several locations (after line 215, after line 218, and after line 220) to see at what point the array becomes "poisoned". I'd also print `fraction_per_gene_of_metacell.__class__` (in a separate statement) before printing its value. Perhaps the statistics package is returning something weird.
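A minimal sketch of that kind of checkpoint printing, assuming only NumPy. The helper `describe_array` is hypothetical (it is not part of metacells); it reports the class separately from the contents and counts NaN and negative entries, so a "poisoned" array stands out even when the printed values are truncated.

```python
import numpy as np


def describe_array(label, array):
    """Print and return a short summary of an array (hypothetical debug helper)."""
    class_name = type(array).__name__  # record the class before any conversion
    values = np.asarray(array)
    summary = {
        "label": label,
        "class": class_name,
        "dtype": str(values.dtype),
        "nan_count": int(np.isnan(values).sum()),
        "negative_count": int((values < 0).sum()),  # NaN compares False, so it is not counted here
    }
    print(summary)
    return summary


# Toy stand-in for fraction_per_gene_of_metacell at one checkpoint.
example = np.array([np.nan, 2.0e-05, 0.0], dtype="float32")
summary = describe_array("after line 220", example)
```

Calling the helper at each suspected location gives one line of output per checkpoint, which is easier to compare than full array dumps from several worker processes.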

I'd also try to update the numpy/scipy/pandas packages using pip.

None of these are a "good answer" but desperate times call for desperate measures...

simonekats commented 1 month ago

Detect rare gene modules... 0%| [00:00]/Users/simone/anaconda3/lib/python3.10/site-packages/metacells/tools/deviants.py:290: RuntimeWarning: divide by zero encountered in log2 log_fraction_per_gene_per_cell = np.log2(fraction_per_gene_per_cell + regularization_per_cell[:, np.newaxis]) /Users/simone/anaconda3/lib/python3.10/site-packages/metacells/tools/deviants.py:290: RuntimeWarning: invalid value encountered in log2 log_fraction_per_gene_per_cell = np.log2(fraction_per_gene_per_cell + regularization_per_cell[:, np.newaxis]) /Users/simone/anaconda3/lib/python3.10/site-packages/metacells/tools/dissolve.py:203: RuntimeWarning: divide by zero encountered in log2 np.log2(candidate_data_of_genes, out=candidate_data_of_genes) /Users/simone/anaconda3/lib/python3.10/site-packages/metacells/tools/dissolve.py:203: RuntimeWarning: invalid value encountered in log2 np.log2(candidate_data_of_genes, out=candidate_data_of_genes) /Users/simone/anaconda3/lib/python3.10/site-packages/metacells/tools/deviants.py:290: RuntimeWarning: divide by zero encountered in log2 log_fraction_per_gene_per_cell = np.log2(fraction_per_gene_per_cell + regularization_per_cell[:, np.newaxis]) /Users/simone/anaconda3/lib/python3.10/site-packages/metacells/tools/deviants.py:290: RuntimeWarning: invalid value encountered in log2 log_fraction_per_gene_per_cell = np.log2(fraction_per_gene_per_cell + regularization_per_cell[:, np.newaxis]) /Users/simone/anaconda3/lib/python3.10/site-packages/metacells/tools/deviants.py:290: RuntimeWarning: divide by zero encountered in log2 log_fraction_per_gene_per_cell = np.log2(fraction_per_gene_per_cell + regularization_per_cell[:, np.newaxis]) /Users/simone/anaconda3/lib/python3.10/site-packages/metacells/tools/deviants.py:290: RuntimeWarning: invalid value encountered in log2 log_fraction_per_gene_per_cell = np.log2(fraction_per_gene_per_cell + regularization_per_cell[:, np.newaxis]) 
/Users/simone/anaconda3/lib/python3.10/site-packages/metacells/tools/deviants.py:290: RuntimeWarning: divide by zero encountered in log2 log_fraction_per_gene_per_cell = np.log2(fraction_per_gene_per_cell + regularization_per_cell[:, np.newaxis]) /Users/simone/anaconda3/lib/python3.10/site-packages/metacells/tools/deviants.py:290: RuntimeWarning: invalid value encountered in log2 log_fraction_per_gene_per_cell = np.log2(fraction_per_gene_per_cell + regularization_per_cell[:, np.newaxis]) /Users/simone/anaconda3/lib/python3.10/site-packages/metacells/tools/dissolve.py:203: RuntimeWarning: divide by zero encountered in log2 np.log2(candidate_data_of_genes, out=candidate_data_of_genes) /Users/simone/anaconda3/lib/python3.10/site-packages/metacells/tools/dissolve.py:203: RuntimeWarning: invalid value encountered in log2 np.log2(candidate_data_of_genes, out=candidate_data_of_genes) /Users/simone/anaconda3/lib/python3.10/site-packages/metacells/tools/deviants.py:290: RuntimeWarning: divide by zero encountered in log2 log_fraction_per_gene_per_cell = np.log2(fraction_per_gene_per_cell + regularization_per_cell[:, np.newaxis]) /Users/simone/anaconda3/lib/python3.10/site-packages/metacells/tools/deviants.py:290: RuntimeWarning: invalid value encountered in log2 log_fraction_per_gene_per_cell = np.log2(fraction_per_gene_per_cell + regularization_per_cell[:, np.newaxis]) /Users/simone/anaconda3/lib/python3.10/site-packages/metacells/tools/deviants.py:290: RuntimeWarning: divide by zero encountered in log2 log_fraction_per_gene_per_cell = np.log2(fraction_per_gene_per_cell + regularization_per_cell[:, np.newaxis]) /Users/simone/anaconda3/lib/python3.10/site-packages/metacells/tools/deviants.py:290: RuntimeWarning: invalid value encountered in log2 log_fraction_per_gene_per_cell = np.log2(fraction_per_gene_per_cell + regularization_per_cell[:, np.newaxis]) /Users/simone/anaconda3/lib/python3.10/site-packages/metacells/tools/deviants.py:290: RuntimeWarning: divide by zero 
encountered in log2 log_fraction_per_gene_per_cell = np.log2(fraction_per_gene_per_cell + regularization_per_cell[:, np.newaxis]) /Users/simone/anaconda3/lib/python3.10/site-packages/metacells/tools/deviants.py:290: RuntimeWarning: invalid value encountered in log2 log_fraction_per_gene_per_cell = np.log2(fraction_per_gene_per_cell + regularization_per_cell[:, np.newaxis]) /Users/simone/anaconda3/lib/python3.10/site-packages/metacells/tools/dissolve.py:203: RuntimeWarning: invalid value encountered in log2 np.log2(candidate_data_of_genes, out=candidate_data_of_genes) /Users/simone/anaconda3/lib/python3.10/site-packages/metacells/tools/deviants.py:290: RuntimeWarning: divide by zero encountered in log2 log_fraction_per_gene_per_cell = np.log2(fraction_per_gene_per_cell + regularization_per_cell[:, np.newaxis]) /Users/simone/anaconda3/lib/python3.10/site-packages/metacells/tools/deviants.py:290: RuntimeWarning: invalid value encountered in log2 log_fraction_per_gene_per_cell = np.log2(fraction_per_gene_per_cell + regularization_per_cell[:, np.newaxis]) /Users/simone/anaconda3/lib/python3.10/site-packages/metacells/tools/deviants.py:290: RuntimeWarning: divide by zero encountered in log2 log_fraction_per_gene_per_cell = np.log2(fraction_per_gene_per_cell + regularization_per_cell[:, np.newaxis]) /Users/simone/anaconda3/lib/python3.10/site-packages/metacells/tools/deviants.py:290: RuntimeWarning: invalid value encountered in log2 log_fraction_per_gene_per_cell = np.log2(fraction_per_gene_per_cell + regularization_per_cell[:, np.newaxis]) /Users/simone/anaconda3/lib/python3.10/site-packages/metacells/tools/deviants.py:290: RuntimeWarning: divide by zero encountered in log2 log_fraction_per_gene_per_cell = np.log2(fraction_per_gene_per_cell + regularization_per_cell[:, np.newaxis]) /Users/simone/anaconda3/lib/python3.10/site-packages/metacells/tools/deviants.py:290: RuntimeWarning: invalid value encountered in log2 log_fraction_per_gene_per_cell = 
np.log2(fraction_per_gene_per_cell + regularization_per_cell[:, np.newaxis]) /Users/simone/anaconda3/lib/python3.10/site-packages/metacells/tools/deviants.py:290: RuntimeWarning: divide by zero encountered in log2 log_fraction_per_gene_per_cell = np.log2(fraction_per_gene_per_cell + regularization_per_cell[:, np.newaxis]) /Users/simone/anaconda3/lib/python3.10/site-packages/metacells/tools/deviants.py:290: RuntimeWarning: invalid value encountered in log2 log_fraction_per_gene_per_cell = np.log2(fraction_per_gene_per_cell + regularization_per_cell[:, np.newaxis]) /Users/simone/anaconda3/lib/python3.10/site-packages/metacells/tools/dissolve.py:203: RuntimeWarning: divide by zero encountered in log2 np.log2(candidate_data_of_genes, out=candidate_data_of_genes) /Users/simone/anaconda3/lib/python3.10/site-packages/metacells/tools/dissolve.py:203: RuntimeWarning: invalid value encountered in log2 np.log2(candidate_data_of_genes, out=candidate_data_of_genes) 30%|████████████████████▋ [03:27]/Users/simone/anaconda3/lib/python3.10/site-packages/metacells/utilities/computation.py:1590: RuntimeWarning: invalid value encountered in log2 np.log2(normalized_variance_per_element, out=normalized_variance_per_element) 38%|█████████████████████████▊ [03:28]/Users/simone/anaconda3/lib/python3.10/site-packages/scipy/stats/_stats_py.py:197: RuntimeWarning: invalid value encountered in log log_a = np.log(a) /Users/simone/anaconda3/lib/python3.10/site-packages/scipy/stats/_stats_py.py:197: RuntimeWarning: invalid value encountered in log log_a = np.log(a) /Users/simone/anaconda3/lib/python3.10/site-packages/scipy/stats/_stats_py.py:197: RuntimeWarning: invalid value encountered in log log_a = np.log(a) /Users/simone/anaconda3/lib/python3.10/site-packages/scipy/stats/_stats_py.py:197: RuntimeWarning: invalid value encountered in log log_a = np.log(a) /Users/simone/anaconda3/lib/python3.10/site-packages/scipy/stats/_stats_py.py:197: RuntimeWarning: invalid value encountered in log log_a = 
np.log(a) /Users/simone/anaconda3/lib/python3.10/site-packages/scipy/stats/_stats_py.py:197: RuntimeWarning: invalid value encountered in log log_a = np.log(a) /Users/simone/anaconda3/lib/python3.10/site-packages/scipy/stats/_stats_py.py:197: RuntimeWarning: invalid value encountered in log log_a = np.log(a) /Users/simone/anaconda3/lib/python3.10/site-packages/scipy/stats/_stats_py.py:197: RuntimeWarning: invalid value encountered in log log_a = np.log(a) /Users/simone/anaconda3/lib/python3.10/site-packages/scipy/stats/_stats_py.py:197: RuntimeWarning: invalid value encountered in log log_a = np.log(a) /Users/simone/anaconda3/lib/python3.10/site-packages/scipy/stats/_stats_py.py:197: RuntimeWarning: invalid value encountered in log log_a = np.log(a) /Users/simone/anaconda3/lib/python3.10/site-packages/scipy/stats/_stats_py.py:197: RuntimeWarning: invalid value encountered in log log_a = np.log(a) /Users/simone/anaconda3/lib/python3.10/site-packages/scipy/stats/_stats_py.py:197: RuntimeWarning: invalid value encountered in log log_a = np.log(a) /Users/simone/anaconda3/lib/python3.10/site-packages/scipy/stats/_stats_py.py:197: RuntimeWarning: invalid value encountered in log log_a = np.log(a) /Users/simone/anaconda3/lib/python3.10/site-packages/scipy/stats/_stats_py.py:197: RuntimeWarning: invalid value encountered in log log_a = np.log(a) /Users/simone/anaconda3/lib/python3.10/site-packages/scipy/stats/_stats_py.py:197: RuntimeWarning: invalid value encountered in log log_a = np.log(a) [ nan nan 2.26555282e-05 ... 1.77747611e-05 1.77747611e-05 1.77747611e-05][ nan nan 2.48233902e-05 ... 1.70141471e-05 1.70141471e-05 1.70141471e-05][ nan nan 1.66652266e-05 ... 1.66652266e-05 1.82559597e-05 1.66652266e-05][ nan nan 2.12742132e-05 ... 1.80115934e-05 2.05262336e-05 1.80115934e-05][ nan nan 1.64945140e-05 ... 
1.64945140e-05 1.78737687e-05 1.64945140e-05] /Users/simone/anaconda3/lib/python3.10/site-packages/scipy/stats/_stats_py.py:197: RuntimeWarning: invalid value encountered in log log_a = np.log(a) [ nan 3.28575224e-05 2.08899381e-05 ... 1.91721333e-05 1.91721333e-05 1.91721333e-05][ nan nan 1.91062567e-05 ... 1.76655837e-05 1.76655837e-05 1.76655837e-05][ nan nan 2.19008502e-05 ... 1.92753133e-05 2.16204308e-05 1.92753133e-05][ nan nan 2.12759733e-05 ... 1.86022260e-05 2.11023792e-05 1.86022260e-05][ nan nan 2.07037594e-05 ... 1.71507468e-05 1.83279607e-05 1.71507468e-05]

[ nan 5.82937630e-05 2.01749977e-05 ... 1.62178011e-05 1.62178011e-05 1.62178011e-05][ nan nan 2.20296777e-05 ... 1.91614994e-05 2.04884791e-05 1.91614994e-05] [ nan nan 2.79007206e-05 ... 2.22716197e-05 2.55231372e-05 2.22716197e-05][ nan nan 2.48233902e-05 ... 1.70141471e-05 1.70141471e-05 1.70141471e-05][ nan nan 2.29992269e-05 ... 1.71652370e-05 1.71652370e-05 1.71652370e-05][ nan nan 2.12742132e-05 ... 1.80115934e-05 2.05262336e-05 1.80115934e-05][ nan nan 2.26555282e-05 ... 1.77747611e-05 1.77747611e-05 1.77747611e-05][ nan nan 1.66652266e-05 ... 1.66652266e-05 1.82559597e-05 1.66652266e-05][ nan nan 4.4584167e-05 ... 4.4584167e-05 4.4584167e-05 4.4584167e-05][ nan nan 2.19008502e-05 ... 1.92753133e-05 2.16204308e-05 1.92753133e-05][ nan nan 1.64945140e-05 ... 1.64945140e-05 1.78737687e-05 1.64945140e-05][ nan 3.28575224e-05 2.08899381e-05 ... 1.91721333e-05 1.91721333e-05 1.91721333e-05][ nan nan 1.91062567e-05 ... 1.76655837e-05 1.76655837e-05 1.76655837e-05] [ nan nan 2.12759733e-05 ... 1.86022260e-05 2.11023792e-05 1.86022260e-05] [ nan nan 2.07037594e-05 ... 1.71507468e-05 1.83279607e-05 1.71507468e-05]

[ nan nan 2.20296777e-05 ... 1.91614994e-05 2.04884791e-05 1.91614994e-05][ nan nan 3.26261982e-06 ... 0.00000000e+00 2.51464022e-06 0.00000000e+00][ nan nan 7.80924309e-06 ... 0.00000000e+00 0.00000000e+00 0.00000000e+00][ nan nan 2.29992269e-05 ... 1.71652370e-05 1.71652370e-05 1.71652370e-05][ nan nan 2.79007206e-05 ... 2.22716197e-05 2.55231372e-05 2.22716197e-05][ nan nan 4.88076712e-06 ... 0.00000000e+00 0.00000000e+00 0.00000000e+00][ nan 5.82937630e-05 2.01749977e-05 ... 1.62178011e-05 1.62178011e-05 1.62178011e-05][ nan nan 4.4584167e-05 ... 4.4584167e-05 4.4584167e-05 4.4584167e-05][ nan nan 0.00000000e+00 ... 0.00000000e+00 1.37925468e-06 0.00000000e+00][ nan 1.36853892e-05 1.71780483e-06 ... 0.00000000e+00 0.00000000e+00 0.00000000e+00][ nan nan 0.00000000e+00 ... 0.00000000e+00 1.59073306e-06 0.00000000e+00][ nan nan 2.13022999e-05 ... 1.67369451e-05 1.74945171e-05 1.67369451e-05][ nan nan 2.62553687e-06 ... 0.00000000e+00 2.34511746e-06 0.00000000e+00] [ nan nan 1.44067301e-06 ... 0.00000000e+00 0.00000000e+00 0.00000000e+00][ nan nan 3.55301255e-06 ... 0.00000000e+00 1.17721390e-06 0.00000000e+00][ nan nan 2.67374733e-06 ... 0.00000000e+00 2.50015321e-06 0.00000000e+00]

[ nan nan 3.26261982e-06 ... 0.00000000e+00 2.51464022e-06 0.00000000e+00][ nan nan 2.86817837e-06 ... 0.00000000e+00 1.32697973e-06 0.00000000e+00][ nan nan 5.8339899e-06 ... 0.0000000e+00 0.0000000e+00 0.0000000e+00][ nan nan 7.80924309e-06 ... 0.00000000e+00 0.00000000e+00 0.00000000e+00][ nan nan 5.62910081e-06 ... 0.00000000e+00 3.25151748e-06 0.00000000e+00][ nan nan 4.88076712e-06 ... 0.00000000e+00 0.00000000e+00 0.00000000e+00][nan nan 0. ... 0. 0. 0.][ nan 4.20759619e-05 3.95719655e-06 ... 0.00000000e+00 0.00000000e+00 0.00000000e+00][ nan nan 0.00000000e+00 ... 0.00000000e+00 1.37925468e-06 0.00000000e+00][ nan 1.36853892e-05 1.71780483e-06 ... 0.00000000e+00 0.00000000e+00 0.00000000e+00][ nan nan 0.00000000e+00 ... 0.00000000e+00 1.59073306e-06 0.00000000e+00][ nan nan 2.13022999e-05 ... 1.67369451e-05 1.74945171e-05 1.67369451e-05][ nan nan 2.62553687e-06 ... 0.00000000e+00 2.34511746e-06 0.00000000e+00][ nan nan 3.55301255e-06 ... 0.00000000e+00 1.17721390e-06 0.00000000e+00][ nan nan 1.44067301e-06 ... 0.00000000e+00 0.00000000e+00 0.00000000e+00]

[ nan nan 2.67374733e-06 ... 0.00000000e+00 2.50015321e-06 0.00000000e+00]

[ nan nan 2.86817837e-06 ... 0.00000000e+00 1.32697973e-06 0.00000000e+00][ nan nan 5.8339899e-06 ... 0.0000000e+00 0.0000000e+00 0.0000000e+00][ nan nan 5.62910081e-06 ... 0.00000000e+00 3.25151748e-06 0.00000000e+00][ nan 4.20759619e-05 3.95719655e-06 ... 0.00000000e+00 0.00000000e+00 0.00000000e+00][nan nan 0. ... 0. 0. 0.][ nan nan 4.56535485e-06 ... 0.00000000e+00 7.57572059e-07 0.00000000e+00]

[ nan nan 4.56535485e-06 ... 0.00000000e+00 7.57572059e-07 0.00000000e+00] 46%|███████████████████████████████▏ [03:35]

RemoteTraceback Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
  File "/Users/simone/anaconda3/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/Users/simone/anaconda3/lib/python3.10/site-packages/metacells/utilities/parallel.py", line 260, in _invocation
    result = PARALLEL_FUNCTION(index)
  File "/Users/simone/anaconda3/lib/python3.10/site-packages/metacells/pipeline/collect.py", line 225, in _collect_metacell
    assert np.min(fraction_per_gene_of_metacell) >= 0
AssertionError
"""

The above exception was the direct cause of the following exception:

AssertionError Traceback (most recent call last) File :2

File ~/anaconda3/lib/python3.10/site-packages/metacells/utilities/logging.py:384, in logged.<locals>.wrap.<locals>.wrapper(*args, **kwargs)
    379 if log_value is not None:
    380 logger().log(
    381 param_level, "%swith %s: %s", INDENT_SPACES[: 2 * INDENT_LEVEL], name, log_value
    382 )
--> 384 return function(*args, **kwargs)
    386 finally:
    387 if logger().isEnabledFor(step_level):

File ~/anaconda3/lib/python3.10/site-packages/metacells/pipeline/divide_and_conquer.py:633, in divide_and_conquer_pipeline(adata, what, rare_max_genes, rare_max_gene_cell_fraction, rare_min_gene_maximum, rare_genes_similarity_method, rare_genes_cluster_method, rare_min_genes_of_modules, rare_min_cells_of_modules, rare_min_module_correlation, rare_min_related_gene_fold_factor, rare_max_related_gene_increase_factor, rare_min_cell_module_total, rare_max_cells_factor_of_random_pile, rare_deviants_max_cell_fraction, rare_dissolve_min_robust_size_factor, rare_dissolve_min_convincing_gene_fold_factor, quick_and_dirty, select_downsample_min_samples, select_downsample_min_cell_quantile, select_downsample_max_cell_quantile, select_min_gene_total, select_min_gene_top3, select_min_gene_relative_variance, select_min_genes, cells_similarity_value_regularization, cells_similarity_log_data, cells_similarity_method, groups_similarity_log_data, groups_similarity_method, target_metacell_umis, cell_umis, target_metacell_size, min_metacell_size, target_metacells_in_pile, min_target_pile_size, max_target_pile_size, piles_knn_k_size_factor, piles_min_split_size_factor, piles_min_robust_size_factor, piles_max_merge_size_factor, knn_k, knn_k_umis_quantile, min_knn_k, knn_balanced_ranks_factor, knn_incoming_degree_factor, knn_outgoing_degree_factor, knn_min_outgoing_degree, min_seed_size_quantile, max_seed_size_quantile, candidates_knn_k_size_factor, candidates_cooldown_pass, candidates_cooldown_node, candidates_cooldown_phase, candidates_min_split_size_factor, candidates_max_merge_size_factor, candidates_max_split_min_cut_strength, candidates_min_cut_seed_cells, must_complete_cover, deviants_policy, deviants_gap_skip_cells, deviants_min_gene_fold_factor, deviants_min_noisy_gene_fold_factor, deviants_max_gene_fraction, deviants_max_cell_fraction, deviants_max_gap_cells_count, deviants_max_gap_cells_fraction, dissolve_min_robust_size_factor, dissolve_min_convincing_gene_fold_factor, random_seed)
    631 with ut.timed_step(".common"):
    632 with ut.progress_bar_slice(common_cells_fraction):
--> 633 _compute_divide_and_conquer_subset(
    634 adata,
    635 what,
    636 prefix="common" if name is None else name + ".common",
    637 metacells_level=0,
    638 subset_mask=common_cells_mask,
    639 collected_mask=collected_mask,
    640 counts=counts,
    641 dac_parameters=dac_parameters,
    642 random_seed=random_seed,
    643 )
    645 if rare_cells_count > 0:
    646 selected_genes = ut.get_v_numpy(adata, "selected_gene")

File ~/anaconda3/lib/python3.10/site-packages/metacells/utilities/logging.py:384, in logged.<locals>.wrap.<locals>.wrapper(*args, **kwargs)
    379 if log_value is not None:
    380 logger().log(
    381 param_level, "%swith %s: %s", INDENT_SPACES[: 2 * INDENT_LEVEL], name, log_value
    382 )
--> 384 return function(*args, **kwargs)
    386 finally:
    387 if logger().isEnabledFor(step_level):

File ~/anaconda3/lib/python3.10/site-packages/metacells/pipeline/divide_and_conquer.py:1016, in _compute_divide_and_conquer_subset(adata, what, prefix, subset_mask, collected_mask, metacells_level, counts, dac_parameters, random_seed)
   1013 groups_time /= total_time
   1015 with ut.progress_bar_slice(total_time):
-> 1016 final_pile_of_cells = _compute_metacell_groups(
   1017 adata,
   1018 what,
   1019 collect_time=collect_time,
   1020 groups_time=groups_time,
   1021 prefix=prefix + ".groups",
   1022 subset_mask=subset_mask,
   1023 dac_parameters=must_cover_dac_parameters,
   1024 random_seed=random_seed,
   1025 )
   1027 if dac_parameters.quick_and_dirty:
   1028 ut.log_calc(f"# {prefix}.final")

File ~/anaconda3/lib/python3.10/site-packages/metacells/utilities/logging.py:384, in logged.<locals>.wrap.<locals>.wrapper(*args, **kwargs)
    379 if log_value is not None:
    380 logger().log(
    381 param_level, "%swith %s: %s", INDENT_SPACES[: 2 * INDENT_LEVEL], name, log_value
    382 )
--> 384 return function(*args, **kwargs)
    386 finally:
    387 if logger().isEnabledFor(step_level):

File ~/anaconda3/lib/python3.10/site-packages/metacells/pipeline/divide_and_conquer.py:1176, in _compute_metacell_groups(adata, what, collect_time, groups_time, prefix, subset_mask, dac_parameters, random_seed)
   1167 with ut.progress_bar_slice(collect_time):
   1168 sdata = ut.slice(
   1169 adata,
   1170 name=f"{prefix}.grouped",
   (...)
   1173 track_obs="full_cell_index",
   1174 )
-> 1176 mdata = collect_metacells(
   1177 sdata,
   1178 what,
   1179 groups=metacell_of_cells[subset_mask],
   1180 name=prefix,
   1181 _metacell_groups=True,
   1182 top_level=False,
   1183 random_seed=random_seed,
   1184 )
   1186 with ut.progress_bar_slice(groups_time):
   1187 metacell_sizes = ut.get_o_numpy(mdata, "grouped").astype("float32")

File ~/anaconda3/lib/python3.10/site-packages/metacells/utilities/logging.py:384, in logged.<locals>.wrap.<locals>.wrapper(*args, **kwargs)
    379 if log_value is not None:
    380 logger().log(
    381 param_level, "%swith %s: %s", INDENT_SPACES[: 2 * INDENT_LEVEL], name, log_value
    382 )
--> 384 return function(*args, **kwargs)
    386 finally:
    387 if logger().isEnabledFor(step_level):

File ~/anaconda3/lib/python3.10/site-packages/metacells/pipeline/collect.py:248, in collect_metacells(adata, what, metacell_geo_mean, metacell_umis_regularization, zeros_cell_size_quantile, groups, name, prefix, top_level, _metacell_groups, random_seed)
    235 ut.log_calc(
    236 "fraction_per_gene_of_metacell", fraction_per_gene_of_metacell, formatter=ut.sizes_description
    237 )
    239 return {
    240 "grouped": grouped_of_metacell,
    241 "total_umis": total_umis_of_metacell,
    (...)
    245 "zeros_per_gene": zeros_per_gene,
    246 }
--> 248 results = ut.parallel_map(_collect_metacell, metacells_count)
    250 fraction_per_gene_per_metacell = sp.csr_matrix(np.vstack([result["fraction_per_gene"] for result in results]))
    251 assert str(fraction_per_gene_per_metacell.dtype) == "float32"

File ~/anaconda3/lib/python3.10/site-packages/metacells/utilities/parallel.py:221, in parallel_map(function, invocations, max_processors, hide_from_progress_bar)
    219 utm.timed_parameters(index=MAP_INDEX, processes=PROCESSES_COUNT)
    220 with get_context("fork").Pool(PROCESSES_COUNT) as pool:
--> 221 for index, result in pool.imap_unordered(_invocation, range(invocations)):
    222 if utp.has_progress_bar() and not hide_from_progress_bar:
    223 utp.did_progress(1 / invocations)

File ~/anaconda3/lib/python3.10/multiprocessing/pool.py:873, in IMapIterator.next(self, timeout)
    871 if success:
    872 return value
--> 873 raise value

File ~/anaconda3/lib/python3.10/multiprocessing/pool.py:125, in worker()
    123 job, i, func, args, kwds = task
    124 try:
--> 125 result = (True, func(*args, **kwds))
    126 except Exception as e:
    127 if wrap_exception and func is not _helper_reraises_exception:

File ~/anaconda3/lib/python3.10/site-packages/metacells/utilities/parallel.py:260, in _invocation()
    257 os.environ["MKL_NUM_THREADS"] = str(PROCESSORS_COUNT)
    259 assert PARALLEL_FUNCTION is not None
--> 260 result = PARALLEL_FUNCTION(index)
    261 return index, result

File ~/anaconda3/lib/python3.10/site-packages/metacells/pipeline/collect.py:225, in _collect_metacell()
    223 fraction_per_gene_of_metacell[fraction_per_gene_of_metacell < 0] = 0
    224 print(fraction_per_gene_of_metacell)
--> 225 assert np.min(fraction_per_gene_of_metacell) >= 0
    226 fraction_per_gene_of_metacell /= np.sum(fraction_per_gene_of_metacell)
    228 fraction_per_gene_of_metacell = fraction_per_gene_of_metacell.astype("float32")

AssertionError:

orenbenkiki commented 1 month ago

Ah, nan values! Should have considered that. Do you perhaps have genes with no UMIs at all in your data set? Normally such genes should be excluded from the data before computing metacells.
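To make the NaN explanation concrete, here is a small NumPy-only sketch under stated assumptions. The arrays are toy values, not taken from the reporter's data. The first part shows why the `< 0` clamp leaves NaN in place and why `np.min(...) >= 0` then fails. The second part shows one way to drop all-zero genes from a cells-by-genes UMI matrix, assuming the data is a plain dense array.

```python
import numpy as np

# Part 1: NaN survives the clamp and makes the assertion fail.
fractions = np.array([np.nan, 2.0e-05, 0.0])
fractions[fractions < 0] = 0               # NaN < 0 is False, so NaN entries are left untouched
minimum = np.min(fractions)                # np.min propagates NaN
assertion_would_pass = bool(minimum >= 0)  # any comparison with NaN is False

# Part 2: drop genes (columns) with no UMIs at all.
# Toy matrix: rows are cells, columns are genes; gene 0 has zero UMIs everywhere.
umis = np.array(
    [
        [0, 3, 1],
        [0, 0, 2],
        [0, 5, 0],
    ]
)
has_umis = umis.sum(axis=0) > 0  # boolean mask over genes
filtered_umis = umis[:, has_umis]

print(assertion_would_pass, has_umis, filtered_umis.shape)
```

With an AnnData object, the same boolean mask could presumably be applied along the variables axis (for example `cells[:, has_umis]`) before running the pipeline; whether all-zero genes are actually the cause here is still only a hypothesis at this point in the thread.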