tanaylab / metacells

Metacells - Single-cell RNA Sequencing Analysis

AssertionError related to parallelization in divide_and_conquer_pipeline #29

Closed by Benfeitas 1 year ago

Benfeitas commented 1 year ago

Hi

I'm running divide_and_conquer_pipeline and I'm getting parallelization issues that raise an AssertionError. What would be the best way to tackle this problem?
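(For reference, a minimal sketch of how this pipeline is invoked; `cells` is a placeholder AnnData rather than the actual dataset, and random_seed is one of the parameters visible in the traceback below.)

```python
# Minimal sketch of the kind of call that triggers the error below.
# `cells` is a placeholder AnnData of the cells being grouped.
import metacells as mc

mc.pl.divide_and_conquer_pipeline(cells, random_seed=123456)
```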

File ~/.conda/path/to/python3.9/site-packages/metacells/pipeline/divide_and_conquer.py:1810, in _run_parallel_piles(adata, what, phase, piles_count, pile_of_cells, feature_downsample_min_samples, feature_downsample_min_cell_quantile, feature_downsample_max_cell_quantile, feature_min_gene_total, feature_min_gene_top3, feature_min_gene_relative_variance, feature_gene_names, feature_gene_patterns, forbidden_gene_names, forbidden_gene_patterns, feature_correction, cells_similarity_value_normalization, cells_similarity_log_data, cells_similarity_method, target_metacell_size, max_cell_size, max_cell_size_factor, cell_sizes, knn_k, min_knn_k, knn_balanced_ranks_factor, knn_incoming_degree_factor, knn_outgoing_degree_factor, min_seed_size_quantile, max_seed_size_quantile, candidates_cooldown_pass, candidates_cooldown_node, candidates_cooldown_phase, candidates_min_split_size_factor, candidates_max_merge_size_factor, candidates_min_metacell_cells, candidates_max_split_min_cut_strength, candidates_min_cut_seed_cells, must_complete_cover, deviants_min_gene_fold_factor, deviants_abs_folds, deviants_max_gene_fraction, deviants_max_cell_fraction, dissolve_min_robust_size_factor, dissolve_min_convincing_size_factor, dissolve_min_convincing_gene_fold_factor, dissolve_min_metacell_cells, random_seed, hide_from_progress_bar)
   1808 gc.collect()
   1809 ut.logger().debug("MAX_PARALLEL_PILES: %s", get_max_parallel_piles())
-> 1810 return ut.parallel_map(
   1811     compute_pile_metacells,
   1812     piles_count,
   1813     max_processors=get_max_parallel_piles(),
   1814     hide_from_progress_bar=hide_from_progress_bar,
   1815 )

File ~/.conda/path/to/python3.9/site-packages/metacells/utilities/parallel.py:211, in parallel_map(function, invocations, max_processors, hide_from_progress_bar)
    209 utm.timed_parameters(index=MAP_INDEX, processes=PROCESSES_COUNT)
    210 with get_context("fork").Pool(PROCESSES_COUNT) as pool:
--> 211     for index, result in pool.imap_unordered(_invocation, range(invocations)):
    212         if utp.has_progress_bar() and not hide_from_progress_bar:
    213             utp.did_progress(1 / invocations)

File ~/.conda/path/to/python3.9/multiprocessing/pool.py:870, in IMapIterator.next(self, timeout)
    868 if success:
    869     return value
--> 870 raise value

AssertionError: 
orenbenkiki commented 1 year ago

This is snipped just when it got to the good part - what was the actual assertion error?

Benfeitas commented 1 year ago

> This is snipped just when it got to the good part - what was the actual assertion error?

Unfortunately, it was blank after the assertion error (see the attached screenshot).

orenbenkiki commented 1 year ago

Looking at the source code of pool.py, this is pretty opaque. The most likely explanation is that one of the sub-processes died and the code wasn't kind enough to pass the error onwards.

I can only speculate, but one possible scenario is that you ran out of memory. Linux is notorious for killing processes out of hand when that happens, with little or no indication of the reason.

You can verify this by running top (or better yet, htop) alongside the code and observing whether the system is indeed running out of memory. If it is, you can reduce the amount of memory used by setting the METACELLS_MAX_PARALLEL_PILES environment variable or by invoking mc.pl.set_max_parallel_piles. The system tries to guess a good value for this using guess_max_parallel_piles, but that is just a heuristic and is far from perfect.
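A minimal sketch of those two knobs (the cap of 6 is only an illustrative value, and the exact arguments of guess_max_parallel_piles may differ between metacells versions, so that call is an assumption):

```python
# Sketch: reduce peak memory by limiting how many piles run in parallel.
import metacells as mc

# Cap the number of piles processed concurrently (illustrative value).
mc.pl.set_max_parallel_piles(6)

# Or start from the library's own heuristic and scale it down; the exact
# signature of guess_max_parallel_piles may vary between versions, so this
# line is only an assumption:
# mc.pl.set_max_parallel_piles(max(1, mc.pl.guess_max_parallel_piles(cells) // 4))

# The same cap can also be set via the environment before starting Python:
#   export METACELLS_MAX_PARALLEL_PILES=6
```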

If this isn't the issue, perhaps running everything with debug logging (mc.ut.setup_logger(level=logging.DEBUG) immediately after import metacells as mc) will generate a very long, detailed log file that may help pinpoint the issue.
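Concretely, that debug-logging setup amounts to:

```python
# Enable metacells' debug logging right after importing the package,
# as suggested above; all subsequent pipeline calls are then logged in detail.
import logging
import metacells as mc

mc.ut.setup_logger(level=logging.DEBUG)
```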

Benfeitas commented 1 year ago

Many thanks @orenbenkiki.

I had tried all that based on discussions in other issues here, but to no avail (guess_max_parallel_piles was giving me over 65, and I had tried 65, 60, and 30). I had not seen a memory overload, but I cannot guarantee it didn't happen. After reading your comment I went substantially lower, to set_max_parallel_piles(6), and the command finished, so it seems to have worked, pending verification of the results.

I will thus close this thread and will reopen it if anything looks fishy in the results. Many thanks.