tanaylab / metacells

Metacells - Single-cell RNA Sequencing Analysis
MIT License

mc.pl.extract_clean_data(full, name="hca_bm.one-pass.clean") crashes jupyter notebook every time #66

Open lindsdudley opened 5 months ago

lindsdudley commented 5 months ago

Hi Oren,

Thank you so much for creating this program! I am running into a little trouble running the one-pass vignette in my conda environment. My Jupyter notebook crashes every time I run the mc.pl.extract_clean_data command. I believe this is similar to closed issue #5, but none of the workarounds in it ended up helping me. As suggested there, I tried installing the dependencies using conda, and I used pip to install metacells both regularly and with the native flag, but neither helped. I would really appreciate your help debugging. I am running metacells 0.9.4 and conda 24.1.2, and my operating system is the latest release of Pop!_OS, which is a Linux system. I also grepped the CPU flags to make sure I had AVX, and I did. Thank you so much for your help, and please let me know if you need any more information to solve this.

orenbenkiki commented 5 months ago

Can you follow the steps here https://github.com/tanaylab/metacells/issues/5#issuecomment-903711959 - that is, verify which instruction is at fault?

That said, if you have compiled using --native, I don't see why the compiler would have generated anything that isn't supported...

lindsdudley commented 5 months ago

grep flags /proc/cpuinfo | head -1

flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif x2avic v_spec_ctrl vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid overflow_recov succor smca fsrm flush_l1d
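As a cross-check on the flags line above, the same information can be read programmatically. The sketch below is illustrative only: the helper name `simd_flags` is hypothetical, and it assumes the Linux `/proc/cpuinfo` format shown above, in which a line starting with `flags` lists the features of one core.

```python
from pathlib import Path


def simd_flags(cpuinfo_text: str) -> set[str]:
    """Return the AVX-family flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return {flag for flag in value.split() if flag.startswith("avx")}
    return set()  # No 'flags' line found (for example, on a non-x86 machine).


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print(sorted(simd_flags(cpuinfo.read_text())))
```

Knowing which AVX variants the CPU reports only tells you what the hardware supports; it does not show which instructions ended up in the compiled extension.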

lindsdudley commented 5 months ago

I'm a bit confused about the gdb command you used, and I got confused by its output. Can you tell me what you are trying to look for?

lindsdudley commented 5 months ago

(metacells) lindseydudley@pop-os:~$ gdb 'which python3'
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
https://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
which python3: No such file or directory.
(gdb) r te
Starting program: te
No executable file specified.
Use the "file" or "exec-file" command.
(gdb) r test_seg.py
Starting program: test_seg.py
No executable file specified.
Use the "file" or "exec-file" command.

orenbenkiki commented 5 months ago

The thing is, you need to tell gdb which (binary) program to debug, which in our case is the Python interpreter.

Normally this is called python3. To find the path to it we call which as in which python3. However it seems in your system it doesn't exist (hence the error message which python3: No such file or directory).

Sometimes it is just called python, so you would need to say which python. That said, make sure you have a reasonably up-to-date Python version by running python --version. It should say something like Python 3.12.2 (or whatever your version is).

Chalk all this up to the royal mess that is the migration from Python 2.x to 3.x, which still messes things up even after all these years.

What we are trying to find is the actual cause of the crash - which binary instruction is not supported - and then we'll have to back-track to see how this instruction came to be generated into the binary extensions compiled by the metacells package. On the face of it, compiling with --native should compile for the current machine, so this "shouldn't happen". Yet, here we are...
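One way to sidestep the `python` versus `python3` naming question is to ask the interpreter of the active environment for its own path. The sketch below is only an illustration: `gdb_command` is a hypothetical helper, and it relies on gdb's `--args` option, which passes everything after the program name to the debugged program. It is also worth noting that in the session pasted above, `which python3` appears wrapped in single quotes, so the shell would have handed it to gdb as a literal file name instead of running it; backticks or `$(which python3)` perform the substitution.

```python
import shlex
import sys


def gdb_command(script: str) -> list[str]:
    """Build a gdb command line that debugs the current interpreter running `script`."""
    # sys.executable is the absolute path of the running Python binary,
    # so this works whether it is installed as `python` or `python3`.
    return ["gdb", "--args", sys.executable, script]


if __name__ == "__main__":
    # Print a command that can be pasted into the shell of the same environment.
    print(shlex.join(gdb_command("test_op.py")))
```

Run this with the Python of the conda environment that has metacells installed, so that the printed path points at the interpreter that actually crashes.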

lindsdudley commented 5 months ago

Hi Oren,

Sorry for the confusion as I have never used gdb before. This is what I have as the output when I try to specifically debug the jupyter notebook that keeps crashing.

(gdb) r /home/lindseydudley/Desktop/Metacell/metacells-vignettes/notebooks/one-pass.ipynb
Starting program: /home/lindseydudley/anaconda3/envs/mcell/bin/python3 /home/lindseydudley/Desktop/Metacell/metacells-vignettes/notebooks/one-pass.ipynb
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Traceback (most recent call last):
  File "/home/lindseydudley/Desktop/Metacell/metacells-vignettes/notebooks/one-pass.ipynb", line 7954, in <module>
    "execution_count": null,
                       ^^^^
NameError: name 'null' is not defined
[Inferior 1 (process 765196) exited with code 01]
(gdb)
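The `NameError: name 'null' is not defined` above is what one would expect when the interpreter is handed an `.ipynb` file directly: a notebook is a JSON document, and `null` is a JSON literal, not Python. A minimal sketch of pulling the code cells out into a plain script follows. The function name and file names are hypothetical, and it assumes the usual nbformat 4 layout (a top-level `cells` list whose entries have `cell_type` and `source`). Lines starting with `%` or `!` are commented out because IPython magics are not valid in a plain script.

```python
import json
from pathlib import Path


def notebook_to_script(notebook_json: str) -> str:
    """Concatenate the code cells of an nbformat-4 notebook into Python source."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # Skip markdown and raw cells.
        source = cell.get("source", "")
        # "source" may be a single string or a list of lines.
        text = source if isinstance(source, str) else "".join(source)
        lines = [
            "# " + line if line.lstrip().startswith(("%", "!")) else line
            for line in text.splitlines()
        ]
        chunks.append("\n".join(lines))
    return "\n\n".join(chunks) + "\n"


if __name__ == "__main__":
    path = Path("one-pass.ipynb")  # Hypothetical location of the notebook.
    if path.exists():
        Path("test_op.py").write_text(notebook_to_script(path.read_text()))
```

The `jupyter nbconvert --to script` command does a similar job; the point of the sketch is only to show why the raw notebook cannot be run as a script.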

lindsdudley commented 5 months ago

Hi Oren,

After playing around with gdb and creating a test script based on the Jupyter notebook, this is the output I get. Please let me know if this is helpful or if there is anything else I can do to help find the error.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/lindseydudley/anaconda3/envs/mcell/bin/python3...
(gdb) r test_op.py
Starting program: /home/lindseydudley/anaconda3/envs/mcell/bin/python3 test_op.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff306c640 (LWP 770028)]
[New Thread 0x7ffff286b640 (LWP 770029)]
[New Thread 0x7fffea06a640 (LWP 770030)]
[New Thread 0x7fffe1869640 (LWP 770031)]
[New Thread 0x7fffd9068640 (LWP 770032)]
[New Thread 0x7fffd0867640 (LWP 770033)]
[New Thread 0x7fffc8066640 (LWP 770034)]
[New Thread 0x7fffbf865640 (LWP 770035)]
[New Thread 0x7fffb7064640 (LWP 770036)]
[New Thread 0x7fffae863640 (LWP 770037)]
[New Thread 0x7fffa6062640 (LWP 770038)]
[New Thread 0x7fff9d861640 (LWP 770039)]
[New Thread 0x7fff8d060640 (LWP 770040)]
[New Thread 0x7fff8c85f640 (LWP 770041)]
[New Thread 0x7fff8405e640 (LWP 770042)]
[New Thread 0x7fff7385d640 (LWP 770043)]
[New Thread 0x7fff7305c640 (LWP 770044)]
[New Thread 0x7fff6a85b640 (LWP 770045)]
[New Thread 0x7fff6205a640 (LWP 770046)]
[New Thread 0x7fff51859640 (LWP 770047)]
[New Thread 0x7fff51058640 (LWP 770048)]
[New Thread 0x7fff48857640 (LWP 770049)]
[New Thread 0x7fff40056640 (LWP 770050)]
[New Thread 0x7fff37855640 (LWP 770051)]
[New Thread 0x7fff2f054640 (LWP 770052)]
[New Thread 0x7fff1e853640 (LWP 770053)]
[New Thread 0x7fff1e052640 (LWP 770054)]
[New Thread 0x7fff0d851640 (LWP 770055)]
[New Thread 0x7fff05050640 (LWP 770056)]
[New Thread 0x7fff0484f640 (LWP 770057)]
[New Thread 0x7ffefc04e640 (LWP 770058)]
[Detaching after vfork from child process 770062]
[Detaching after vfork from child process 770065]
Full: 378000 cells, 33694 genes
Will exclude 66232 (17.52%%) cells with less than 800 UMIs
Will exclude 8672 (2.29%%) cells with more than 20000 UMIs
[New Thread 0x7ff2f657b640 (LWP 770400)]
[New Thread 0x7ff2f5d7a640 (LWP 770401)]
[New Thread 0x7ff243fff640 (LWP 770402)]
[New Thread 0x7ff24afd5640 (LWP 770403)]
[New Thread 0x7ff24a7d4640 (LWP 770404)]
[New Thread 0x7ff249fd3640 (LWP 770405)]
[New Thread 0x7ff2497d2640 (LWP 770406)]
[New Thread 0x7ff248fd1640 (LWP 770407)]
[New Thread 0x7ff2437fe640 (LWP 770408)]
[New Thread 0x7ff242ffd640 (LWP 770409)]
[New Thread 0x7ff2427fc640 (LWP 770410)]
[New Thread 0x7ff241ffb640 (LWP 770411)]
[New Thread 0x7ff2417fa640 (LWP 770412)]
[New Thread 0x7ff240ff9640 (LWP 770413)]
[New Thread 0x7ff213fff640 (LWP 770414)]
[New Thread 0x7ff2137fe640 (LWP 770415)]
[Thread 0x7ff2137fe640 (LWP 770415) exited]
[Thread 0x7ff213fff640 (LWP 770414) exited]
[Thread 0x7ff240ff9640 (LWP 770413) exited]
[Thread 0x7ff2417fa640 (LWP 770412) exited]
[Thread 0x7ff241ffb640 (LWP 770411) exited]
[Thread 0x7ff2427fc640 (LWP 770410) exited]
[Thread 0x7ff242ffd640 (LWP 770409) exited]
[Thread 0x7ff2437fe640 (LWP 770408) exited]
[Thread 0x7ff248fd1640 (LWP 770407) exited]
[Thread 0x7ff2497d2640 (LWP 770406) exited]
[Thread 0x7ff249fd3640 (LWP 770405) exited]
[Thread 0x7ff24a7d4640 (LWP 770404) exited]
[Thread 0x7ff24afd5640 (LWP 770403) exited]
[Thread 0x7ff243fff640 (LWP 770402) exited]
[Thread 0x7ff2f5d7a640 (LWP 770401) exited]
[Thread 0x7ff2f657b640 (LWP 770400) exited]
[New Thread 0x7ff2137fe640 (LWP 770417)]
[New Thread 0x7ff213fff640 (LWP 770418)]
[New Thread 0x7ff240ff9640 (LWP 770419)]
[New Thread 0x7ff2417fa640 (LWP 770420)]
[New Thread 0x7ff2f657b640 (LWP 770421)]
[New Thread 0x7ff2f5d7a640 (LWP 770422)]
[New Thread 0x7ff24afd5640 (LWP 770423)]
[New Thread 0x7ff24a7d4640 (LWP 770424)]
[New Thread 0x7ff249fd3640 (LWP 770425)]
[New Thread 0x7ff2497d2640 (LWP 770426)]
[New Thread 0x7ff248fd1640 (LWP 770427)]
[New Thread 0x7ff243fff640 (LWP 770428)]
[New Thread 0x7ff2437fe640 (LWP 770429)]
[New Thread 0x7ff242ffd640 (LWP 770430)]
[Thread 0x7ff242ffd640 (LWP 770430) exited]
[Thread 0x7ff2437fe640 (LWP 770429) exited]
[Thread 0x7ff243fff640 (LWP 770428) exited]
[Thread 0x7ff248fd1640 (LWP 770427) exited]
[Thread 0x7ff2497d2640 (LWP 770426) exited]
[Thread 0x7ff249fd3640 (LWP 770425) exited]
[Thread 0x7ff24a7d4640 (LWP 770424) exited]
[Thread 0x7ff24afd5640 (LWP 770423) exited]
[Thread 0x7ff2f5d7a640 (LWP 770422) exited]
[Thread 0x7ff2f657b640 (LWP 770421) exited]
[Thread 0x7ff2417fa640 (LWP 770420) exited]
[Thread 0x7ff240ff9640 (LWP 770419) exited]
[Thread 0x7ff213fff640 (LWP 770418) exited]
[Thread 0x7ff2137fe640 (LWP 770417) exited]
[New Thread 0x7ff242ffd640 (LWP 770431)]
[New Thread 0x7ff2437fe640 (LWP 770432)]
[New Thread 0x7ff243fff640 (LWP 770433)]
[New Thread 0x7ff248fd1640 (LWP 770434)]
[New Thread 0x7ff2f657b640 (LWP 770435)]
[New Thread 0x7ff2f5d7a640 (LWP 770436)]
[New Thread 0x7ff24afd5640 (LWP 770437)]
[New Thread 0x7ff24a7d4640 (LWP 770438)]
[New Thread 0x7ff249fd3640 (LWP 770439)]
[New Thread 0x7ff2497d2640 (LWP 770440)]
[New Thread 0x7ff2427fc640 (LWP 770441)]
[New Thread 0x7ff241ffb640 (LWP 770442)]
[New Thread 0x7ff2417fa640 (LWP 770443)]
[New Thread 0x7ff240ff9640 (LWP 770444)]
[New Thread 0x7ff213fff640 (LWP 770445)]
[New Thread 0x7ff2137fe640 (LWP 770446)]
[Thread 0x7ff2137fe640 (LWP 770446) exited]
[Thread 0x7ff213fff640 (LWP 770445) exited]
[Thread 0x7ff240ff9640 (LWP 770444) exited]
[Thread 0x7ff2417fa640 (LWP 770443) exited]
[Thread 0x7ff241ffb640 (LWP 770442) exited]
[Thread 0x7ff2427fc640 (LWP 770441) exited]
[Thread 0x7ff2497d2640 (LWP 770440) exited]
[Thread 0x7ff249fd3640 (LWP 770439) exited]
[Thread 0x7ff24a7d4640 (LWP 770438) exited]
[Thread 0x7ff24afd5640 (LWP 770437) exited]
[Thread 0x7ff2f5d7a640 (LWP 770436) exited]
[Thread 0x7ff2f657b640 (LWP 770435) exited]
[Thread 0x7ff248fd1640 (LWP 770434) exited]
[Thread 0x7ff243fff640 (LWP 770433) exited]
[Thread 0x7ff2437fe640 (LWP 770432) exited]
[Thread 0x7ff242ffd640 (LWP 770431) exited]
set hca_bm.full.var[bursty_lonely_gene]: 0 true (0%) out of 33694 bools
set hca_bm.full.var[properly_sampled_gene]: 27277 true (80.96%) out of 33694 bools
set hca_bm.full.var[excluded_gene]: 6433 true (19.09%) out of 33694 bools
set hca_bm.full.obs[excluded_umis]: 378000 float32s
Will exclude 36458 (9.64%) cells with more than 25.00% excluded gene UMIs
set hca_bm.full.obs[properly_sampled_cell]: 297810 true (78.79%) out of 378000 bools
Traceback (most recent call last):
  File "/home/lindseydudley/Desktop/Metacell/metacells-vignettes/scripts/test_op.py", line 117, in <module>
    mc.pl.exclude_cells(
  File "/home/lindseydudley/anaconda3/envs/mcell/lib/python3.12/site-packages/metacells/utilities/logging.py", line 384, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lindseydudley/anaconda3/envs/mcell/lib/python3.12/site-packages/metacells/pipeline/exclude.py", line 197, in exclude_cells
    tl.combine_masks(adata, excluded_cells_masks, to="excluded_cell")
  File "/home/lindseydudley/anaconda3/envs/mcell/lib/python3.12/site-packages/metacells/utilities/logging.py", line 384, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lindseydudley/anaconda3/envs/mcell/lib/python3.12/site-packages/metacells/tools/mask.py", line 87, in combine_masks
    raise KeyError(f"unknown mask data: {mask_name}")
KeyError: 'unknown mask data: doublet_cell'
[New Thread 0x7ff2137fe640 (LWP 770527)]
[Thread 0x7ffefc04e640 (LWP 770058) exited]
[Thread 0x7fff0484f640 (LWP 770057) exited]
[Thread 0x7fff05050640 (LWP 770056) exited]
[Thread 0x7fff0d851640 (LWP 770055) exited]
[Thread 0x7fff1e052640 (LWP 770054) exited]
[Thread 0x7fff1e853640 (LWP 770053) exited]
[Thread 0x7fff2f054640 (LWP 770052) exited]
[Thread 0x7fff37855640 (LWP 770051) exited]
[Thread 0x7fff40056640 (LWP 770050) exited]
[Thread 0x7fff48857640 (LWP 770049) exited]
[Thread 0x7fff51058640 (LWP 770048) exited]
[Thread 0x7fff51859640 (LWP 770047) exited]
[Thread 0x7fff6205a640 (LWP 770046) exited]
[Thread 0x7fff6a85b640 (LWP 770045) exited]
[Thread 0x7fff7305c640 (LWP 770044) exited]
[Thread 0x7fff7385d640 (LWP 770043) exited]
[Thread 0x7fff8405e640 (LWP 770042) exited]
[Thread 0x7fff8c85f640 (LWP 770041) exited]
[Thread 0x7fff8d060640 (LWP 770040) exited]
[Thread 0x7fff9d861640 (LWP 770039) exited]
[Thread 0x7fffa6062640 (LWP 770038) exited]
[Thread 0x7fffae863640 (LWP 770037) exited]
[Thread 0x7fffb7064640 (LWP 770036) exited]
[Thread 0x7fffbf865640 (LWP 770035) exited]
[Thread 0x7fffc8066640 (LWP 770034) exited]
[Thread 0x7fffd0867640 (LWP 770033) exited]
[Thread 0x7fffd9068640 (LWP 770032) exited]
[Thread 0x7fffe1869640 (LWP 770031) exited]
[Thread 0x7fffea06a640 (LWP 770030) exited]
[Thread 0x7ffff286b640 (LWP 770029) exited]
[Thread 0x7ffff306c640 (LWP 770028) exited]
[Thread 0x7ffff7ea3740 (LWP 770023) exited]
[Thread 0x7ff2137fe640 (LWP 770527) exited]
[New process 770023]
[Inferior 1 (process 770023) exited with code 01]

orenbenkiki commented 5 months ago

I'm a bit confused. This error seems to have nothing to do with issue #5.

There the error was something like Program received signal SIGILL, Illegal instruction. and we had to wrangle over how to get the binary extension to compile properly. This is the only reason we resorted to using gdb - normally one doesn't have to do that.
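For reference, a crash of the kind seen in issue #5 can also be recognised without gdb, because a process killed by a signal reports it in its exit status. The sketch below is illustrative and Unix-only: `run_and_classify` is a hypothetical helper, and it relies on `subprocess.run` returning a negative `returncode` equal to minus the signal number when the child dies from a signal. It does not identify the offending instruction, which is what gdb was used for.

```python
import signal
import subprocess
import sys


def run_and_classify(args: list[str]) -> str:
    """Run a Python child process and describe how it ended."""
    result = subprocess.run([sys.executable, *args])
    if result.returncode == 0:
        return "ok"
    if result.returncode < 0:
        # Killed by a signal, e.g. SIGILL for an unsupported instruction.
        return signal.Signals(-result.returncode).name
    # A positive code is an ordinary exit, e.g. an uncaught Python exception.
    return f"exit code {result.returncode}"


if __name__ == "__main__":
    print(run_and_classify(sys.argv[1:] or ["-c", "pass"]))
```

An uncaught Python exception such as the `KeyError` in this thread shows up as a positive exit code, which matches the `exited with code 01` line that gdb printed.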

Here the error is KeyError: 'unknown mask data: doublet_cell' from exclude_cells. Nowhere in my vignette do I mention doublet_cell. It seems you manually requested to exclude cells listed in this mask, but this mask doesn't exist in your data. It would be easier if you showed the test_op.py script. Issues such as these don't require gdb - it is sufficient to look at the error, the stack trace, and possibly a detailed log file.
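A quick way to catch this class of error before calling `exclude_cells` is to compare the requested mask names with the per-cell annotations that actually exist. The sketch below is an assumption-laden illustration and not part of metacells: `missing_masks` is a hypothetical helper, it assumes that per-cell masks live in the columns of `adata.obs`, and it assumes that leading `|`, `&` and `~` characters are combination prefixes to strip (the script posted later in this thread passes `|doublet_cell`, while the error names plain `doublet_cell`).

```python
def missing_masks(available: list[str], requested: list[str]) -> list[str]:
    """Return the requested mask names (prefixes stripped) that are not available."""
    known = set(available)
    missing = []
    for name in requested:
        bare = name.lstrip("|&~")  # Assumed prefix characters; see the note above.
        if bare not in known:
            missing.append(bare)
    return missing


if __name__ == "__main__":
    # With an AnnData object this would be called as, for example:
    #     missing_masks(list(full.obs.columns), ["|doublet_cell"])
    obs_columns = ["excluded_umis", "properly_sampled_cell"]  # Hypothetical columns.
    print(missing_masks(obs_columns, ["|doublet_cell"]))
```

An empty result means every requested mask is present; anything else names the annotations that still need to be computed or loaded first.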

lindsdudley commented 5 months ago

Here is my test_op.py script. I just took the code from the one-pass Jupyter notebook and put it into a Python script.

import anndata as ad             # For reading/writing AnnData files
import matplotlib.pyplot as plt  # For plotting
import metacells as mc           # The Metacells package
import numpy as np               # For array/matrix operations
import pandas as pd              # For data frames
import os                        # For filesystem operations
import seaborn as sb             # For plotting
import scipy.sparse as sp        # For sparse matrices
import shutil                    # for filesystem operations
from math import hypot

# Use SVG for scalable low-element-count diagrams.
# config InlineBackend.figure_formats = ["svg"]

# A matter of personal preference.
sb.set_style("white")

# Running operations on an inefficient layout can make code much slower.
# For example, summing the columns of a row-major matrix.
# By default this will just be a warning.
# We set it to be an error here to make sure the vignette does not lead you astray.
#
# Note that this only affects the Metacells package.
# Numpy will happily and silently take 100x longer for running such inefficient operations.
# At least, there's no way I can tell to create a warning or error for this;
# also, the implementation for "inefficient" operations could be much faster.
#
# The workaround in either case is to explicitly re-layout the 2D matrix before the operations.
# This turns out to be much faster, especially when the matrix can be reused.
# Note that numpy is also very slow when doing matrix re-layout,
# so the metacells package provides a function for doing it more efficiently.
#
# Sigh.
mc.ut.allow_inefficient_layout(False)

shutil.rmtree("../output/one-pass", ignore_errors=True)
shutil.rmtree("../mcview/one-pass", ignore_errors=True)
os.makedirs("../output/one-pass/preliminary/figures", exist_ok=True)
os.makedirs("../output/one-pass/final", exist_ok=True)

full = ad.read_h5ad("../blobs/hca_bm.full.h5ad")
mc.ut.top_level(full)
mc.ut.set_name(full, "hca_bm.full")
print(f"Full: {full.n_obs} cells, {full.n_vars} genes")

PROPERLY_SAMPLED_MIN_CELL_TOTAL = 800
PROPERLY_SAMPLED_MAX_CELL_TOTAL = 20000

total_umis_per_cell = mc.ut.get_o_numpy(full, "__x__", sum=True)
plot = sb.displot(total_umis_per_cell, log_scale=(10, None))
plot.set(xlabel="UMIs", ylabel="Density", yticks=[])

plot.refline(x=PROPERLY_SAMPLED_MIN_CELL_TOTAL, color="darkgreen")
plot.refline(x=PROPERLY_SAMPLED_MAX_CELL_TOTAL, color="crimson")

plt.savefig("../output/one-pass/preliminary/figures/cell_total_umis.svg")

too_small_cells_count = np.sum(total_umis_per_cell < PROPERLY_SAMPLED_MIN_CELL_TOTAL)
too_large_cells_count = np.sum(total_umis_per_cell > PROPERLY_SAMPLED_MAX_CELL_TOTAL)

total_umis_per_cell = mc.ut.get_o_numpy(full, name="__x__", sum=True)
too_small_cells_percent = 100.0 * too_small_cells_count / full.n_obs
too_large_cells_percent = 100.0 * too_large_cells_count / full.n_obs

print(
    f"Will exclude {too_small_cells_count} ({too_small_cells_percent:.2f}%%) cells"
    f" with less than {PROPERLY_SAMPLED_MIN_CELL_TOTAL} UMIs"
)
print(
    f"Will exclude {too_large_cells_count} ({too_large_cells_percent:.2f}%%) cells"
    f" with more than {PROPERLY_SAMPLED_MAX_CELL_TOTAL} UMIs")

EXCLUDED_GENE_NAMES = [
    "XIST", "MALAT1",   # Sex-specific genes.
    "NEAT1"             # Non-coding.
]
EXCLUDED_GENE_PATTERNS = ["MT-.*"]  # Mitochondrial.

mc.pl.exclude_genes(
    full,
    excluded_gene_names=EXCLUDED_GENE_NAMES,
    excluded_gene_patterns=EXCLUDED_GENE_PATTERNS,
    random_seed=123456,
)

mc.tl.compute_excluded_gene_umis(full)

PROPERLY_SAMPLED_MAX_EXCLUDED_GENES_FRACTION = 0.25

excluded_umis_fraction_regularization = 1e-3  # Avoid 0 values in log scale plot.
excluded_umis_per_cell = mc.ut.get_o_numpy(full, "excluded_umis")
excluded_umis_fraction_per_cell = excluded_umis_per_cell / total_umis_per_cell

excluded_umis_fraction_per_cell += excluded_umis_fraction_regularization
plot = sb.displot(excluded_umis_fraction_per_cell, log_scale=(10, None))
excluded_umis_fraction_per_cell -= excluded_umis_fraction_regularization

plot.set(xlabel="Fraction of excluded gene UMIs", ylabel="Density", yticks=[])
plot.refline(x=PROPERLY_SAMPLED_MAX_EXCLUDED_GENES_FRACTION, color="crimson")

plt.savefig("../output/one-pass/preliminary/figures/cell_excluded_umis_fraction.svg")

too_excluded_cells_count = np.sum(
    excluded_umis_fraction_per_cell > PROPERLY_SAMPLED_MAX_EXCLUDED_GENES_FRACTION
)
too_excluded_cells_fraction = too_excluded_cells_count / full.n_obs
print(
    f"Will exclude {too_excluded_cells_count} ({100 * too_excluded_cells_fraction:.2f}%) cells"
    f" with more than {100 * PROPERLY_SAMPLED_MAX_EXCLUDED_GENES_FRACTION:.2f}% excluded gene UMIs"
)

mc.pl.exclude_cells(
    full,
    properly_sampled_min_cell_total=PROPERLY_SAMPLED_MIN_CELL_TOTAL,
    properly_sampled_max_cell_total=PROPERLY_SAMPLED_MAX_CELL_TOTAL,
    properly_sampled_max_excluded_genes_fraction=PROPERLY_SAMPLED_MAX_EXCLUDED_GENES_FRACTION,
    additional_cells_masks=["|doublet_cell"]
)

clean = mc.pl.extract_clean_data(full, name="hca_bm.one-pass.clean")
mc.ut.top_level(clean)
print(f"Clean: {clean.n_obs} cells, {clean.n_vars} genes")

full.write_h5ad("../output/one-pass/preliminary/hca_bm.full.h5ad")
full = None  # Allow it to be gc-ed

clean.write_h5ad("../output/one-pass/preliminary/hca_bm.clean.h5ad")

cells = clean
clean = None  # Allow it to be gc-ed
mc.ut.set_name(cells, "hca_bm.one-pass.preliminary.cells")
print(f"Input: {cells.n_obs} cells, {cells.n_vars} genes")

LATERAL_GENE_NAMES = [
    "ACSM3", "ANP32B", "APOE", "AURKA", "B2M", "BIRC5", "BTG2", "CALM1", "CD63", "CD69",
    "CDK4", "CENPF", "CENPU", "CENPW", "CH17-373J23.1", "CKS1B", "CKS2", "COX4I1", "CXCR4",
    "DNAJB1", "DONSON", "DUSP1", "DUT", "EEF1A1", "EEF1B2", "EIF3E", "EMP3", "FKBP4", "FOS",
    "FOSB", "FTH1", "G0S2", "GGH", "GLTSCR2", "GMNN", "GNB2L1", "GPR183", "H2AFZ", "H3F3B",
    "HBM", "HIST1H1C", "HIST1H2AC", "HIST1H2BG", "HIST1H4C", "HLA-A", "HLA-B", "HLA-C",
    "HLA-DMA", "HLA-DMB", "HLA-DPA1", "HLA-DPB1", "HLA-DQA1", "HLA-DQB1", "HLA-DRA",
    "HLA-DRB1", "HLA-E", "HLA-F", "HMGA1", "HMGB1", "HMGB2", "HMGB3", "HMGN2", "HNRNPAB",
    "HSP90AA1", "HSP90AB1", "HSPA1A", "HSPA1B", "HSPA6", "HSPD1", "HSPE1", "HSPH1", "ID2",
    "IER2", "IGHA1", "IGHA2", "IGHD", "IGHG1", "IGHG2", "IGHG3", "IGHG4", "IGHM", "IGKC",
    "IGKV1-12", "IGKV1-39", "IGKV1-5", "IGKV3-15", "IGKV4-1",
    "HLA-DPA1", "HLA-DPB1", "HLA-DQA1", "HLA-DQB1", "HLA-DRA",
    "HLA-DRB1", "HLA-E", "HLA-F", "HMGA1", "HMGB1", "HMGB2", "HMGB3", "HMGN2", "HNRNPAB",
    "HSP90AA1", "HSP90AB1", "HSPA1A", "HSPA1B", "HSPA6", "HSPD1", "HSPE1", "HSPH1", "ID2",
    "IER2", "IGHA1", "IGHA2", "IGHD", "IGHG1", "IGHG2", "IGHG3", "IGHG4", "IGHM", "IGKC",
    "IGKV1-12", "IGKV1-39", "IGKV1-5", "IGKV3-15", "IGKV4-1",
    "IGLC2", "IGLC3", "IGLC6", "IGLC7", "IGLL1", "IGLL5", "IGLV2-34", "JUN", "JUNB",
    "KIAA0101", "LEPROTL1", "LGALS1", "LINC01206", "LTB", "MCM3", "MCM4", "MCM7", "MKI67",
    "MT2A", "MYL12A", "MYL6", "NASP", "NFKBIA", "NUSAP1", "PA2G4", "PCNA", "PDLIM1", "PLK3",
    "PPP1R15A", "PTMA", "PTTG1", "RAN", "RANBP1", "RGCC", "RGS1", "RGS2", "RGS3",
    "RP11-1143G9.4", "RP11-160E2.6", "RP11-53B5.1", "RP11-620J15.3", "RP5-1025A1.3",
    "RP5-1171I10.5", "RPS10", "RPS10-NUDT3", "RPS11", "RPS12", "RPS13", "RPS14", "RPS15",
    "RPS15A", "RPS16", "RPS17", "RPS18", "RPS19", "RPS19BP1", "RPS2", "RPS20", "RPS21",
    "RPS23", "RPS24", "RPS25", "RPS26", "RPS27", "RPS27A", "RPS27L", "RPS28", "RPS29",
    "RPS3", "RPS3A", "RPS4X", "RPS4Y1", "RPS4Y2", "RPS5", "RPS6", "RPS6KA1", "RPS6KA2",
    "RPS6KA2-AS1", "RPS6KA3", "RPS6KA4", "RPS6KA5", "RPS6KA6", "RPS6KB1", "RPS6KB2",
    "RPS6KC1", "RPS6KL1", "RPS7", "RPS8", "RPS9", "RPSA", "RRM2", "SMC4", "SRGN", "SRSF7",
    "STMN1", "TK1", "TMSB4X", "TOP2A", "TPX2", "TSC22D3", "TUBA1A", "TUBA1B", "TUBB",
    "TUBB4B", "TXN", "TYMS", "UBA52", "UBC", "UBE2C", "UHRF1", "YBX1", "YPEL5", "ZFP36",
    "ZWINT"
]
LATERAL_GENE_PATTERNS = ["RP[LS].*"]  # Ribosomal

# This will mark as "lateral_gene" any genes that match the above, if they exist in the clean dataset.
mc.pl.mark_lateral_genes(
    cells,
    lateral_gene_names=LATERAL_GENE_NAMES,
    lateral_gene_patterns=LATERAL_GENE_PATTERNS,
)
lateral_gene_mask = mc.ut.get_v_numpy(cells, "lateral_gene")
lateral_gene_names = set(cells.var_names[lateral_gene_mask])
print(sorted([
    name for name in lateral_gene_names
    if not name.startswith("RPL") and not name.startswith("RPS")
]))
print(f"""and {len([
    name for name in lateral_gene_names if name.startswith("RPL") or name.startswith("RPS")
])} RP[LS].* genes""")

NOISY_GENE_NAMES = [
    "CCL3", "CCL4", "CCL5", "CXCL8", "DUSP1", "FOS", "G0S2", "HBB", "HIST1H4C", "IER2", "IGKC",
    "IGLC2", "JUN", "JUNB", "KLRB1", "MT2A", "RPS26", "RPS4Y1", "TRBC1", "TUBA1B", "TUBB"
]

mc.pl.mark_noisy_genes(cells, noisy_gene_names=NOISY_GENE_NAMES)

# Either use the guesstimator:
max_parallel_piles = mc.pl.guess_max_parallel_piles(cells)
# Or, if running out of memory manually override:
# max_paralle_piles = ...
print(max_parallel_piles)
mc.pl.set_max_parallel_piles(max_parallel_piles)

with mc.ut.progress_bar():
    mc.pl.divide_and_conquer_pipeline(cells, random_seed=123456)

metacells = \
    mc.pl.collect_metacells(cells, name="hca_bm.one-pass.preliminary.metacells", random_seed=123456)
print(f"Preliminary: {metacells.n_obs} metacells, {metacells.n_vars} genes")

# Assign a single value for each metacell based on the cells.
mc.tl.convey_obs_to_group(
    adata=cells, gdata=metacells,
    property_name="donor_organism.organism_age", to_property_name="sex",
    method=mc.ut.most_frequent  # This is the default, for categorical data
)
mc.tl.convey_obs_to_group(
    adata=cells, gdata=metacells,
    property_name="donor_organism.organism_age", to_property_name="age",
    method=np.mean
)

# Compute the fraction of cells with each possible value in each metacell:
mc.tl.convey_obs_fractions_to_group(
    adata=cells, gdata=metacells,
    property_name="donor_organism.sex", to_property_name="sex"
)
mc.tl.convey_obs_fractions_to_group(  # Age has just a few possible values so treat it as categorical.
    adata=cells, gdata=metacells,
    property_name="donor_organism.organism_age", to_property_name="age"
)
mc.tl.convey_obs_fractions_to_group(adata=cells, gdata=metacells, property_name="donor")
mc.tl.convey_obs_fractions_to_group(adata=cells, gdata=metacells, property_name="batch")

with mc.ut.progress_bar():
    mc.pl.compute_for_mcview(adata=cells, gdata=metacells, random_seed=123456)

min_long_edge_size = 4
umap_x = mc.ut.get_o_numpy(metacells, "x")
umap_y = mc.ut.get_o_numpy(metacells, "y")
umap_edges = sp.coo_matrix(mc.ut.get_oo_proper(metacells, "obs_outgoing_weights"))
sb.set()
plot = sb.scatterplot(x=umap_x, y=umap_y, s=10)
for (
    source_index, target_index, weight
) in zip(
    umap_edges.row, umap_edges.col, umap_edges.data
):
    source_x = umap_x[source_index]
    target_x = umap_x[target_index]
    source_y = umap_y[source_index]
    target_y = umap_y[target_index]
    if hypot(target_x - source_x, target_y - source_y) >= min_long_edge_size:
        plt.plot([source_x, target_x], [source_y, target_y],
                 linewidth=weight * 2, color='indigo')
plt.show()

cells.write_h5ad("../output/one-pass/preliminary/hca_bm.cells.h5ad")

metacells.write_h5ad("../output/one-pass/preliminary/hca_bm.metacells.h5ad")

When I run it line by line this is the specific error that pops out.

  File "/home/lindseydudley/anaconda3/envs/mcell/lib/python3.12/site-packages/metacells/utilities/logging.py", line 384, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lindseydudley/anaconda3/envs/mcell/lib/python3.12/site-packages/metacells/pipeline/exclude.py", line 235, in extract_clean_data
    results = tl.filter_data(
              ^^^^^^^^^^^^^^^
  File "/home/lindseydudley/anaconda3/envs/mcell/lib/python3.12/site-packages/metacells/utilities/logging.py", line 384, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lindseydudley/anaconda3/envs/mcell/lib/python3.12/site-packages/metacells/tools/filter.py", line 82, in filter_data
    mask = combine_masks(adata, obs_masks, invert=invert_obs, to=mask_obs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lindseydudley/anaconda3/envs/mcell/lib/python3.12/site-packages/metacells/utilities/logging.py", line 384, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lindseydudley/anaconda3/envs/mcell/lib/python3.12/site-packages/metacells/tools/mask.py", line 113, in combine_masks
    raise ValueError("no masks to combine")
ValueError: no masks to combine

orenbenkiki commented 5 months ago

It seems you are missing cell 5 of the notebook https://tanaylab.github.io/metacells-vignettes/one-pass.html
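Cell 5 itself is not quoted in this thread; the only evidence of what it does is the log line `set hca_bm.full.obs[doublet_cell]: ...` in the next run, which shows a per-cell boolean mask being stored. As a generic illustration of that kind of step, and not a reproduction of the vignette's code, the sketch below builds a boolean mask from a list of cell names. The names `mask_from_names`, `cell_names` and `doublet_names` are hypothetical, and attaching the result to the AnnData object is left as a comment because how the vignette does it is not shown here.

```python
def mask_from_names(cell_names, flagged_names) -> list[bool]:
    """Return one boolean per cell: True where the cell name is in flagged_names."""
    flagged = set(flagged_names)
    return [name in flagged for name in cell_names]


if __name__ == "__main__":
    cell_names = ["cell_a", "cell_b", "cell_c", "cell_d"]  # Hypothetical names.
    doublet_names = ["cell_b", "cell_d"]                   # Hypothetical doublets.
    mask = mask_from_names(cell_names, doublet_names)
    percent = 100 * sum(mask) / len(mask)
    print(f"{sum(mask)} true ({percent:.2f}%) out of {len(mask)} bools")
    # With real data one would then attach the mask to the cells, for example:
    #     full.obs["doublet_cell"] = mask_from_names(full.obs_names, doublet_names)
```

Whatever the exact mechanism, the mask has to exist on the data before `exclude_cells` is asked to combine it.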

lindsdudley commented 5 months ago

Hi Oren,

Thank you so much for your response! This seems to have just been a copy-paste error, and I'm so sorry. Here's my gdb output with the corrected script. Please let me know if you know what to do about this error.

[New Thread 0x7ffff306c640 (LWP 1492911)]
[New Thread 0x7ffff286b640 (LWP 1492912)]
[New Thread 0x7fffea06a640 (LWP 1492913)]
[New Thread 0x7fffe1869640 (LWP 1492914)]
[New Thread 0x7fffd9068640 (LWP 1492915)]
[New Thread 0x7fffd0867640 (LWP 1492916)]
[New Thread 0x7fffc8066640 (LWP 1492917)]
[New Thread 0x7fffbf865640 (LWP 1492918)]
[New Thread 0x7fffaf064640 (LWP 1492919)]
[New Thread 0x7fffae863640 (LWP 1492920)]
[New Thread 0x7fffa6062640 (LWP 1492921)]
[New Thread 0x7fff9d861640 (LWP 1492922)]
[New Thread 0x7fff95060640 (LWP 1492923)]
[New Thread 0x7fff8c85f640 (LWP 1492924)]
[New Thread 0x7fff8405e640 (LWP 1492925)]
[New Thread 0x7fff7b85d640 (LWP 1492926)]
[New Thread 0x7fff7305c640 (LWP 1492927)]
[New Thread 0x7fff6a85b640 (LWP 1492928)]
[New Thread 0x7fff6205a640 (LWP 1492929)]
[New Thread 0x7fff59859640 (LWP 1492930)]
[New Thread 0x7fff49058640 (LWP 1492931)]
[New Thread 0x7fff40857640 (LWP 1492932)]
[New Thread 0x7fff40056640 (LWP 1492933)]
[New Thread 0x7fff2f855640 (LWP 1492934)]
[New Thread 0x7fff2f054640 (LWP 1492935)]
[New Thread 0x7fff1e853640 (LWP 1492936)]
[New Thread 0x7fff1e052640 (LWP 1492937)]
[New Thread 0x7fff15851640 (LWP 1492938)]
[New Thread 0x7fff05050640 (LWP 1492939)]
[New Thread 0x7ffefc84f640 (LWP 1492940)]
[New Thread 0x7ffefc04e640 (LWP 1492941)]
[Detaching after vfork from child process 1492952]
[Detaching after vfork from child process 1492963]
Full: 378000 cells, 33694 genes
set hca_bm.full.obs[doublet_cell]: 1197 true (0.3167%) out of 378000 bools
Will exclude 66232 (17.52%%) cells with less than 800 UMIs
Will exclude 8672 (2.29%%) cells with more than 20000 UMIs
[New Thread 0x7ff2f6fb1640 (LWP 1493589)]
[New Thread 0x7ff2f67b0640 (LWP 1493590)]
[New Thread 0x7ff24b90b640 (LWP 1493591)]
[New Thread 0x7ff24b10a640 (LWP 1493592)]
[New Thread 0x7ff24a909640 (LWP 1493593)]
[New Thread 0x7ff24a108640 (LWP 1493594)]
[New Thread 0x7ff249907640 (LWP 1493595)]
[New Thread 0x7ff249106640 (LWP 1493596)]
[New Thread 0x7ff248905640 (LWP 1493597)]
[New Thread 0x7ff248104640 (LWP 1493598)]
[New Thread 0x7ff247903640 (LWP 1493599)]
[New Thread 0x7ff247102640 (LWP 1493600)]
[New Thread 0x7ff246901640 (LWP 1493601)]
[New Thread 0x7ff246100640 (LWP 1493602)]
[New Thread 0x7ff2458ff640 (LWP 1493603)]
[New Thread 0x7ff2450fe640 (LWP 1493604)]
[Thread 0x7ff2450fe640 (LWP 1493604) exited]
[Thread 0x7ff246100640 (LWP 1493602) exited]
[Thread 0x7ff249907640 (LWP 1493595) exited]
[Thread 0x7ff24a909640 (LWP 1493593) exited]
[Thread 0x7ff247903640 (LWP 1493599) exited]
[Thread 0x7ff2f67b0640 (LWP 1493590) exited]
[Thread 0x7ff24a108640 (LWP 1493594) exited]
[Thread 0x7ff247102640 (LWP 1493600) exited]
[Thread 0x7ff24b10a640 (LWP 1493592) exited]
[Thread 0x7ff249106640 (LWP 1493596) exited]
[Thread 0x7ff2458ff640 (LWP 1493603) exited]
[Thread 0x7ff248104640 (LWP 1493598) exited]
[Thread 0x7ff2f6fb1640 (LWP 1493589) exited]
[Thread 0x7ff246901640 (LWP 1493601) exited]
[Thread 0x7ff248905640 (LWP 1493597) exited]
[Thread 0x7ff24b90b640 (LWP 1493591) exited]
[New Thread 0x7ff2450fe640 (LWP 1493610)]
[New Thread 0x7ff2458ff640 (LWP 1493611)]
[New Thread 0x7ff246100640 (LWP 1493612)]
[New Thread 0x7ff246901640 (LWP 1493613)]
[New Thread 0x7ff2f6fb1640 (LWP 1493614)]
[New Thread 0x7ff2f67b0640 (LWP 1493615)]
[New Thread 0x7ff248176640 (LWP 1493616)]
[New Thread 0x7ff247975640 (LWP 1493617)]
[New Thread 0x7ff247174640 (LWP 1493618)]
[New Thread 0x7ff2448fd640 (LWP 1493619)]
[New Thread 0x7ff23c86a640 (LWP 1493620)]
[New Thread 0x7ff23486a640 (LWP 1493621)]
[New Thread 0x7ff22ffff640 (LWP 1493622)]
[New Thread 0x7ff22f7fe640 (LWP 1493623)]
[New Thread 0x7ff22effd640 (LWP 1493624)]
[Thread 0x7ff22effd640 (LWP 1493624) exited]
[Thread 0x7ff22f7fe640 (LWP 1493623) exited]
[Thread 0x7ff22ffff640 (LWP 1493622) exited]
[Thread 0x7ff23486a640 (LWP 1493621) exited]
[Thread 0x7ff23c86a640 (LWP 1493620) exited]
[Thread 0x7ff2448fd640 (LWP 1493619) exited]
[Thread 0x7ff247174640 (LWP 1493618) exited]
[Thread 0x7ff247975640 (LWP 1493617) exited]
[Thread 0x7ff248176640 (LWP 1493616) exited]
[Thread 0x7ff2f67b0640 (LWP 1493615) exited]
[Thread 0x7ff2f6fb1640 (LWP 1493614) exited]
[Thread 0x7ff246901640 (LWP 1493613) exited]
[New Thread 0x7ff22effd640 (LWP 1493625)]
[Thread 0x7ff246100640 (LWP 1493612) exited]
[Thread 0x7ff2458ff640 (LWP 1493611) exited]
[Thread 0x7ff2450fe640 (LWP 1493610) exited]
[New Thread 0x7ff22f7fe640 (LWP 1493626)]
[New Thread 0x7ff22ffff640 (LWP 1493627)]
[New Thread 0x7ff23486a640 (LWP 1493628)]
[New Thread 0x7ff2f6fb1640 (LWP 1493629)]
[New Thread 0x7ff2f67b0640 (LWP 1493630)]
[New Thread 0x7ff248176640 (LWP 1493631)]
[New Thread 0x7ff247975640 (LWP 1493632)]
[New Thread 0x7ff247174640 (LWP 1493633)]
[New Thread 0x7ff246973640 (LWP 1493634)]
[New Thread 0x7ff246172640 (LWP 1493635)]
[New Thread 0x7ff245971640 (LWP 1493636)]
[New Thread 0x7ff245170640 (LWP 1493637)]
[New Thread 0x7ff24496f640 (LWP 1493638)]
[New Thread 0x7ff23c86a640 (LWP 1493639)]
[New Thread 0x7ff22e7fc640 (LWP 1493640)]
[Thread 0x7ff22e7fc640 (LWP 1493640) exited]
[Thread 0x7ff23c86a640 (LWP 1493639) exited]
[Thread 0x7ff24496f640 (LWP 1493638) exited]
[Thread 0x7ff245170640 (LWP 1493637) exited]
[Thread 0x7ff245971640 (LWP 1493636) exited]
[Thread 0x7ff246172640 (LWP 1493635) exited]
[Thread 0x7ff246973640 (LWP 1493634) exited]
[Thread 0x7ff247174640 (LWP 1493633) exited]
[Thread 0x7ff247975640 (LWP 1493632) exited]
[Thread 0x7ff248176640 (LWP 1493631) exited]
[Thread 0x7ff2f67b0640 (LWP 1493630) exited]
[Thread 0x7ff2f6fb1640 (LWP 1493629) exited]
[Thread 0x7ff23486a640 (LWP 1493628) exited]
[Thread 0x7ff22ffff640 (LWP 1493627) exited]
[Thread 0x7ff22f7fe640 (LWP 1493626) exited]
[Thread 0x7ff22effd640 (LWP 1493625) exited]
set hca_bm.full.var[bursty_lonely_gene]: 0 true (0%) out of 33694 bools
set hca_bm.full.var[properly_sampled_gene]: 27277 true (80.96%) out of 33694 bools
set hca_bm.full.var[excluded_gene]: 6433 true (19.09%) out of 33694 bools
set hca_bm.full.obs[excluded_umis]: 378000 float32s
Will exclude 36458 (9.64%) cells with more than 25.00% excluded gene UMIs
set hca_bm.full.obs[properly_sampled_cell]: 297810 true (78.79%) out of 378000 bools
set hca_bm.full.obs[excluded_cell]: 81387 true (21.53%) out of 378000 bools
[Thread 0x7ffefc04e640 (LWP 1492941) exited]
[Thread 0x7ffefc84f640 (LWP 1492940) exited]
[Thread 0x7fff05050640 (LWP 1492939) exited]
[Thread 0x7fff15851640 (LWP 1492938) exited]
[Thread 0x7fff1e052640 (LWP 1492937) exited]
[Thread 0x7fff1e853640 (LWP 1492936) exited]
[Thread 0x7fff2f054640 (LWP 1492935) exited]
[Thread 0x7fff2f855640 (LWP 1492934) exited]
[Thread 0x7fff40857640 (LWP 1492932) exited]
[Thread 0x7fff49058640 (LWP 1492931) exited]
[Thread 0x7fff59859640 (LWP 1492930) exited]
[Thread 0x7fff6205a640 (LWP 1492929) exited]
[Thread 0x7fff6a85b640 (LWP 1492928) exited]
[Thread 0x7fff7305c640 (LWP 1492927) exited]
[Thread 0x7fff7b85d640 (LWP 1492926) exited]
[Thread 0x7fff8405e640 (LWP 1492925) exited]
[Thread 0x7fff8c85f640 (LWP 1492924) exited]
[Thread 0x7fff95060640 (LWP 1492923) exited]
[Thread 0x7fff9d861640 (LWP 1492922) exited]
[Thread 0x7fffa6062640 (LWP 1492921) exited]
[Thread 0x7fffae863640 (LWP 1492920) exited]
[Thread 0x7fffaf064640 (LWP 1492919) exited]
[Thread 0x7fffbf865640 (LWP 1492918) exited]
[Thread 0x7fffc8066640 (LWP 1492917) exited]
[Thread 0x7fffd0867640 (LWP 1492916) exited]
[Thread 0x7fffd9068640 (LWP 1492915) exited]
[Thread 0x7fffe1869640 (LWP 1492914) exited]
[Thread 0x7fffea06a640 (LWP 1492913) exited]
[Thread 0x7ffff286b640 (LWP 1492912) exited]
[Thread 0x7ffff306c640 (LWP 1492911) exited]
[Thread 0x7ffff7ea3740 (LWP 1492904) exited]
[Thread 0x7fff40056640 (LWP 1492933) exited]
[New process 1492904]

Program terminated with signal SIGKILL, Killed. The program no longer exists.

orenbenkiki commented 5 months ago

Hmmm - there's nothing there other than SIGKILL. Assuming this is on Linux, one possibility is that the program ran out of memory. That would be surprising unless you are running this on a machine with very little memory, since you haven't even gotten to the divide-and-conquer part yet. You can check this by running top or htop in parallel with the application and tracking the amount of used memory.

lindsdudley commented 4 months ago

top.txt

I have attached my top output while running the program. Please let me know if anything jumps out at you that could help me debug! I appreciate all of the help because I definitely want to use your program.

orenbenkiki commented 4 months ago

top is an interactive program; it shows the status of the system in real time. The snapshot you are showing me is of an idle system. You should run it in parallel with the notebook and track the memory usage. It seems your system doesn't have a lot of memory (~12GB?), and it is likely you'll see the amount of free memory go down to 0, after which the program will be killed by the OS.
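As an alternative to watching top by eye, the memory tracking described above can be sketched in Python. This is only a minimal sketch under stated assumptions: it reads the Linux `/proc/meminfo` file, and the helper names (`parse_meminfo`, `available_gib`, `watch`) are made up for illustration and are not part of metacells.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict mapping field name to kB."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])
    return values


def available_gib(meminfo):
    """Convert the MemAvailable field (reported in kB) to GiB."""
    return meminfo["MemAvailable"] / (1024 * 1024)


def watch(samples=60, interval_seconds=1.0, path="/proc/meminfo"):
    """Print available memory once per interval (hypothetical helper)."""
    for _ in range(samples):
        with open(path) as meminfo_file:
            info = parse_meminfo(meminfo_file.read())
        total_gib = info["MemTotal"] / (1024 * 1024)
        print(f"available: {available_gib(info):.1f} GiB of {total_gib:.1f} GiB")
        time.sleep(interval_seconds)
```

Calling `watch()` in a separate terminal while the notebook cell runs would show whether available memory falls toward 0 just before the kill.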

lindsdudley commented 4 months ago

436067 lindsey+ 26 6 52.1g 9.2g 123648 D 18.5 7.3 0:10.34 python
431698 root 20 0 0 0 0 I 10.3 0.0 0:06.35 kworker/u65:8-kcryptd/252:0
434629 root 20 0 0 0 0 I 7.6 0.0 0:04.88 kworker/u65:1-kcryptd/252:0
3173 lindsey+ 17 -3 29.3g 71776 24496 S 2.3 0.1 36:33.03 Xorg
3385 lindsey+ 17 -3 4385076 296952 84572 S 1.3 0.2 27:30.20 gnome-shell
1380 root 32 12 332696 1024 0 S 0.7 0.0 0:02.32 touchegg
3537 lindsey+ 26 6 162848 3328 3072 S 0.7 0.0 0:01.11 at-spi2-registr
426288 lindsey+ 20 0 565380 36152 24296 S 0.7 0.0 0:05.79 gnome-terminal-
433547 root 20 0 0 0 0 I 0.7 0.0 0:10.29 kworker/u65:5-kcryptd/252:0
1582 root 32 12 316304 5064 3216 S 0.3 0.0 1:15.39 execsnoop-bpfcc
2828 root 32 12 939968 58900 4864 S 0.3 0.0 20:14.26 xagt
22929 lindsey+ 32 12 2578792 172104 2344 S 0.3 0.1 37:31.68 anydesk
429623 root 20 0 0 0 0 I 0.3 0.0 0:12.63 kworker/u65:0-kcryptd/252:0
435343 lindsey+ 26 6 32.5g 113516 89348 S 0.3 0.1 0:00.62 chrome
436123 lindsey+ 26 6 23452 4608 3328 R 0.3 0.0 0:00.08 top
1 root 20 0 166592 3628 1536 S 0.0 0.0 0:03.43 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.22 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:00.00 pool_workqueue_release
4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/R-rcu_g
5 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/R-rcup
6 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/R-slub

7 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/R-netns
10 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/0:0H-events_highpri
12 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/R-mm_pe
13 root 20 0 0 0 0 I 0.0 0.0 0:00.00 rcu_tasks_kthread
14 root 20 0 0 0 0 I 0.0 0.0 0:00.00 rcu_tasks_rude_kthread
15 root 20 0 0 0 0 I 0.0 0.0 0:00.00 rcu_tasks_trace_kthread
16 root 20 0 0 0 0 S 0.0 0.0 0:00.42 ksoftirqd/0
17 root 20 0 0 0 0 I 0.0 0.0 9:35.25 rcu_preempt
18 root rt 0 0 0 0 S 0.0 0.0 0:01.03 migration/0
19 root -51 0 0 0 0 S 0.0 0.0 0:00.00 idle_inject/0
20 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/0
21 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/1
22 root -51 0 0 0 0 S 0.0 0.0 0:00.00 idle_inject/1
23 root rt 0 0 0 0 S 0.0 0.0 0:01.32 migration/1
24 root 20 0 0 0 0 S 0.0 0.0 0:00.12 ksoftirqd/1
26 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/1:0H-events_highpri
27 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/2
28 root -51 0 0 0 0 S 0.0 0.0 0:00.00 idle_inject/2
29 root rt 0 0 0 0 S 0.0 0.0 0:01.32 migration/2
30 root 20 0 0 0 0 S 0.0 0.0 0:00.11 ksoftirqd/2

lindsdudley commented 4 months ago

This is my top when doing it interactively. My computer actually has 128 GB of RAM and a CPU, so I don't think that memory is the limiting factor.
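The total memory can be roughly cross-checked from the python row of the top output above, which reports 9.2g resident memory (RES) as 7.3 in the %MEM column. The sketch below just divides one by the other; the function name is made up for illustration, and both inputs are rounded by top, so the result is only approximate.

```python
def estimate_total_memory_gib(res_gib, mem_percent):
    """Estimate total RAM from one process's RES size and its %MEM share."""
    return res_gib / (mem_percent / 100.0)


# Values read from the python row of the top output (both rounded by top).
estimate = estimate_total_memory_gib(res_gib=9.2, mem_percent=7.3)
print(f"estimated total memory: about {estimate:.0f} GiB")
```

This comes out at roughly 126 GiB, which is consistent with a 128 GB machine rather than a ~12 GB one.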

orenbenkiki commented 4 months ago

"top when doing it interactively" is more of a movie than a picture. Assuming memory isn't the issue, it leaves open the question of why the program was killed. SIGKILL is sent by the kernel, either because of extreme resource issues (memory), or because someone ran the kill command. One way to figure it out is to track /var/log/messages - there should be a log message about why the kernel killed the program.
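Searching the kernel log for out-of-memory kills can be sketched as follows. The patterns are an assumption about how the Linux kernel commonly words OOM-killer messages, and the function name is made up for illustration. On Ubuntu-based systems such as Pop!_OS the kernel log is often in /var/log/syslog or /var/log/kern.log rather than /var/log/messages, and can also be read with `journalctl -k` or `dmesg`.

```python
import re

# Assumed wording of typical OOM-killer kernel messages; adjust as needed.
OOM_PATTERN = re.compile(
    r"invoked oom-killer|Out of memory|oom-kill:", re.IGNORECASE
)


def find_oom_lines(lines):
    """Return the log lines that look like OOM-killer activity."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]
```

For example, `find_oom_lines(open("/var/log/syslog"))` (the path is an assumption) would return an empty list if the kernel logged no OOM kill, which would point toward something other than memory exhaustion.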

lindsdudley commented 4 months ago

When looking at my logs, rfkill seems to be the issue, but I'm not sure why. This is the log:

Apr 19 13:18:04 pop-os kernel: [ 144.258322] rfkill: input handler disabled
Apr 19 13:18:14 pop-os kernel: [ 153.885741] hid-generic 0003:3384:0005.0008: hiddev0,hidraw1: USB HID v1.11 Device [System76 Launch Configurable Keyboard (launch_lite_1)] on usb-0000:10:00.0-1/input1
Apr 19 13:18:14 pop-os kernel: [ 154.043073] rfkill: input handler enabled
Apr 19 13:18:17 pop-os kernel: [ 156.180270] simple-framebuffer simple-framebuffer.0: swiotlb buffer is full (sz: 524288 bytes), total 32768 (slots), used 1278 (slots)
Apr 19 13:18:18 pop-os kernel: [ 157.002732] rfkill: input handler disabled
Apr 19 13:23:09 pop-os kernel: [ 448.452216] wlp14s0: disconnect from AP 44:48:c1:a9:fb:a3 for new auth to 44:48:c1:a9:fb:b3
Apr 19 13:23:10 pop-os kernel: [ 448.759291] wlp14s0: authenticate with 44:48:c1:a9:fb:b3 (local address=f0:a6:54:14:08:d7)
Apr 19 13:23:10 pop-os kernel: [ 449.251504] wlp14s0: send auth to 44:48:c1:a9:fb:b3 (try 1/3)
Apr 19 13:23:10 pop-os kernel: [ 449.253176] wlp14s0: authenticated
Apr 19 13:23:10 pop-os kernel: [ 449.279937] wlp14s0: associate with 44:48:c1:a9:fb:b3 (try 1/3)
Apr 19 13:23:10 pop-os kernel: [ 449.311218] wlp14s0: RX ReassocResp from 44:48:c1:a9:fb:b3 (capab=0x411 status=0 aid=6)
Apr 19 13:23:10 pop-os kernel: [ 449.346625] wlp14s0: associated
Apr 19 13:23:10 pop-os kernel: [ 449.346684] wlp14s0: Limiting TX power to 30 (30 - 0) dBm as advertised by 44:48:c1:a9:fb:b3
Apr 19 13:23:13 pop-os kernel: [ 451.507983] wlp14s0: Connection to AP 44:48:c1:a9:fb:b3 lost
Apr 19 13:23:13 pop-os kernel: [ 452.037221] wlp14s0: authenticate with 44:48:c1:a9:fb:a3 (local address=f0:a6:54:14:08:d7)
Apr 19 13:23:13 pop-os kernel: [ 452.051466] wlp14s0: send auth to 44:48:c1:a9:fb:a3 (try 1/3)
Apr 19 13:23:13 pop-os kernel: [ 452.053241] wlp14s0: authenticated
Apr 19 13:23:13 pop-os kernel: [ 452.059684] wlp14s0: associate with 44:48:c1:a9:fb:a3 (try 1/3)
Apr 19 13:23:13 pop-os kernel: [ 452.067843] wlp14s0: RX AssocResp from 44:48:c1:a9:fb:a3 (capab=0x431 status=0 aid=1)
Apr 19 13:23:13 pop-os kernel: [ 452.103283] wlp14s0: associated

orenbenkiki commented 4 months ago

rfkill is a tool for dis/enabling wireless devices; I don't think it is relevant. I also don't see any "killed" log messages. Perhaps something in https://stackoverflow.com/questions/726690/what-killed-my-process-and-why can help?