sgkit-dev / sgkit

Scalable genetics toolkit
https://sgkit-dev.github.io/sgkit
Apache License 2.0

Tests should use process-based dask #1051

Open benjeffery opened 1 year ago

benjeffery commented 1 year ago

#1043 shows that we should test with a process-based dask cluster.

I've tried this by adding client = dask.distributed.Client(n_workers=1, threads_per_worker=1) to conftest.py but I get segfaults in workers. Attaching GDB to the workers shows that the segfaults are in several numba gufuncs such as count_alleles and cohort_sum. Deleting the __pycache__ will sometimes stop a particular test from failing, which is disconcerting!

timothymillar commented 1 year ago

Possibly related to #869? Do you get segfaults if you disable numba caching?

benjeffery commented 1 year ago

Ah, yes this is! Thanks for the pointer.

jeromekelleher commented 10 months ago

I think it's important that we test on both the default dask threads-within-process and an explicit scheduler. Is there something we can do with pytest to run the full test suite under both conditions?