Closed: CodyCBakerPhD closed this 1 year ago
ReadTheDocs shows the following error due to the added threadpoolctl requirement:
ERROR: Could not find a version that satisfies the requirement threadpoolctl==3.2.0 (from versions: 1.0.0, 1.1.0, 2.0.0, 2.1.0, 2.2.0, 3.0.0, 3.1.0)
ERROR: No matching distribution found for threadpoolctl==3.2.0
Maybe the readthedocs.yaml or the requirements.txt file needs some adjustment.
I do believe this is an issue with the version of Python being used to compile the docs - do you know what that is?
You can also specify it explicitly in the config file like here: https://github.com/catalystneuro/neuroconv/blob/main/.readthedocs.yaml#L10-L11
An alternative, I suppose, would be to lower the exact version pin for the CI, but I defer to however y'all prefer to have all that set up.
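For reference, pinning the docs build to a newer Python in .readthedocs.yaml looks something like the following (the OS image and Python version here are illustrative, not what this repo actually uses):

```yaml
# Hypothetical excerpt of .readthedocs.yaml; threadpoolctl==3.2.0 requires
# a Python version new enough to have a matching wheel/sdist available.
version: 2
build:
  os: ubuntu-22.04
  tools:
    python: "3.11"
```

With an explicit build.tools.python entry, the docs build no longer depends on whatever default Python version ReadTheDocs happens to use.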
Some tests define a new ZarrIO and call write_dataset directly. Because self.__dci_queue is initialized only in write (and now export), these tests fail because self.__dci_queue is None. If these tests are meant to exercise write_dataset in a unit test fashion, then they need to be adjusted so that write is called first or the __dci_queue variable is otherwise set. If these tests are meant to be integration tests, then they need to be adjusted to call write instead of write_dataset, which users would not be doing.
Ahh good catch: https://github.com/hdmf-dev/hdmf-zarr/blob/6c13e14927eea985d53174d8580224c97d65707a/src/hdmf_zarr/backend.py#L839-L840
Since the method is not marked as private, I'll just instantiate a standard non-parallel Queue at that point then.
If these tests are meant to test write_dataset in a unit test fashion,
Those should be unit tests, since write_dataset is not a function that a user should ever call, but is used internally. The tests should be adjusted to manually set the dci_queue variable before calling write_dataset. Alternatively, we could also add an if __dci_queue is None check at the beginning of write_dataset to set it if it is not initialized (which may be safer).
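A minimal standalone sketch of that lazy-initialization guard, using a toy class (ZarrIOSketch and the deque-based queue are stand-ins, not the actual hdmf-zarr implementation; only the attribute and method names mirror the discussion):

```python
from collections import deque

class ZarrIOSketch:
    """Toy class illustrating the guard; not the real hdmf-zarr ZarrIO."""

    def __init__(self):
        # In hdmf-zarr, this queue is normally set up in write()/export().
        self.__dci_queue = None

    def write_dataset(self, name):
        # The suggested guard: fall back to a standard non-parallel queue
        # when write_dataset() is called directly, e.g. from a unit test.
        if self.__dci_queue is None:
            self.__dci_queue = deque()
        self.__dci_queue.append(name)
        return list(self.__dci_queue)

io = ZarrIOSketch()
print(io.write_dataset("my_dataset"))  # → ['my_dataset'], no prior write() needed
```

This keeps direct write_dataset calls working without requiring test code to reach into the name-mangled attribute.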
Since the method is not marked as private I'll just instantiate a standard non-parallel Queue at that point then
Sorry, I didn't see your comment until after I posted my other response. I agree, "instantiate a standard non-parallel Queue at that point then" is the way to go.
Surprised that 3.7 is still supported here - is there a timeline for when that will be dropped?
Otherwise, the currently failing CI is, I believe, due to the version pin of hdmf==3.5.4, whereas this feature requires some changes available in hdmf>=3.9.0.
Surprised that 3.7 is still supported here - is there a timeline for when that will be dropped?
This is due to the pin on the HDMF version. Once we have the issue with references on export resolved we'll update the HDMF version and then we can also update the tests. @mavaylon1 is working on the issue.
Otherwise, the currently failing CI is, I believe, due to the version pin of hdmf==3.5.4, whereas this feature requires some changes available in hdmf>=3.9.0
To get the tests to run for now, you could just increase the hdmf version on this PR so we can see that the CI is running. There will be a couple of tests that fail for export, but at least we can then see that everything is working in the CI. Aside from the bug on export, I believe you can safely use the current version of HDMF without having to change anything else in the code.
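As a sketch, the temporary bump discussed here would just mean relaxing the pin in the requirements file (the exact file and any upper bound are assumed here, not taken from the repo):

```
# before: hdmf==3.5.4
hdmf>=3.9.0
```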
@CodyCBakerPhD with #120 now merged, the dev branch now supports the latest HDMF. Could you sync this PR with the dev branch to see if that fixes the failing tests, so that we can move forward with merging this PR as well?
Attention: 7 lines in your changes are missing coverage. Please review.
Comparison is base (c262481) 84.73% compared to head (9d84044) 85.66%.
@oruebel Done, not sure what's up with the coverage workflows though
fix #101
replace #111
Motivation
Zarr supports efficient parallelization, but enabling it seamlessly with only a single argument (number_of_jobs at io.write) took a bit of effort. Currently seeing progressive speedups with the attached dummy script as the number of jobs increases; on the DANDI Hub, ~160s for 1 CPU.
Will make a full averaged plot over the number of jobs to use for reference
Opening in draft while I assess what all is still necessary and what can still be optimized in terms of worker/job initialization
Also will have to think about how to add tests; I suppose just adding some that use 2 jobs and making sure it works should be enough.
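A rough stdlib-only sketch of that testing idea: run the same work serially and with 2 workers and assert identical results. The real test would call ZarrIO's write with number_of_jobs=2; the toy pipeline below is only a stand-in for that API.

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    return x * x

def compute(data, number_of_jobs=1):
    # Mimics the single-argument switch discussed above: 1 job runs
    # serially, more jobs run through a worker pool.
    if number_of_jobs == 1:
        return [square(x) for x in data]
    with ThreadPoolExecutor(max_workers=number_of_jobs) as pool:
        return list(pool.map(square, data))

data = list(range(10))
# The smoke test: parallel output must match the serial baseline.
assert compute(data, number_of_jobs=2) == compute(data, number_of_jobs=1)
print("parallel result matches serial baseline")
```

A test shaped like this catches queue-initialization and worker-dispatch regressions without needing to measure actual speedups in CI.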
How to test the behavior?
Checklist
- ruff from the source directory.