rs-station / reciprocalspaceship

Tools for exploring reciprocal space
https://rs-station.github.io/reciprocalspaceship/
MIT License

Parallel Stream File Parsing with ray.io #260

Status: Open. kmdalton opened 2 weeks ago

kmdalton commented 2 weeks ago

This PR adds support for faster stream file parsing, parallelized using ray. I did not add ray as a dependency for users, so the code falls back to serial Python when ray is not available.
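The optional-dependency pattern described above can be sketched as follows. This is a minimal illustration, not the code in this PR: it assumes CrystFEL-style `----- Begin chunk -----` / `----- End chunk -----` delimiters, and `split_chunks`, `parse_chunk`, `map_chunks`, and `parse_stream` are hypothetical names. The only ray calls used are `ray.is_initialized`, `ray.init`, `ray.remote`, and `ray.get`.

```python
import warnings
from typing import Callable, Dict, List, Optional

try:
    import ray  # optional dependency; only used if it is installed
except ImportError:
    ray = None

# Assumed CrystFEL-style chunk delimiters.
BEGIN_CHUNK = "----- Begin chunk -----"
END_CHUNK = "----- End chunk -----"


def split_chunks(text: str) -> List[List[str]]:
    """Split stream text into lists of lines, one list per chunk."""
    chunks: List[List[str]] = []
    current: Optional[List[str]] = None
    for line in text.splitlines():
        if line.startswith(BEGIN_CHUNK):
            current = []
        elif line.startswith(END_CHUNK):
            if current is not None:
                chunks.append(current)
            current = None
        elif current is not None:
            current.append(line)
    return chunks


def parse_chunk(lines: List[str]) -> Dict[str, str]:
    """Hypothetical per-chunk parser: collect 'key: value' header lines."""
    header: Dict[str, str] = {}
    for line in lines:
        key, sep, value = line.partition(":")
        if sep:
            header[key.strip()] = value.strip()
    return header


def map_chunks(func: Callable, chunks: List[List[str]],
               use_ray: Optional[bool] = None) -> list:
    """Apply func to every chunk, in parallel with ray if possible, else serially."""
    if use_ray is None:
        use_ray = ray is not None  # default: use ray only if it is importable
    if use_ray and ray is None:
        warnings.warn("ray is not installed; falling back to serial parsing")
        use_ray = False
    if use_ray:
        if not ray.is_initialized():
            ray.init()
        remote_func = ray.remote(func)
        # ray.get returns results in the same order as the list of futures
        return ray.get([remote_func.remote(chunk) for chunk in chunks])
    # Serial fallback: plain Python loop
    return [func(chunk) for chunk in chunks]


def parse_stream(text: str, use_ray: Optional[bool] = None) -> List[Dict[str, str]]:
    """Split a stream into chunks and parse each one."""
    return map_chunks(parse_chunk, split_chunks(text), use_ray=use_ray)
```

Submitting one ray task per chunk keeps the sketch short, but per-task overhead can dominate for small chunks; batching several chunks per task is a common alternative.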

JBGreisman commented 2 weeks ago

This all looks good to me, but I'd like to play with it a bit more before merging. I agree with making ray an optional dependency, but I don't like adding it to tests_require; that seems like a hacky solution.

What do you think about adding an explicit parallel_require=["ray"] that is included in [dev], and maybe a new [parallel] pip extra that only adds ray?
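One way the proposed layout could look in setup.py is sketched below. Only parallel_require and the extra names [dev] and [parallel] come from the discussion; the contents of tests_require are hypothetical placeholders.

```python
# Requirement lists as they might appear in setup.py.
tests_require = ["pytest"]   # hypothetical test dependencies
parallel_require = ["ray"]   # optional parallel backend, as proposed

extras_require = {
    # Developers get the test tools plus ray, so the parallel path is exercised.
    "dev": tests_require + parallel_require,
    # Users opt in to parallel parsing explicitly.
    "parallel": parallel_require,
}

# These would then be passed to setuptools, e.g.:
# setup(..., tests_require=tests_require, extras_require=extras_require)
```

With such an extra defined and released, users could opt in with `pip install "reciprocalspaceship[parallel]"`, while a plain install stays ray-free.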

kmdalton commented 2 weeks ago

I'm happy to defer to your preferences regarding requirements. I don't have strong feelings as long as we make it easy for users to figure out how to get parallelism. I am not yet familiar enough with ray to know how nicely it plays with other packages. So far it seems very promising.

kmdalton commented 2 weeks ago

@marinegor , would you be willing to test out this branch for us?

marinegor commented 2 weeks ago

@kmdalton Sure, I can have a look! What kind of testing are you thinking of, could you elaborate? I imagine you want to make sure that your parser produces the same results as the previous one, right?
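A regression check along those lines might look like the sketch below. It assumes each parser is a callable that takes a file path and returns a pandas DataFrame; `assert_parsers_agree` and the parser arguments are hypothetical, not existing reciprocalspaceship functions.

```python
from typing import Callable

import pandas as pd


def assert_parsers_agree(path: str,
                         reference: Callable[[str], pd.DataFrame],
                         candidate: Callable[[str], pd.DataFrame]) -> None:
    """Raise AssertionError if two stream parsers disagree on the same file."""
    expected = reference(path)
    actual = candidate(path)
    # Both parsers must report the same set of columns.
    assert set(expected.columns) == set(actual.columns), (
        f"column mismatch: {sorted(set(expected.columns) ^ set(actual.columns))}"
    )
    cols = list(expected.columns)
    # Parallel parsing may change row order, so compare rows in a canonical order.
    expected = expected[cols].sort_values(cols).reset_index(drop=True)
    actual = actual[cols].sort_values(cols).reset_index(drop=True)
    pd.testing.assert_frame_equal(expected, actual)
```

Because the index is dropped before comparing, any information stored there (for example Miller indices) should be moved into columns with `reset_index()` before calling the check.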

kmdalton commented 2 weeks ago

Thank you! I don't have access to a lot of stream files, and I have noticed there can be some differences in the metadata between files. Mostly I want to make sure I'm not missing anything which will break the parser for some edge cases. Additionally, I would hope you could let us know

DHekstra commented 1 week ago

$ conda install -c conda-forge "ray-default"

fails for me as follows in a fresh conda environment in which I tried (and maybe failed) to use your careless install script:

Channels:
 - conda-forge
 - defaults
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: failed

LibMambaUnsatisfiableError: Encountered problems while solving:
  - nothing provides _python_rc needed by python-3.12.0rc3-rc3_hab00c5b_1_cpython

Could not solve for environment specs
The following packages are incompatible
├─ python 3.12**  is installable with the potential options
│  ├─ python [3.12.0|3.12.1|3.12.2|3.12.3|3.12.4], which can be installed;
│  ├─ python [3.12.0|3.12.1|3.12.2|3.12.3|3.12.4] would require
│  │  └─ python_abi 3.12.* *_cp312, which can be installed;
│  └─ python 3.12.0rc3 would require
│     └─ _python_rc, which does not exist (perhaps a missing channel);
└─ ray-default is not installable because there are no viable options
   ├─ ray-default [1.10.0|1.11.0|...|2.0.0] would require
   │  ├─ python >=3.7,<3.8.0a0 , which conflicts with any installable versions previously reported;
   │  └─ python_abi 3.7.* *_cp37m, which conflicts with any installable versions previously reported;
   ├─ ray-default [1.10.0|1.11.0|...|2.9.3] would require
   │  ├─ python >=3.8,<3.9.0a0 , which conflicts with any installable versions previously reported;
   │  └─ python_abi 3.8.* *_cp38, which conflicts with any installable versions previously reported;
   ├─ ray-default [1.10.0|1.11.0|...|2.9.3] would require
   │  ├─ python >=3.9,<3.10.0a0 , which conflicts with any installable versions previously reported;
   │  └─ python_abi 3.9.* *_cp39, which conflicts with any installable versions previously reported;
   ├─ ray-default [1.13.0|2.0.0|...|2.9.3] would require
   │  ├─ python >=3.10,<3.11.0a0 , which conflicts with any installable versions previously reported;
   │  └─ python_abi 3.10.* *_cp310, which conflicts with any installable versions previously reported;
   ├─ ray-default [1.5.0|1.5.1|1.5.2|1.6.0] would require
   │  ├─ python >=3.6,<3.7.0a0 , which conflicts with any installable versions previously reported;
   │  └─ python_abi 3.6.* *_cp36m, which conflicts with any installable versions previously reported;
   ├─ ray-default [2.10.0|2.11.0|...|2.9.3] would require
   │  ├─ python >=3.11,<3.12.0a0 , which conflicts with any installable versions previously reported;
   │  └─ python_abi 3.11.* *_cp311, which conflicts with any installable versions previously reported;
   ├─ ray-default 2.8.0 would require
   │  └─ ray-core 2.8.0 py38h1702d6c_1, which does not exist (perhaps a missing channel);
   ├─ ray-default [1.6.0|1.9.2|2.0.1] would require
   │  └─ python >=3.7,<3.8.0a0 , which conflicts with any installable versions previously reported;
   ├─ ray-default [1.6.0|1.9.2|2.0.1|2.3.0|2.6.3] would require
   │  └─ python >=3.8,<3.9.0a0 , which conflicts with any installable versions previously reported;
   ├─ ray-default [1.6.0|1.9.2|2.0.1|2.3.0|2.6.3] would require
   │  └─ python >=3.9,<3.10.0a0 , which conflicts with any installable versions previously reported;
   ├─ ray-default [2.0.1|2.3.0|2.6.3] would require
   │  └─ python >=3.10,<3.11.0a0 , which conflicts with any installable versions previously reported;
   └─ ray-default 2.6.3 would require
      └─ python >=3.11,<3.12.0a0 , which conflicts with any installable versions previously reported.

DHekstra commented 1 week ago

This also fails:

(careless-13)[dhekstra@holy8a24301 reciprocalspaceship]$ pip install -U "ray"
ERROR: Could not find a version that satisfies the requirement ray (from versions: none)
ERROR: No matching distribution found for ray

(careless-13)[dhekstra@holy8a24301 reciprocalspaceship]$ pip install -U "ray[default]"
ERROR: Could not find a version that satisfies the requirement ray[default] (from versions: none)
ERROR: No matching distribution found for ray[default]

(careless-13)[dhekstra@holy8a24301 reciprocalspaceship]$ pip install ray
ERROR: Could not find a version that satisfies the requirement ray (from versions: none)
ERROR: No matching distribution found for ray

kmdalton commented 1 week ago

@DHekstra, I think some of your packages (careless for sure) do not support Python 3.12 yet, which is confusing the package solver. You should use Python 3.11 for now.