Closed: ddelange closed this 3 weeks ago
@maartenbreddels, great to see you're working on this again! Considering:

> Can we make this more manageable by splitting it into multiple smaller PRs?
> Do you find these two PRs still useful, or shall I close them?
With https://github.com/vaexio/vaex/pull/2417 and https://github.com/vaexio/vaex/pull/2414 I started with two small steps.
Given the work required, I think we should get this into this PR, and get this as green as possible. Shall I cherry-pick it onto this PR?
Yes, cherry-picking is perfectly fine!
I'm a bit lost now in solving a merge conflict in `.github/workflows/wheel.yml`
In your PR @EwoutH we remove 3.6 and 3.7, but in this PR I don't see how we choose those.
looks like micromamba got stuck on 2/6 jobs, maybe add a `timeout-minutes`?
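A minimal sketch of what that could look like in the workflow (the job name and the 30-minute value are placeholders, not taken from vaex's actual wheel.yml):

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    # fail this job instead of hanging the whole run if micromamba gets stuck
    timeout-minutes: 30
    steps:
      - uses: actions/checkout@v4
```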
I thought it would be a good idea to start building the binary wheels to make sure this all runs.
@ddelange could you take a quick look why it fails? I'll think about the micromamba issue
Exciting to see some green :)
readthedocs is green again and looks good
mamba is stuck again
😍
@maartenbreddels can you cancel some of the older Actions runs here, to free a slot for the latest commit?
Maybe include

```yaml
concurrency:
  group: ${{ github.ref }}
  cancel-in-progress: true
```

like in `pythonpackage.yaml`?
Exciting to see all those ✅s!
One thing I'd like to see, though I'm not sure where this would go, is to try a simple `import vaex; vaex.example()` for the built wheels, so we know the wheels are valid.
Not sure if this is easy to do, I'm just afraid that after this many changes we might see a broken release.
@ddelange I believe you've been testing with this branch, so if you are confident that is not the case, let me know and we can skip this step.
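Assuming the wheels are built with cibuildwheel (an assumption on my part, not confirmed in this thread), a sketch of such a smoke test: cibuildwheel installs each built wheel into a fresh environment and runs the given command against it.

```toml
# Hypothetical pyproject.toml fragment, assuming cibuildwheel is used
[tool.cibuildwheel]
# trivial smoke test run against every freshly built wheel
test-command = 'python -c "import vaex; df = vaex.example(); assert len(df) > 0"'
```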
I think we should also expand the test matrix, because although we build the wheels for 3.12, we don't run the test suite on it.
were the musllinux wheels ever tested on an Alpine box? The musllinux smoketest is currently failing due to a missing pyarrow wheel: https://github.com/apache/arrow/pull/40177
see also the last three commits. I would suggest either skipping these platform smoketests, or skipping the musllinux wheels; building pyarrow from source is a bit of a pain, I think. Any thoughts?
let's skip it (but maybe add comments on how to enable it when possible in the future?)
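Again assuming cibuildwheel (not confirmed in the thread), skipping the musllinux wheels with a note on re-enabling could look like:

```toml
# Hypothetical pyproject.toml fragment, assuming cibuildwheel is used
[tool.cibuildwheel]
# musllinux skipped because pyarrow has no musllinux wheels yet
# (see https://github.com/apache/arrow/pull/40177); once pyarrow is
# pip-installable on Alpine, drop this entry to re-enable musllinux builds
skip = "*-musllinux*"
```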
PS: I'm force pushing a lot to find the commit that broke micromamba.
re: flaky micromamba
I could replace setup-micromamba with setup-python: you often see packages go `pip install pandas[dev]`. We could move this stuff into `vaex[dev]` and deprecate this file, and GitHub Actions goes `pip install -e .[dev]`?

On a side note: shouldn't the `lightgbm>=4.0.0` constraint go into vaex-ml's pyproject.toml anyway?
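A rough sketch of both ideas (the package lists here are illustrative, not vaex's actual requirements):

```toml
# Hypothetical fragment of packages/vaex-ml/pyproject.toml:
# the runtime constraint lives with the package that needs it
[project]
name = "vaex-ml"
dependencies = ["lightgbm>=4.0.0"]

# and in the top-level package, a dev extra so CI can simply
# `pip install -e .[dev]`:
# [project.optional-dependencies]
# dev = ["pytest", "pytest-timeout"]
```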
I've been working through the bugs on CI for a while in a separate branch, but it looks like we're not out of the woods yet: https://github.com/ddelange/vaex/pull/3
Just a few more errors 🥳. The first ones seem to be pandas errors (help welcome).
The graphql ones seem impossible to debug. I suggest we deprecate the package/maintenance, until someone has an idea how to approach it.
```
FAILED tests/column_test.py::test_dtype_object_with_arrays - ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.
FAILED tests/datetime_test.py::test_datetime_operations - AttributeError: 'IntegerArray' object has no attribute 'type'
FAILED tests/export_test.py::test_multi_file_naive_read_convert_export[dtypes1] - ValueError: Integer column has NA values in column 1
FAILED tests/export_test.py::test_export_generates_same_hdf5_shasum[dtypes1] - ValueError: Integer column has NA values in column 1
FAILED tests/graphql_test.py::test_aggregates - assert not [GraphQLError("__init__() got an unexpected keyword argument 'df'", locations=[SourceLocation(line=3, column=9)], path=['df'])]
 +  where [GraphQLError("__init__() got an unexpected keyword argument 'df'", locations=[SourceLocation(line=3, column=9)], path=['df'])] = ExecutionResult(data={'df': None}, errors=[GraphQLError("__init__() got an unexpected keyword argument 'df'", locations=[SourceLocation(line=3, column=9)], path=['df'])]).errors
FAILED tests/graphql_test.py::test_groupby - assert not [GraphQLError("__init__() got an unexpected keyword argument 'df'", locations=[SourceLocation(line=3, column=9)], path=['df'])]
 +  where [GraphQLError("__init__() got an unexpected keyword argument 'df'", locations=[SourceLocation(line=3, column=9)], path=['df'])] = ExecutionResult(data={'df': None}, errors=[GraphQLError("__init__() got an unexpected keyword argument 'df'", locations=[SourceLocation(line=3, column=9)], path=['df'])]).errors
FAILED tests/graphql_test.py::test_row_pagination - assert not [GraphQLError("__init__() got an unexpected keyword argument 'df'", locations=[SourceLocation(line=3, column=9)], path=['df'])]
 +  where [GraphQLError("__init__() got an unexpected keyword argument 'df'", locations=[SourceLocation(line=3, column=9)], path=['df'])] = ExecutionResult(data={'df': None}, errors=[GraphQLError("__init__() got an unexpected keyword argument 'df'", locations=[SourceLocation(line=3, column=9)], path=['df'])]).errors
FAILED tests/graphql_test.py::test_where - assert not [GraphQLError("__init__() got an unexpected keyword argument 'df'", locations=[SourceLocation(line=3, column=9)], path=['df'])]
 +  where [GraphQLError("__init__() got an unexpected keyword argument 'df'", locations=[SourceLocation(line=3, column=9)], path=['df'])] = ExecutionResult(data={'df': None}, errors=[GraphQLError("__init__() got an unexpected keyword argument 'df'", locations=[SourceLocation(line=3, column=9)], path=['df'])]).errors
```
ubuntu cp312:
```
tensorflow>=2.1.0,<3.dev0, werkzeug<1.0.1 are incompatible.
And because tensorboard>=2.17.0 depends on werkzeug>=1.0.1 and
graphene-tornado>=2.5.1,<=2.6.1 depends on werkzeug==0.12.2, we can
conclude that all of:
    graphene-tornado==2.5.1
    graphene-tornado==2.6.0
    graphene-tornado==2.6.1
and tensorflow>=2.1.0,<3.dev0 are incompatible.
And because only the following versions of graphene-tornado are
available:
    graphene-tornado<=2.5.1
    graphene-tornado==2.6.0
    graphene-tornado==2.6.1
    graphene-tornado>3
and vaex-graphql==0.2.0 depends on graphene-tornado>=2.5.1,<3, we can
conclude that tensorflow>=2.1.0,<3.dev0 and vaex-graphql==0.2.0 are
incompatible.
```
cp312 needs tensorflow>=2.16.0 based on wheels on PyPI
in graphene-tornado v3.0.0b2 the werkzeug pin is removed: https://github.com/graphql-python/graphene-tornado/commit/a63677c6f463bd082186b77ece450a222e0b7f46
I think we don't really care about this dependency, and since uv can force/override versions, I'm thinking about using `uv pip install --override requirements-override.txt -e .` with

```
# requirements-override.txt
Werkzeug>1.0.1
```
Nice escape hatch!
all users that want to install vaex-graphql will need to use the escape hatch in this case
only when combined with vaex-ml, and it's not a standard dep of vaex (meta package)
macos doesn't seem to run
go team <3
1 failed on macos:
```
=========================== short test summary info ============================
FAILED tests/percentile_approx_test.py::test_percentile_approx - AssertionError:
Arrays are not almost equal to 1 decimals
x and y nan location mismatch:
 x: array([-7.8e+01,      nan, -5.1e-02,  3.5e+00,  1.3e+02])
 y: array([-7.8e+01, -3.6e+00, -3.7e-02,  3.5e+00,  1.3e+02])
= 1 failed, 3080 passed, 323 skipped, 19 xfailed, 7 xpassed, 282243 warnings in 293.01s (0:04:53) =
```
Ah yes, macos on 3.12 runs, but the others fail with:

```
/Library/Frameworks/Python.framework/Versions/3.12/bin/python: No module named pytest
```
1 test crashes on windows (doesn't look good). I have access to a windows machine, so I could try to debug it myself. I'm a bit surprised by it though.
The osx build failures I don't understand. @ddelange do you maybe have an idea which change introduced that?
I also managed to build for emscripten/pyodide... but I'll open a PR for that AFTER we get this in :-D
mac arm64 wheel build was working before. maybe github upgraded clang in the macos-14 runner :thinking:
```
In file included from vendor/boost/boost/mpl/integral_c.hpp:32:
vendor/boost/boost/mpl/aux_/integral_wrapper.hpp:73:31: error: integer value -1 is outside the valid range of values [0, 3] for this enumeration type [-Wenum-constexpr-conversion]
    typedef AUX_WRAPPER_INST( BOOST_MPL_AUX_STATIC_CAST(AUX_WRAPPER_VALUE_TYPE, (value - 1)) ) prior;
                              ^
```
hmm, building arm64 on an x86 runner is also not a good idea:

```
ld: warning: ignoring file '/Users/runner/work/vaex/vaex/pcre-8.38/.libs/libpcrecpp.0.dylib-master.o': found architecture 'x86_64', required architecture 'arm64'
ld: warning: undefined base symbol '__ZN7pcrecpp2RE6no_argE' for alias '__ZN7pcrecpp6no_argE'
```
here it was still successful: https://github.com/vaexio/vaex/actions/runs/9793737759/job/27045974297
CI is green 🎉 but readthedocs still fails:
```
The conflict is caused by:
    graphene-tornado 2.5.1 depends on werkzeug==0.12.2
    tensorboard 2.16.2 depends on werkzeug>=1.0.1
    graphene-tornado 2.5.1 depends on werkzeug==0.12.2
    tensorboard 2.16.1 depends on werkzeug>=1.0.1
    graphene-tornado 2.5.1 depends on werkzeug==0.12.2
    tensorboard 2.16.0 depends on werkzeug>=1.0.1
```
a clean way to use uv in readthedocs is proposed: https://github.com/readthedocs/readthedocs.org/issues/11551
so no easy way now to use the uv escape hatch
@maartenbreddels fixed without using the escape hatch in 7ed1b54 (see description)
we can also add the PEP 440 direct reference in vaex-graphql instead of in the top-level ci extra if you like. It turns it into an 'exact pin' of graphene-tornado 2.6.1 with unpinned werkzeug.
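A sketch of what that direct reference could look like (the archive URL below is illustrative, reusing the commit hash mentioned earlier in this thread; the actual fix used a different reference):

```toml
# Hypothetical fragment of packages/vaex-graphql/pyproject.toml
[project]
dependencies = [
  # PEP 440 direct reference: exact pin of a graphene-tornado build whose
  # werkzeug pin was removed. Note: PyPI rejects uploads whose dependencies
  # contain direct references, which is the concern raised below.
  "graphene-tornado @ https://github.com/graphql-python/graphene-tornado/archive/a63677c6f463bd082186b77ece450a222e0b7f46.zip",
]
```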
ouh random segfault on windows-latest 3.8. gotta love windows
```
./ci/04-run-test-suite.sh: line 14:   654 Segmentation fault python -m pytest --pyargs --doctest-modules --timeout=1000 tests vaex.datatype_test vaex.file vaex.test.dataset::TestDataset vaex.datatype vaex.utils vaex.struct vaex.groupby
tests\agg_test.py ....................s
```
Love the fix! I was afraid this wouldn't work, because you cannot release with references to files or URLs, but the top-level setup.py never reaches PyPI, so it should be good.
I'm installing vaex on a windows machine to see if I can debug that segfault.
I think 91263bc makes it impossible to release on PyPI; they will refuse the upload.
ahh of course!
I ran Windows + Python 3.8 100s of times, and I cannot reproduce the failure. I'm leaning to either:
Ideas?
looks like it's not allowed. I have a feeling it might be a memory leak in the Windows 3.8 threading implementation
you could simply skip publish of that vaex-core wheel, and don't upload tar.gz for vaex-core. it's the tensorflow approach
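A sketch of that approach in the publish step (the filenames and version number below are made up for illustration, not vaex's actual artifacts): delete the problematic vaex-core cp38 Windows wheel and the vaex-core sdist from `dist/` before uploading, so everything else still gets published.

```shell
# simulate a dist/ folder of built artifacts (version number is made up)
mkdir -p dist
touch dist/vaex_core-4.17.0-cp38-cp38-win_amd64.whl \
      dist/vaex_core-4.17.0-cp39-cp39-win_amd64.whl \
      dist/vaex_core-4.17.0.tar.gz

# the "tensorflow approach": simply don't publish the broken combo --
# drop the cp38 Windows wheel and the vaex-core sdist before uploading
rm -f dist/vaex_core-*-cp38-*-win_amd64.whl dist/vaex_core-*.tar.gz

# the remaining files in dist/ would then go to `twine upload dist/*`
ls dist/
```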
done in the commits below
that did it. we're green 🏁
fwiw, pandas deprecated python 3.8 in August 2023 (v2.1.0)
Wow, amazing. You did most of the work @ddelange, thanks a lot for the work and support!
I'll set up the trusted publisher (probably next week) and see if we can make a release 🥳
Wow it got merged! Congratulations!!
Well done @ddelange @maartenbreddels @EwoutH; that's great news for our codes!
Great news! Thanks a lot. @ddelange @maartenbreddels @EwoutH
@Ben-Epstein wake up it's christmas
fyi followup PR https://github.com/vaexio/vaex/pull/2434 should land soon, so a release is imminent :)
Hoi 👋
linux-aarch64 makes up almost 10% of all platforms (ref https://github.com/giampaolo/psutil/pull/2103, manylinux_2_28). The wheels from this PR can be installed with:
fixes #2366, fixes #2368, fixes #2397, closes #2427, fixes #2384