vaexio / vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
https://vaex.io
MIT License

[BUG-REPORT] Using Vaex in Readthedocs examples #1708

Closed: erwanp closed this issue 2 years ago

erwanp commented 2 years ago

Description

Hello! We're deploying Vaex as the official HDF5 library in our code, and it gives great results!

We ran into one problem when building the documentation. The documentation is built on ReadTheDocs, and the examples are run on each build.

During the build, vaex fails with:

  File "/home/docs/checkouts/readthedocs.org/user_builds/radis/checkouts/develop/examples/plot_hitemp_OH_database.py", line 16, in <module>
    df = fetch_hitemp("OH")
  File "/home/docs/checkouts/readthedocs.org/user_builds/radis/envs/develop/lib/python3.8/site-packages/radis/io/hitemp.py", line 577, in fetch_hitemp
    ldb.download_and_parse(download_urls, download_files)
  File "/home/docs/checkouts/readthedocs.org/user_builds/radis/envs/develop/lib/python3.8/site-packages/radis/io/dbmanager.py", line 395, in download_and_parse
    download_and_parse_one_file(urlname, local_file, Ndownload)
  File "/home/docs/checkouts/readthedocs.org/user_builds/radis/envs/develop/lib/python3.8/site-packages/radis/io/dbmanager.py", line 360, in download_and_parse_one_file
    Nlines = self.parse_to_local_file(
  File "/home/docs/checkouts/readthedocs.org/user_builds/radis/envs/develop/lib/python3.8/site-packages/radis/io/hitemp.py", line 345, in parse_to_local_file
    writer.write(local_file, df, append=True)
  File "/home/docs/checkouts/readthedocs.org/user_builds/radis/envs/develop/lib/python3.8/site-packages/radis/io/hdf5.py", line 168, in write
    vaex.from_pandas(df).export_hdf5(file, group=key, mode="w")
  File "/home/docs/checkouts/readthedocs.org/user_builds/radis/envs/develop/lib/python3.8/site-packages/vaex/dataframe.py", line 6544, in export_hdf5
    writer.layout(self)
  File "/home/docs/checkouts/readthedocs.org/user_builds/radis/envs/develop/lib/python3.8/site-packages/vaex/hdf5/writer.py", line 54, in layout
    str_byte_length = {name:df[name].str.byte_length().sum(delay=True) for name, dtype in dtypes.items() if dtype.is_string}
  File "/home/docs/checkouts/readthedocs.org/user_builds/radis/envs/develop/lib/python3.8/site-packages/vaex/hdf5/writer.py", line 54, in <dictcomp>
    str_byte_length = {name:df[name].str.byte_length().sum(delay=True) for name, dtype in dtypes.items() if dtype.is_string}
  File "/home/docs/checkouts/readthedocs.org/user_builds/radis/envs/develop/lib/python3.8/site-packages/vaex/expression.py", line 845, in sum
    dtype = self.dtype
  File "/home/docs/checkouts/readthedocs.org/user_builds/radis/envs/develop/lib/python3.8/site-packages/vaex/expression.py", line 480, in dtype
    return self.df.data_type(self)
  File "/home/docs/checkouts/readthedocs.org/user_builds/radis/envs/develop/lib/python3.8/site-packages/vaex/dataframe.py", line 2049, in data_type
    data = self.evaluate(expression, 0, 1, filtered=True, array_type=array_type, parallel=False)
  File "/home/docs/checkouts/readthedocs.org/user_builds/radis/envs/develop/lib/python3.8/site-packages/vaex/dataframe.py", line 2895, in evaluate
    return self._evaluate_implementation(expression, i1=i1, i2=i2, out=out, selection=selection, filtered=filtered, array_type=array_type, parallel=parallel, chunk_size=chunk_size)
  File "/home/docs/checkouts/readthedocs.org/user_builds/radis/envs/develop/lib/python3.8/site-packages/vaex/dataframe.py", line 6155, in _evaluate_implementation
    value = scope.evaluate(expression)
  File "/home/docs/checkouts/readthedocs.org/user_builds/radis/envs/develop/lib/python3.8/site-packages/vaex/scopes.py", line 112, in evaluate
    result = eval(expression, expression_namespace, self)
  File "<string>", line 1, in <module>
  File "/home/docs/checkouts/readthedocs.org/user_builds/radis/envs/develop/lib/python3.8/site-packages/vaex/arrow/numpy_dispatch.py", line 136, in wrapper
    result = f(*args, **kwargs)
  File "/home/docs/checkouts/readthedocs.org/user_builds/radis/envs/develop/lib/python3.8/site-packages/vaex/functions.py", line 47, in decorated
    return f(x, *args, **kwargs)
  File "/home/docs/checkouts/readthedocs.org/user_builds/radis/envs/develop/lib/python3.8/site-packages/vaex/functions.py", line 1476, in str_byte_length
    return _to_string_sequence(x).byte_length()
  File "/home/docs/checkouts/readthedocs.org/user_builds/radis/envs/develop/lib/python3.8/site-packages/vaex/column.py", line 510, in _to_string_sequence
    return convert.column_from_arrow_array(x).string_sequence
  File "/home/docs/checkouts/readthedocs.org/user_builds/radis/envs/develop/lib/python3.8/site-packages/vaex/column.py", line 629, in string_sequence
    string_type = vaex.strings.StringList32
AttributeError: module 'vaex' has no attribute 'strings'
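The writer frames in the traceback show why strings are involved at all: before writing, `layout` builds `str_byte_length` by summing `str.byte_length()` over every column whose dtype is a string (presumably to size the string storage), and that call is what ends up reaching `vaex.strings`. As a rough illustration only, the pure-Python stand-in below computes the same per-column totals of UTF-8 bytes on a plain dict of lists. `string_byte_lengths` and the column names are hypothetical, and this is not vaex code.

```python
# Hypothetical pure-Python stand-in (not vaex code) for the sizing step in the
# traceback: total UTF-8 byte length of every value in each string column.
from typing import Dict, Sequence


def string_byte_lengths(columns: Dict[str, Sequence]) -> Dict[str, int]:
    """Return total UTF-8 bytes per column, for columns whose values are all str."""
    totals: Dict[str, int] = {}
    for name, values in columns.items():
        # Mirrors the `if dtype.is_string` filter: non-string columns are skipped.
        if values and all(isinstance(v, str) for v in values):
            totals[name] = sum(len(v.encode("utf-8")) for v in values)
    return totals


columns = {
    "x": [1.5, 2.25, 3.0],     # numeric column: skipped
    "label": ["X", "A", "ß"],  # string column: 1 + 1 + 2 bytes
}
print(string_byte_lengths(columns))  # {'label': 4}
```

Because of that filter, this particular line is only reached when the exported DataFrame contains at least one string column.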

This is probably due to an environment error. What is the best way to package vaex so that it builds and runs on RTD?
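One way to test the environment-error hypothesis is to check, from inside the RTD build (for example near the top of the Sphinx `conf.py`), whether the `vaex.strings` module the traceback looks for can be located at all. This sketch assumes, without confirmation in this thread, that `vaex.strings` is one of vaex's compiled extension modules, so its absence would point to an install that skipped the native build. The helper name `submodule_available` is made up for illustration.

```python
# Diagnostic sketch: can a (possibly compiled) submodule be located?
import importlib.util


def submodule_available(name: str) -> bool:
    """Return True if `name` (e.g. "vaex.strings") can be found by the import system.

    For a dotted name, find_spec imports the parent package first; if that
    import fails, we treat the submodule as unavailable.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:  # also covers ModuleNotFoundError
        return False


if __name__ == "__main__":
    # Under the assumption above, this would print False in a build where
    # vaex was installed without its native extensions.
    print("vaex.strings available:", submodule_available("vaex.strings"))
```

Failing early with a clear message in `conf.py` is usually easier to act on than a traceback raised deep inside an example script.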

Software information

Vaex 4.5.0

We currently do not use an Anaconda environment on RTD. Vaex is simply listed in the install_requires of setup.py and installed from there into whatever virtual environment RTD uses.

An alternative would be to use an Anaconda environment directly on RTD (Vaex runs properly in our Travis-CI tests, which are deployed in an Anaconda environment), but there may be other options.

maartenbreddels commented 2 years ago

Hi Erwan,

Thanks for the feedback, great that you like Vaex. I actually ran into the same issue recently. This will be fixed in https://github.com/vaexio/vaex/pull/1716 (basically, we avoid building vaex on RTD, but we were a bit too aggressive about it).

Cheers,

Maarten
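Maarten's explanation suggests that vaex's build deliberately skips some work when it detects Read the Docs. The sketch below illustrates, under stated assumptions, how such a check can be too broad. It relies on the `READTHEDOCS` and `READTHEDOCS_PROJECT` environment variables that Read the Docs sets during builds. The function names and the `"vaex"` project slug are hypothetical, and this is neither vaex's actual setup.py nor necessarily what PR #1716 changes.

```python
# Hypothetical sketch, not vaex's setup.py: deciding whether to skip a native build on RTD.
import os
from typing import Mapping


def skip_native_build_broad(environ: Mapping[str, str]) -> bool:
    """Skip whenever the install happens inside *any* Read the Docs build."""
    return environ.get("READTHEDOCS") == "True"


def skip_native_build_narrow(environ: Mapping[str, str], own_project: str = "vaex") -> bool:
    """Skip only when Read the Docs is building this package's own documentation."""
    return (
        environ.get("READTHEDOCS") == "True"
        and environ.get("READTHEDOCS_PROJECT") == own_project
    )


# Environment resembling the radis build in the traceback (project slug assumed "radis").
radis_env = {"READTHEDOCS": "True", "READTHEDOCS_PROJECT": "radis"}
print(skip_native_build_broad(radis_env))   # True: extensions would be skipped
print(skip_native_build_narrow(radis_env))  # False: extensions would be built

# In a real setup.py, the process environment would be passed instead:
skip = skip_native_build_narrow(os.environ)
```

A broad check fires for every package installed during any RTD build, including vaex when it is only a dependency of another project's documentation, as in the radis build above. Scoping the check to the package's own project is one way to narrow it.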