Open Ben-Epstein opened 1 year ago
I can work around this by adding the following to the top of the file:
import vaex
import vaex.hdf5.dataset
import vaex.arrow.opener
vaex.dataset.opener_classes = [
vaex.hdf5.dataset.Hdf5MemoryMapped,
vaex.hdf5.dataset.AmuseHdf5MemoryMapped,
vaex.hdf5.dataset.Hdf5MemoryMappedGadget,
vaex.arrow.opener.ArrowOpener,
vaex.arrow.opener.FeatherOpener,
vaex.arrow.opener.ParquetOpener
]
vaex.open("file*.hdf5")
This looks like a problem with registering the opener classes: none are being found. I added debug logging, and the "trying opener" line is never reached.
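One way to check whether any openers are registered at all is to list the entry points directly. This is a minimal sketch, not vaex's own code: the helper name list_entry_points is hypothetical, and the group name 'vaex.dataset.opener' is taken from later in this thread. It uses the standard-library importlib.metadata rather than pkg_resources.

```python
import sys
from importlib import metadata


def list_entry_points(group):
    """Return the entry points registered under `group` in this environment."""
    if sys.version_info >= (3, 10):
        # Python 3.10+ supports filtering by group directly.
        return list(metadata.entry_points(group=group))
    # Python 3.8 and 3.9 return a dict keyed by group name.
    return list(metadata.entry_points().get(group, []))


# Group name as quoted in this thread; an empty list would match the symptom.
openers = list_entry_points("vaex.dataset.opener")
print(len(openers), [ep.name for ep in openers])
```

If this prints 0 in the running kernel, the opener classes were never discovered, which would explain why the manual assignment above is needed.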
Interestingly, while the workaround above works, I get new errors when actually using the dataframe:
import vaex
import numpy as np
df = vaex.from_arrays(id=list(range(100_000)), emb=np.random.rand(100_000, 768))
df.export('file.hdf5')
df.export('file1.hdf5')
df = vaex.open("file*.hdf5")
df["id"].sum()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[7], line 9
6 df.export('file1.hdf5')
8 df = vaex.open("file*.hdf5")
----> 9 df["id"].sum()
File ~.venv/lib/python3.8/site-packages/vaex/expression.py:923, in Expression.sum(self, axis, binby, limits, shape, selection, delay, progress)
921 del kwargs['dtype']
922 kwargs['expression'] = expression.expression
--> 923 return self.ds.sum(**kwargs)
924 else:
925 return expression
File ~.venv/lib/python3.8/site-packages/vaex/dataframe.py:1130, in DataFrame.sum(self, expression, binby, limits, shape, selection, delay, progress, edges, array_type)
1107 @docsubst
1108 @stat_1d
1109 def sum(self, expression, binby=[], limits=None, shape=default_shape, selection=False, delay=False, progress=None, edges=False, array_type=None):
1110 """Calculate the sum for the given expression, possible on a grid defined by binby
1111
1112 Example:
(...)
1128 :return: {return_stat_scalar}
1129 """
-> 1130 return self._compute_agg('sum', expression, binby, limits, shape, selection, delay, edges, progress, array_type=array_type)
1131 @delayed
1132 def finish(*sums):
1133 return vaex.utils.unlistify(waslist, sums)
File ~venv/lib/python3.8/site-packages/vaex/dataframe.py:941, in DataFrame._compute_agg(self, name, expression, binby, limits, shape, selection, delay, edges, progress, extra_expressions, array_type)
939 stats = [compute(expression, binners, selection=selection, edges=edges) for expression in expressions]
940 var = finish(binners, *stats)
--> 941 return self._delay(delay, progressbar.exit_on(var))
File ~.venv/lib/python3.8/site-packages/vaex/dataframe.py:1780, in DataFrame._delay(self, delay, task, progressbar)
1778 return task
1779 else:
-> 1780 self.execute()
1781 return task.get()
File ~.venv/lib/python3.8/site-packages/vaex/dataframe.py:421, in DataFrame.execute(self)
419 print(repr(task))
420 if self.executor.tasks:
--> 421 self.executor.execute()
File ~.venv/lib/python3.8/site-packages/vaex/execution.py:308, in ExecutorLocal.execute(self)
307 def execute(self):
--> 308 for _ in self.execute_generator():
309 pass
File ~.venv/lib/python3.8/site-packages/vaex/execution.py:378, in ExecutorLocal.execute_generator(self, use_async)
376 run.nthreads = nthreads = self.thread_pool.nthreads
377 task_checkers = vaex.tasks.create_checkers()
--> 378 memory_tracker = vaex.memory.create_tracker()
379 vaex.memory.local.agg = memory_tracker
380 # we track this for consistency
File ~.venv/lib/python3.8/site-packages/vaex/memory.py:37, in create_tracker()
35 if cls is not None:
36 return cls()
---> 37 raise ValueError(f"No memory tracker found with name {memory_tracker_type}")
ValueError: No memory tracker found with name default
@maartenbreddels any idea what's going on here, and whether it's a quick fix?
I think this is probably related to https://github.com/vaexio/vaex/issues/2282
Having reproduced the error, it seems to me to be related to running pip install vaex versus pip install vaex-core, or to some of the other packages that get pulled in.
For reproduction https://colab.research.google.com/drive/1EG9898VtmO19FwfZKd_LJzqkz_YVWwlE?usp=sharing
pip freeze diff check shows the packages that additionally get installed:
aplus==0.11.0
blake3==0.3.1
commonmark==0.9.1
frozendict==2.3.4
nest-asyncio==1.5.6
rich==12.6.0
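A comparison like the one above can be reproduced by diffing two pip freeze outputs. The sketch below is only an illustration: freeze_diff is a hypothetical helper, and the sample strings are placeholders rather than the real environments from the Colab notebook.

```python
def freeze_diff(before, after):
    """Return requirement lines present in `after` but not in `before`."""
    before_set = {line.strip() for line in before.splitlines() if line.strip()}
    return sorted(
        line.strip()
        for line in after.splitlines()
        if line.strip() and line.strip() not in before_set
    )


# Placeholder freeze outputs; in practice, capture the text of `pip freeze`
# in each environment and pass both strings in.
before = "example-pkg==1.0\n"
after = "example-pkg==1.0\nrich==12.6.0\nblake3==0.3.1\n"
print(freeze_diff(before, after))
```

Comparing exact lines means a package whose version changed shows up as new, which is usually what you want when hunting for differences between two installs.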
Found the issue. In dataset.py, line 57, the loop
for entry in pkg_resources.iter_entry_points(group='vaex.dataset.opener')
yields no entries unless you restart the kernel.
Here is a fix: https://git.smhi.se/climix/climix/-/merge_requests/165/diffs
@Ben-Epstein is that a bug in a specific version of the vaex-hdf5 package? What happens if you update to 0.13 or 0.14?
@JovanVeljanoski I think it's because of the legacy importlib, because it happens for arrow as well. I think @franz101's fix is the correct one: https://github.com/vaexio/vaex/pull/2293
Pinging @maartenbreddels since he was working on something very similar recently.
@JovanVeljanoski @maartenbreddels that issue was 100% related to pkg_resources not finding the lazily loaded readers. With importlib it works now, so the issue can be closed.
That said, it would make sense for the code to check that pkg_resources.iter_entry_points returns more than 0 entries.
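A guard along those lines could look like the sketch below. It is not the change made in the linked pull request: load_openers is a hypothetical name, and the sketch assumes the openers are published under the 'vaex.dataset.opener' group quoted earlier in the thread.

```python
import sys
from importlib import metadata


def load_openers(group="vaex.dataset.opener"):
    """Load every entry point in `group`, failing loudly if none are registered."""
    if sys.version_info >= (3, 10):
        entries = list(metadata.entry_points(group=group))
    else:
        # Python 3.8 and 3.9 return a dict keyed by group name.
        entries = list(metadata.entry_points().get(group, []))
    if not entries:
        raise RuntimeError(
            f"No entry points registered for {group!r}; "
            "is the plugin package installed in this environment?"
        )
    # Import and return the object each entry point refers to.
    return [entry.load() for entry in entries]
```

Raising at discovery time would surface the problem immediately, rather than as a confusing error later when a file cannot be opened.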
Description
On a fresh notebook without vaex installed, run the following
You see the following
Software information
Vaex version (import vaex; vaex.__version__): {'vaex-core': '4.15.0', 'vaex-hdf5': '0.12.3'}