Closed pedrohesch closed 3 years ago
Hi Pedro,
Thanks for opening the issue. Could you share the output of the following command?
pip show s3fs
Also, could you check whether you have enough disk space? You could also try removing or moving ~/.vaex/file-cache/s3/
to see if that helps. If you move it, you can restore it later to reproduce the error, which would help us track it down.
Regards,
Maarten
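The two checks suggested above (free disk space, and moving the S3 cache aside so it can be restored later) can be sketched in Python roughly as follows. This is a minimal sketch, not part of vaex: the helper names `free_gib` and `move_cache_aside` and the `.bak` suffix are hypothetical, and only the cache path ~/.vaex/file-cache/s3/ comes from the thread.

```python
import shutil
from pathlib import Path
from typing import Optional


def free_gib(path: Path) -> float:
    """Return the free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 2**30


def move_cache_aside(cache_dir: Path, suffix: str = ".bak") -> Optional[Path]:
    """Rename the cache directory so it can be restored later.

    Returns the backup path, or None if there was no cache directory to move.
    The suffix is an arbitrary choice for illustration.
    """
    if not cache_dir.is_dir():
        return None
    backup = cache_dir.with_name(cache_dir.name + suffix)
    if backup.exists():
        # Refuse to overwrite an earlier backup.
        raise FileExistsError(f"backup already exists: {backup}")
    shutil.move(str(cache_dir), str(backup))
    return backup
```

For the case in this thread, one would call `free_gib(Path.home())` and `move_cache_aside(Path.home() / ".vaex" / "file-cache" / "s3")`. Renaming the backup directory back to `s3` restores the cache for reproducing the error.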
pip show s3fs:
Name: s3fs
Version: 0.2.2
Summary: Convenient Filesystem interface over S3
Home-page: http://github.com/dask/s3fs/
Author: None
Author-email: None
License: BSD
Location: c:\users\pedro\anaconda3\lib\site-packages
Requires: boto3, six, botocore
Required-by: vaex-hdf5
I removed everything from ~/.vaex/file-cache/s3/, which left 16 GB of free disk space. The file is 10 GB. But I am still receiving the same error.
Running the same version here. Could you contact me privately at maartenbreddels@gmail.com? Maybe we can find a way to give me access to this file.
Closing as stale. Please re-open if needed.
I am trying to open a file with vaex.open as follows:
df2 = vaex.open('s3://viacao-sampaio/HDF5/master_df.hdf5?profile_name=pedroAI')
but I am receiving the errors shown below.
Before I copy the error message here, I would like to make 3 notes:
1- When I vaex.open the same file from my local computer, it works.
2- When I vaex.open a small file from S3 with the same line of code, it works.
3- This file, master_df.hdf5, is 10 GB and has more than 40 million rows.
ERROR:MainThread:vaex:error evaluating: CODIGO_DESTINO at rows 40800736-40800741
Traceback (most recent call last):
File "C:\Users\pedro\Anaconda3\lib\site-packages\vaex\dataframe.py", line 3523, in table_part
values[name] = df.evaluate(name)
File "C:\Users\pedro\Anaconda3\lib\site-packages\vaex\dataframe.py", line 5120, in evaluate
return self._evaluate_implementation(expression, i1=i1, i2=i2, out=out, selection=selection, filtered=filtered, internal=internal, parallel=parallel, chunk_size=chunk_size)
File "C:\Users\pedro\Anaconda3\lib\site-packages\vaex\dataframe.py", line 5261, in _evaluate_implementation
result = [finalize_result(k) for k in expressions]
File "C:\Users\pedro\Anaconda3\lib\site-packages\vaex\dataframe.py", line 5261, in <listcomp>
result = [finalize_result(k) for k in expressions]
File "C:\Users\pedro\Anaconda3\lib\site-packages\vaex\dataframe.py", line 5249, in finalize_result
values = to_numpy(chunks[0])
File "C:\Users\pedro\Anaconda3\lib\site-packages\vaex\array_types.py", line 9, in to_numpy
x = x.to_numpy()
File "C:\Users\pedro\Anaconda3\lib\site-packages\vaex\column.py", line 414, in to_numpy
return self.string_sequence.to_numpy()
File "C:\Users\pedro\Anaconda3\lib\site-packages\vaex\column.py", line 370, in string_sequence
self._string_sequence = string_type(_asnumpy(self.bytes), _asnumpy(self.indices), self.length, self.offset, _asnumpy(self.null_bitmap), self.null_offset)
File "C:\Users\pedro\Anaconda3\lib\site-packages\vaex\column.py", line 326, in _asnumpy
return ar.to_numpy()
File "C:\Users\pedro\Anaconda3\lib\site-packages\vaex\file\column.py", line 73, in to_numpy
return self[0:self.length]
File "C:\Users\pedro\Anaconda3\lib\site-packages\vaex\file\column.py", line 159, in __getitem__
ar = file._as_numpy(offset, byte_length, self.dtype)
File "C:\Users\pedro\Anaconda3\lib\site-packages\vaex\file\cache.py", line 143, in _as_numpy
self._ensure_cached(offset, offset+byte_length)
File "C:\Users\pedro\Anaconda3\lib\site-packages\vaex\file\cache.py", line 155, in _ensure_cached
self.file.seek(start_blocked)
File "C:\Users\pedro\Anaconda3\lib\site-packages\s3fs\core.py", line 1293, in seek
raise ValueError('Seek before start of file')
ValueError: Seek before start of file
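For readers of the traceback: the final frame is s3fs rejecting the position passed to seek(). That message means the requested absolute position was before byte 0, so the value vaex computed as start_blocked in vaex\file\cache.py was presumably negative. Why it became negative is not established in this thread. The sketch below only illustrates the general contract using Python's in-memory io.BytesIO, which likewise rejects a negative absolute position; the helper name `try_seek` is hypothetical, and neither vaex nor s3fs is involved.

```python
import io


def try_seek(fileobj, position):
    """Seek to an absolute position; return the new position or the error text."""
    try:
        return fileobj.seek(position)
    except ValueError as exc:
        # File-like objects refuse absolute positions before byte 0.
        return f"ValueError: {exc}"


buf = io.BytesIO(b"0123456789")
ok = try_seek(buf, 4)        # a valid offset inside the buffer
failed = try_seek(buf, -1)   # a negative offset is rejected
```

Here `ok` is the new position, while `failed` holds the error text. Logging the computed offset just before the seek call would be one way to confirm whether the same thing happens with the 10 GB file.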
How can I fix it? Thanks in advance.