vaexio / vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
https://vaex.io
MIT License

[BUG-REPORT] ValueError: Cannot find newline in first 3145728 bytes of file #2302

Open SteveBox0 opened 1 year ago

SteveBox0 commented 1 year ago
ERROR    error opening './1.csv'    __init__.py:271
                             Traceback (most recent call last):
                               File "/usr/local/lib/python3.10/dist-packages/vaex/__init__.py", line 236, in open
                                 vaex.convert.convert(
                               File "/usr/local/lib/python3.10/dist-packages/vaex/convert.py", line 38, in convert
                                 cached_output(*args, **kwargs)
                               File "/usr/local/lib/python3.10/dist-packages/vaex/cache.py", line 427, in call
                                 value = callable(*args, **kwargs)
                               File "/usr/local/lib/python3.10/dist-packages/vaex/convert.py", line 34, in cached_output
                                 ds = vaex.dataset.open(path_input, fs_options=fs_options_input, fs=fs_input, *args, **kwargs)
                               File "/usr/local/lib/python3.10/dist-packages/vaex/dataset.py", line 81, in open
                                 return opener.open(path, fs_options=fs_options, fs=fs, *args, **kwargs)
                               File "/usr/local/lib/python3.10/dist-packages/vaex/dataset.py", line 1457, in open
                                 return cls(path, *args, **kwargs)
                               File "/usr/local/lib/python3.10/dist-packages/vaex/csv.py", line 155, in __init__
                                 self._infer_schema()
                               File "/usr/local/lib/python3.10/dist-packages/vaex/csv.py", line 237, in _infer_schema
                                 raise ValueError("Cannot find newline in first %d bytes of file: %s" % (self.newline_readahead*3, self.path))
                             ValueError: Cannot find newline in first 3145728 bytes of file: ./1.csv
Traceback (most recent call last):
  File "/root/test/test-csv.py", line 10, in <module>
    df = vaex.open('./1.csv', convert='./my_big_file.hdf5')
  File "/usr/local/lib/python3.10/dist-packages/vaex/__init__.py", line 236, in open
    vaex.convert.convert(
  File "/usr/local/lib/python3.10/dist-packages/vaex/convert.py", line 38, in convert
    cached_output(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vaex/cache.py", line 427, in call
    value = callable(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vaex/convert.py", line 34, in cached_output
    ds = vaex.dataset.open(path_input, fs_options=fs_options_input, fs=fs_input, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vaex/dataset.py", line 81, in open
    return opener.open(path, fs_options=fs_options, fs=fs, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vaex/dataset.py", line 1457, in open
    return cls(path, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vaex/csv.py", line 155, in __init__
    self._infer_schema()
  File "/usr/local/lib/python3.10/dist-packages/vaex/csv.py", line 237, in _infer_schema
    raise ValueError("Cannot find newline in first %d bytes of file: %s" % (self.newline_readahead*3, self.path))
ValueError: Cannot find newline in first 3145728 bytes of file: ./1.csv
NickCrews commented 1 year ago

This is almost certainly a problem with your CSV file and not with vaex: as the error says, there is no newline character in the first 3145728 bytes (3 MiB) of the file.

Upload your CSV file and post a reproducible example if you want people to be able to help further.
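
In the meantime, a quick way to check what the error message claims is to look at the start of the file yourself. The snippet below is only a rough sketch using the standard library, not anything from vaex: `inspect_head` is a hypothetical helper name, and the 3145728-byte window is simply copied from the error message above.

```python
READAHEAD_BYTES = 3145728  # window size copied from the error message (assumption: same window vaex scanned)


def inspect_head(path, nbytes=READAHEAD_BYTES):
    """Read the first `nbytes` of a file in binary mode and count line-ending bytes.

    Hypothetical diagnostic helper; it does not use or mimic any vaex API.
    """
    with open(path, "rb") as f:
        head = f.read(nbytes)
    return {
        "bytes_read": len(head),          # smaller than nbytes if the file is short
        "lf_count": head.count(b"\n"),    # Unix-style newlines (also part of \r\n)
        "cr_count": head.count(b"\r"),    # carriage returns
        "nul_count": head.count(b"\x00"), # many NUL bytes can hint at UTF-16 or binary data
        "preview": head[:80],             # raw bytes, so odd encodings stay visible
    }
```

Calling `inspect_head('./1.csv')` and printing the result should show whether `lf_count` really is 0. If it is, a large `cr_count` might mean the file uses bare carriage returns as line endings, and a large `nul_count` might mean it is not plain 8-bit text at all; both are guesses that would need the actual file to confirm.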