tomerfiliba / plumbum

Plumbum: Shell Combinators
https://plumbum.readthedocs.io
MIT License

plumbum's path objects look like files but behave differently #651

Open gnurbs opened 1 year ago

gnurbs commented 1 year ago

I'm using plumbum to read a remote file, which is then consumed by another library, pandas, like this:

import pandas, plumbum

remote = plumbum.SshMachine('myremote')
fd = remote.path('/tmp/test.csv')
data = pandas.read_csv(fd)

I expected this to work since `'read' in dir(fd)` is true. However, it fails in a confusing way (see the bottom for the traceback, though I don't think it's necessary) that made me think the problem was on pandas' side. I now think it is not, judging by the help for `fd.read`:

fd.read?
Signature: dataFd.read(encoding=None)
Docstring:
returns the contents of this file as a ``str``. By default the data is read
as text, but you can specify the encoding, e.g., ``'latin1'`` or ``'utf8'``
File:      /usr/lib/python3/dist-packages/plumbum/path/remote.py

It differs from the usual file-like objects in that `read` takes an encoding rather than the maximum number of bytes to read.
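To illustrate the mismatch, here is a minimal sketch. `FakeRemotePath` is a hypothetical stand-in, written only so the snippet runs without an SSH connection; the one thing it takes from plumbum is the `read(encoding=None)` signature shown in the help output above. It suggests that an attribute check like `'read' in dir(fd)` cannot tell the two kinds of object apart, while a check against `io.IOBase` can (with the caveat that not every file-like object in the wild inherits from `io.IOBase`).

```python
import inspect
import io


class FakeRemotePath:
    """Hypothetical stand-in: mirrors only the read(encoding=None) signature."""

    def __init__(self, text):
        self._text = text

    def read(self, encoding=None):
        # Returns the whole content at once; there is no size argument.
        return self._text


real_file = io.StringIO("a,b\n1,2\n")
fake_path = FakeRemotePath("a,b\n1,2\n")

# Both objects pass the naive attribute check.
has_read = ["read" in dir(obj) for obj in (real_file, fake_path)]

# A real text stream interprets the first argument as a character count.
first_three = real_file.read(3)

# The stand-in's only parameter is named "encoding", not "size".
fake_params = list(inspect.signature(fake_path.read).parameters)

# Checking against io.IOBase tells the two apart.
is_stream = [isinstance(obj, io.IOBase) for obj in (real_file, fake_path)]

print(has_read, repr(first_three), fake_params, is_stream)
```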

I understand plumbum doesn't necessarily try to be Pythonic, but it creates confusion when long-standing conventions such as file-like objects don't work the way one would expect. This needs a warning, or maybe an API change.
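In the meantime, one possible workaround is to read the content eagerly and wrap it in a real file-like object. The sketch below is an assumption-laden illustration, not plumbum's recommended approach: `FakeRemotePath` and `as_text_buffer` are hypothetical names, the stand-in models only the `read(encoding=None)` signature from the help output above, and the standard-library `csv` module is used in place of pandas so the snippet runs on its own.

```python
import csv
import io


class FakeRemotePath:
    """Hypothetical stand-in for a plumbum path; only read() is modeled."""

    def __init__(self, text):
        self._text = text

    def read(self, encoding=None):
        # Like the documented method, returns the full content as a str.
        return self._text


def as_text_buffer(path_obj):
    """Hypothetical helper: read everything, then expose a standard text stream."""
    return io.StringIO(path_obj.read())


fd = FakeRemotePath("a,b\n1,2\n")
buffer = as_text_buffer(fd)

# The buffer now honours the usual file protocol, so CSV parsers can consume it.
rows = list(csv.reader(buffer))
print(rows)
```

The same buffer could presumably be handed to `pandas.read_csv`, which accepts file-like objects. Note that this loads the whole file into memory, so it may not suit very large remote files.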

Traceback of my actual code:

```
  File "plot.py", line 31, in <module>
    data = pandas.read_csv(dataFd)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/pandas/util/_decorators.py", line 211, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/pandas/io/parsers/readers.py", line 950, in read_csv
    return _read(filepath_or_buffer, kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/pandas/io/parsers/readers.py", line 605, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/pandas/io/parsers/readers.py", line 1442, in __init__
    self._engine = self._make_engine(f, self.engine)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/pandas/io/parsers/readers.py", line 1735, in _make_engine
    self.handles = get_handle(
                   ^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/pandas/io/common.py", line 856, in get_handle
    handle = open(
             ^^^^^
FileNotFoundError: [Errno 2] No such file or directory:
```

This led me to believe that pandas uses `repr(filepath_or_buffer)` (that's `fd` in my example), but I think it only takes that code path after plumbum's `fd.read` behaves differently than expected. It's possible this particular problem only appears with remote paths or with files above a certain size, but I'm pretty sure the root cause is that plumbum doesn't behave in the standard way. For **plumbum in particular** I think that is fine, since it tries to do something that doesn't look particularly Pythonic, but it should be easier to understand what fails when one reasonably assumes it works like a standard file object.