jamie-r-davis / ets_pull

"File name too long" message when using pd.read_fwf #1

Closed stevensondev closed 3 years ago

stevensondev commented 3 years ago

I've been scratching my head over this one. I'm trying to grab TOEFL data. I am able to call the following line:

df = pd.read_fwf(toefl_data, widths=layout.width, names=layout.field_name)

When doing so, I get this message:

Traceback (most recent call last):
  File "get_toefl.py", line 28, in <module>
    df = pd.read_fwf(toefl_data, widths=layout.width, names=layout.field_name)
  File "/usr/local/lib64/python3.6/site-packages/pandas/io/parsers.py", line 848, in read_fwf
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib64/python3.6/site-packages/pandas/io/parsers.py", line 454, in _read
    parser = TextFileReader(fp_or_buf, kwds)
  File "/usr/local/lib64/python3.6/site-packages/pandas/io/parsers.py", line 948, in __init__
    self._make_engine(self.engine)
  File "/usr/local/lib64/python3.6/site-packages/pandas/io/parsers.py", line 1191, in _make_engine
    self._engine = klass(self.f, self.options)
  File "/usr/local/lib64/python3.6/site-packages/pandas/io/parsers.py", line 3784, in __init__
    PythonParser.__init__(self, f, **kwds)
  File "/usr/local/lib64/python3.6/site-packages/pandas/io/parsers.py", line 2389, in __init__
    memory_map=self.memory_map,
  File "/usr/local/lib64/python3.6/site-packages/pandas/io/common.py", line 496, in get_handle
    f = open(path_or_buf, mode, errors="replace", newline="")
OSError: [Errno 36] File name too long: '

Appended to that is a dump of the fixed-width records, which I'm omitting for privacy reasons. I never get a CSV. It appears that the DataFrame is never created at all.

Any idea what's going on?

jamie-r-davis commented 3 years ago

Yeah, sorry about that. The readme left a gap between pulling the data and then parsing it into the dataframe. The read_fwf method expects either a file path, path object, or file-like object instead of just the string of raw data. I updated the instructions to show how you could manage that.
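To illustrate the point about file-like objects: when read_fwf receives a plain string, it treats it as a path and tries to open it, which is where the "File name too long" error comes from. Below is a minimal sketch of one way around that, wrapping the raw text in io.StringIO. The sample records, widths, and field names are made up for illustration; they are not the real ETS layout, and this may differ from what the updated readme shows.

```python
import io

import pandas as pd

# Made-up stand-in for the raw fixed-width text returned by the data pull.
# Each record is padded to the hypothetical widths below.
raw_data = "\n".join(
    [
        "DOE".ljust(10) + "JANE".ljust(10) + "105",
        "SMITH".ljust(10) + "JOHN".ljust(10) + "098",
    ]
)

# Hypothetical layout; in practice these come from the layout spec.
widths = [10, 10, 3]
names = ["last_name", "first_name", "total_score"]

# Wrap the string in a file-like object so pandas reads it as data
# instead of trying to open it as a file path.
buffer = io.StringIO(raw_data)

# dtype=str keeps every field as text, so leading zeros are preserved.
df = pd.read_fwf(buffer, widths=widths, names=names, header=None, dtype=str)
print(df)
```

Writing the raw text to a temporary file and passing its path to read_fwf would work as well; the in-memory buffer simply avoids touching the disk.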

For your second issue, I have moved away from a position where I have access to an ETS feed, so I cannot help you much there. The data from ETS was (still is?) in a fixed-width format. The layouts provided contain example specs for that format, and the readme shows how you would go about converting the fixed-width data into a DataFrame, which can then be exported to CSV.
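As a rough sketch of that layout-to-CSV flow, assuming the layout is available as a DataFrame with field_name and width columns (as the layout.width and layout.field_name references in the original call suggest): the layout rows and sample records here are invented for illustration and do not reflect the actual ETS specification.

```python
import io

import pandas as pd

# Hypothetical layout spec: one row per field in the fixed-width record.
layout = pd.DataFrame(
    {
        "field_name": ["last_name", "first_name", "total_score"],
        "width": [12, 8, 3],
    }
)

# Invented sample records, padded to the widths in the layout above.
records = [("DOE", "JANE", "101"), ("SMITH", "JOHN", "095")]
raw_data = "\n".join(
    "".join(value.ljust(width) for value, width in zip(record, layout["width"]))
    for record in records
)

# Parse the fixed-width text using the widths and names from the layout.
df = pd.read_fwf(
    io.StringIO(raw_data),
    widths=layout["width"].tolist(),
    names=layout["field_name"].tolist(),
    header=None,
    dtype=str,  # keep scores as text so leading zeros survive
)

# With no path argument, to_csv returns the CSV as a string;
# pass a filename instead to write it to disk.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Driving the parse from the layout table means a change to the spec only needs to be made in one place, rather than in hard-coded width and name lists.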