pylhc / tfs

Python package to handle TFS files
https://pylhc.github.io/tfs/
MIT License

[Fix]: Properly read empty strings and allow reading headers only #116

Closed fsoubelet closed 1 year ago

fsoubelet commented 1 year ago

This PR addresses #114 and #115.

For the first one: since pandas by default interprets empty strings ("") as NaN when reading, a step is inserted that converts NaN values in string or object-type columns back into empty strings.
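The conversion step described above could look roughly like the following. This is a minimal sketch and not the code from this PR: the function name restore_empty_strings and the example frame are made up for illustration, and only standard pandas calls are used.

```python
import numpy as np
import pandas as pd


def restore_empty_strings(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: turn NaN back into "" in text-like columns only."""
    result = df.copy()
    for column in result.columns:
        dtype = result[column].dtype
        # Only touch object or string columns; numeric NaN values must stay NaN.
        if pd.api.types.is_object_dtype(dtype) or isinstance(dtype, pd.StringDtype):
            result[column] = result[column].fillna("")
    return result


# Example frame mimicking what a reader returns after "" was parsed as NaN.
frame = pd.DataFrame(
    {
        "NAME": pd.Series(["Q1", np.nan], dtype=object),
        "VALUE": [1.0, np.nan],
    }
)
fixed = restore_empty_strings(frame)
```

Restricting the replacement to text-like columns matters here: a missing value in a numeric column is still reported as NaN, while a missing string is restored to "".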

For the latter, a new function read_headers is added to tfs.reader, which does exactly that.

Incidentally, a small internal rework of the reader was done. A new helper function and a dataclass were added that take care of reading the metadata of the file, meaning everything but the dataframe part: headers, number of non-data lines, column names and column types. The helper is used in both the read_tfs and read_headers functions (this was mostly a block moved out of read_tfs into the helper _read_metadata). Thanks to @JoschD for suggesting this.
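To illustrate the idea of the metadata helper described above, here is a rough standard-library sketch. It is not the implementation from this PR: the names Metadata and read_metadata are hypothetical, header values are kept as plain strings without type conversion, and it assumes the usual TFS layout where header lines start with "@", the column-name line with "*", the column-type line with "$" and comment lines with "#".

```python
import io
import shlex
from dataclasses import dataclass, field
from typing import TextIO


@dataclass
class Metadata:
    """Hypothetical container for everything except the dataframe part."""

    headers: dict = field(default_factory=dict)
    non_data_lines: int = 0
    column_names: list = field(default_factory=list)
    column_types: list = field(default_factory=list)


def read_metadata(stream: TextIO) -> Metadata:
    """Collect metadata and stop at the first data line."""
    meta = Metadata()
    for line in stream:
        stripped = line.strip()
        if stripped.startswith("@"):
            # Assumed layout: @ NAME %type value (value may be quoted).
            _, name, _type, *value = shlex.split(stripped)
            meta.headers[name] = " ".join(value)
        elif stripped.startswith("*"):
            meta.column_names = stripped.split()[1:]
        elif stripped.startswith("$"):
            meta.column_types = stripped.split()[1:]
        elif stripped and not stripped.startswith("#"):
            break  # first data line reached, metadata is complete
        meta.non_data_lines += 1
    return meta


sample = io.StringIO(
    '@ TITLE %s "Example"\n'
    "@ ENERGY %le 6500.0\n"
    "* NAME S\n"
    "$ %s %le\n"
    ' "Q1" 1.0\n'
)
meta = read_metadata(sample)
```

With such a helper, a headers-only reader can simply return meta.headers, while a full reader can pass non_data_lines and the column names on to the dataframe parsing step.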

Tests were added. Version is bumped to 3.5.0.

fsoubelet commented 1 year ago

@rdemaria how would you feel about using a dedicated function to access the headers (only) of the file? This would be used as:

from tfs.reader import read_headers

headers = read_headers("your_file.tfs")

rdemaria commented 1 year ago

OK! A bit more cumbersome than adding an argument, but I noticed that you forbid having many arguments, so I won't argue...