pylhc / tfs

Python package to handle TFS files
https://pylhc.github.io/tfs/
MIT License

[Feature Request]: add option to skip lines and read only header #115

Closed: rdemaria closed this issue 1 year ago

rdemaria commented 1 year ago

Feature Description

Sometimes it is useful to read only part of a long TFS table, such as the header or a few lines. It would be useful to add these options to the API.

Possible Implementation

No response

fsoubelet commented 1 year ago

Hi Riccardo,

I guess we could implement this, but that would introduce a new return type in the reader, which I'm not fond of.

Does reading your file without the validation step (for speed) and then getting the headers take too much time? For instance, reading the FCC file you gave as an example in #114 only takes 400ms on my machine. Something like:

import tfs
# Skip the validation step for speed, then keep only the headers
result = tfs.read("path/to/file.tfs", validate=False).headers

rdemaria commented 1 year ago

I would not change the API, just return a dataframe with fewer or no rows. The issue is not only speed but sometimes memory too.

fsoubelet commented 1 year ago

Indeed, we could consider this. I will look into it, but it is pretty low priority as I'm still writing :)

rdemaria commented 1 year ago

What if we pass nrows to read_csv?
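
To make the nrows suggestion concrete, here is a minimal sketch assuming pandas is installed. `read_tfs_rows` is a hypothetical helper, not part of the tfs package, and it assumes the usual TFS layout: `@` header lines, then a `*` column-names line and a `$` column-types line, then whitespace-separated data rows. It scans only the metadata lines, then lets `pandas.read_csv` stop after `nrows` data rows.

```python
import pandas as pd


def read_tfs_rows(path, nrows=None):
    """Hypothetical helper (not part of tfs): read at most `nrows` data rows of a TFS file."""
    columns = []
    n_meta = 0  # lines to skip: "@" headers, "*" column names and "$" column types
    with open(path) as f:
        for line in f:
            if line.startswith("*"):
                columns = line.split()[1:]  # drop the leading "*"
            elif not line.startswith(("@", "$")):
                break  # first data line reached, stop scanning
            n_meta += 1
    # read_csv stops after `nrows` data rows, so the remaining rows are not parsed
    return pd.read_csv(
        path, sep=r"\s+", skiprows=n_meta, names=columns, nrows=nrows, quotechar='"'
    )
```

For example, `read_tfs_rows("path/to/file.tfs", nrows=10)` would return a plain DataFrame with the first 10 rows. This sketch ignores the header values entirely and does no validation.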

fsoubelet commented 1 year ago

In the end there will most likely be a new function to read only the headers of the file. This keeps things clear and the APIs as lightweight as possible, as we don't want to pile more arguments onto read_tfs.
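
A headers-only reader along those lines could look roughly like the sketch below. It uses only the standard library; `read_tfs_headers` is a hypothetical name, not the tfs package's API, and it assumes each header line has the form `@ NAME %type value`, with string values in double quotes. Because it stops at the first line that does not start with `@`, the data table is never read, which addresses both the speed and memory concerns.

```python
import shlex


def read_tfs_headers(path):
    """Hypothetical helper (not part of tfs): return only the "@" headers of a TFS file."""
    converters = {"%le": float, "%d": int}  # other types (e.g. %s) are kept as str
    headers = {}
    with open(path) as f:
        for line in f:
            if not line.startswith("@"):
                break  # column names ("*") reached: the table itself is skipped
            # '@ TITLE %s "my lattice"' -> ["@", "TITLE", "%s", "my lattice"]
            _, name, dtype, value = shlex.split(line)
            headers[name] = converters.get(dtype, str)(value)
    return headers
```

`shlex.split` is used so that quoted string values containing spaces stay in one piece.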