opennem / nemweb

Python package to directly download and process AEMO files from http://www.nemweb.com.au/
MIT License

Significant (10x +) Speedups Using extractall Rather than Extract_Stream #31

Open zthatch opened 2 years ago

zthatch commented 2 years ago

This implementation decodes the zip file line by line rather than extracting it in its entirety and then reading it as a CSV. This is a significant bottleneck in the code. It can be improved by using extractall to extract the zip file to a temporary directory and then using the csv reader to iterate through the rows. That in turn requires changes to the line processing, and switching from BytesIO to StringIO to load each table into a DataFrame.
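A minimal sketch of the row-splitting step, assuming the archive has already been extracted and assuming the AEMO MMS-style CSV layout: "C" rows are comments, an "I" row starts a table (fields 1 and 2 name it, fields 4 onward are column names) and "D" rows carry data in the same positions. split_tables is a hypothetical helper, not part of the nemweb package. Each table's rows are re-serialised into its own io.StringIO buffer, the text-mode counterpart of a BytesIO buffer.

```python
import csv
import io


def split_tables(handle):
    """Split one extracted AEMO CSV file into per-table StringIO buffers.

    Hypothetical helper. Assumes the MMS-style layout: "C" rows are comments,
    an "I" row starts a table (fields 1-2 name it, fields 4+ are column names)
    and "D" rows carry data in the same positions.
    Returns {(report_type, sub_type): io.StringIO} with every buffer rewound.
    """
    tables = {}
    writer = None
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        record_type = row[0]
        if record_type == "I":
            # New table header: route the following "D" rows to this table's buffer.
            buffer = tables.setdefault((row[1], row[2]), io.StringIO())
            writer = csv.writer(buffer)
            if buffer.tell() == 0:
                writer.writerow(row[4:])  # column names, written once per table
        elif record_type == "D" and writer is not None:
            writer.writerow(row[4:])  # drop the D/report/sub-type/version prefix
        # "C" (comment) rows and any other record types are ignored.
    for buffer in tables.values():
        buffer.seek(0)  # rewind so readers start at the header row
    return tables
```

In use, the extracted file would be opened in text mode, e.g. with open(path, newline="") as f: tables = split_tables(f), and each buffer handed to pandas.read_csv, which accepts any file-like object. Opening the file in text mode lets the csv module do the decoding once, instead of decoding each line by hand.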

jufemaiz commented 2 years ago

Hard agree with this!

https://docs.python.org/3/library/zipfile.html#zipfile.ZipFile.extractall

Perhaps extract into a temporary directory instead of the default (the current working directory), so that cleanup can be managed more effectively?
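A minimal sketch of the temporary-directory idea, assuming the standard-library tempfile module is acceptable; extracted is a hypothetical helper name, not part of nemweb. tempfile.TemporaryDirectory deletes the directory and everything extracted into it when the with block exits, even if processing raises, so nothing is left behind in the working directory.

```python
import tempfile
import zipfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def extracted(zip_path):
    """Extract zip_path into a fresh temporary directory and yield its CSV paths.

    Hypothetical helper. The directory (and everything extracted into it) is
    deleted when the with-block exits, even if processing raises.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(path=tmp_dir)  # without path, this would be the cwd
        # Compare extensions case-insensitively in case members end in ".CSV".
        yield sorted(
            p for p in Path(tmp_dir).rglob("*")
            if p.is_file() and p.suffix.lower() == ".csv"
        )
```

A caller would write with extracted("example.zip") as csv_paths: and open each path inside the block; once the block ends, the paths no longer exist. Using a context manager rather than a bare generator ties cleanup to the with block instead of to when the generator happens to be garbage-collected.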