While working with EPA data (which includes a 1.8GB FACILITY file and a 960MB ORGANIZATION file), I discovered the downside of using `io.BytesIO`: it buffers the entire file in memory, which can be a challenge on notebooks with only 4GB of RAM. The workaround I discovered was to use a temporary file:
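A minimal sketch of that workaround, assuming the file is fetched from S3 with boto3 (the bucket, key, and column names here are placeholders, not the actual pipeline's values):

```python
import tempfile

import boto3
import pandas as pd

s3 = boto3.client("s3")

# Instead of reading the whole object into io.BytesIO (which holds all
# 1.8GB in RAM), stream it to a temporary file on disk in small chunks.
with tempfile.NamedTemporaryFile(suffix=".csv") as tmp:
    s3.download_fileobj("my-bucket", "EPA/FACILITY.csv", tmp)
    tmp.flush()
    # Read back only the columns needed downstream, keeping the resulting
    # dataframe small (column names are hypothetical).
    df = pd.read_csv(tmp.name, usecols=["REGISTRY_ID", "FAC_NAME", "FAC_STATE"])
# The tempfile is deleted automatically when the context manager exits.
```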
With the big files kept out of memory, the restricted versions of the tables I need for processing fit easily, even without deleting the processed dataframe after loading it into Trino:
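As a quick sanity check of the savings, the restricted frame's footprint can be measured directly; a handful of columns out of a 1.8GB source file should come to a small fraction of that:

```python
# Measure the in-memory size of the restricted dataframe (deep=True
# accounts for the actual string contents, not just pointer overhead).
mib = df.memory_usage(deep=True).sum() / 2**20
print(f"restricted FACILITY frame: {mib:.0f} MiB")
```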
I suggest we encourage using tempfiles and then deleting them:
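One possible shape for that guidance, sketched under the same assumptions as above (bucket, key, and column names are placeholders): create the tempfile explicitly so its lifetime is obvious, and make the deletion unconditional with `try`/`finally`:

```python
import os
import tempfile

import boto3
import pandas as pd

s3 = boto3.client("s3")

# Create the tempfile explicitly and clean it up in a finally block.
fd, path = tempfile.mkstemp(suffix=".csv")
try:
    with os.fdopen(fd, "wb") as tmp:
        # Stream the 960MB ORGANIZATION file to disk instead of RAM.
        s3.download_fileobj("my-bucket", "EPA/ORGANIZATION.csv", tmp)
    # Restrict to the columns actually needed (names are hypothetical).
    df = pd.read_csv(path, usecols=["REGISTRY_ID", "ORG_NAME"])
finally:
    os.unlink(path)  # Delete the tempfile even if processing fails.
```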