pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

Unexpected behaviour of df.to_json(compression="gzip") #32326

Open MartinThoma opened 4 years ago

MartinThoma commented 4 years ago

Code Sample

import pandas as pd

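df = pd.DataFrame({"a": [1, 2, 3, 4], "b": ["A", "B", "C", "D"]})
# No path is passed, so to_json returns a str and compression="gzip" is ignored.
json_df = df.to_json(lines=True, orient="records", compression="gzip")
print(json_df)  # plain JSON lines, not gzip-compressed data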

Problem description

The behavior of the "compression" argument is unexpected: in the sample above it has no effect. Although the documentation says compression is only applied when a filename is passed, there should at the very least be a warning when compression is set but no filename is given.

Also: Why doesn't it have an effect? Couldn't we output a bytes object?
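Until that question is settled, a bytes result can be produced by compressing the returned string manually. The following is a minimal sketch, not pandas API: the helper name `to_json_gzip_bytes` is hypothetical, and it assumes `to_json` returns a `str` when no path is passed, as in the code sample.

```python
import gzip


def to_json_gzip_bytes(df, **to_json_kwargs) -> bytes:
    """Hypothetical helper: serialize `df` to JSON text, then gzip it in memory."""
    # With no path argument, to_json returns the JSON document as a str.
    text = df.to_json(**to_json_kwargs)
    # gzip.compress operates on bytes, so encode the text first.
    return gzip.compress(text.encode("utf-8"))
```

With the `df` from the code sample, `to_json_gzip_bytes(df, lines=True, orient="records")` would return gzip-compressed bytes, and `gzip.decompress(...)` on the result recovers the original JSON text.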

R4HMATT commented 4 years ago

I can work on adding a warning to this for now. Not sure if the pandas team wants us to output a bytes object though.
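The check behind such a warning could look roughly like the sketch below. It is written as a standalone wrapper only for illustration: the name `to_json_checked`, the warning class, and the message are assumptions, not the actual pandas change. It relies on `compression` defaulting to `"infer"` in `to_json`.

```python
import warnings


def to_json_checked(df, path_or_buf=None, compression="infer", **kwargs):
    """Hypothetical wrapper: warn when compression is requested without a path."""
    # "infer" and None do not ask for compression explicitly, so stay silent.
    if path_or_buf is None and compression not in (None, "infer"):
        warnings.warn(
            "'compression' has no effect when 'path_or_buf' is not specified",
            UserWarning,
            stacklevel=2,
        )
    return df.to_json(path_or_buf, compression=compression, **kwargs)
```

Calling `to_json_checked(df, lines=True, orient="records", compression="gzip")` would still return the uncompressed string, but it would emit a `UserWarning` first.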