python-sprints / pandas-mentoring

Mentoring new pandas contributors.
BSD 3-Clause "New" or "Revised" License

Update index parameter in pandas to_parquet #156

Closed datapythonista closed 5 years ago

datapythonista commented 5 years ago

In the documentation of to_parquet (https://dev.pandas.io/reference/api/pandas.DataFrame.to_parquet.html#pandas.DataFrame.to_parquet), for the index parameter, it says that when the value is None, the behavior depends on the engine.

I did a test, and I'd say that the index is kept with both engines, pyarrow and fastparquet. I guess the documentation describes a past behavior and wasn't updated. I'd say that the best pandas can do now is to simply have index=True as the default.

I think a PR should be simple enough to propose the change, and we can have the discussion directly in the PR (as opposed to opening an issue to discuss it). The final solution can end up being a different one, but starting with a proposal can make the discussion easier and more focused.

In the description of the PR, it would be useful to have a very simple example that shows how the index is saved in both cases.
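
Something along these lines could serve as that example. It is only a sketch: the helper name `roundtrip_index`, the file name and the sample data are made up for illustration, and it assumes at least one of pyarrow or fastparquet is installed (engines that are missing are skipped). It writes a DataFrame with a non-default index using each value of index, reads it back, and prints the index that was restored.

```python
import importlib.util
import os
import tempfile

import pandas as pd


def roundtrip_index(engine, index):
    """Write a small DataFrame with to_parquet, read it back, return the index values.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Use a non-default (string) index so it is obvious whether it survived.
    df = pd.DataFrame({"value": [1, 2, 3]}, index=["a", "b", "c"])
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.parquet")
        df.to_parquet(path, engine=engine, index=index)
        restored = pd.read_parquet(path, engine=engine)
    return list(restored.index)


results = {}
for engine in ("pyarrow", "fastparquet"):
    # Skip engines that are not installed in this environment.
    if importlib.util.find_spec(engine) is None:
        continue
    for index in (None, True, False):
        results[(engine, index)] = roundtrip_index(engine, index)
        print(engine, index, results[(engine, index)])
```

If the rows printed for index=None match the ones for index=True with both engines, that would support making index=True the default.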

galuhsahid commented 5 years ago

I'd like to work on this. I'll open a PR at the pandas-dev repo later today.