In the documentation of `to_parquet` (https://dev.pandas.io/reference/api/pandas.DataFrame.to_parquet.html#pandas.DataFrame.to_parquet), for the `index` parameter, it says that when the value is `None`, the behavior depends on the engine.
I did a test, and I'd say that the index is kept with both engines, `pyarrow` and `fastparquet`. I guess that was a past behavior, and the documentation wasn't updated. I'd say that the best pandas can do now is to simply have `index=True` as default.
I think a PR should be simple enough to propose the change, and have the discussion directly in the PR (as opposed to opening an issue to discuss). The final solution can end up being a different one, but starting with a proposal can make the discussions easier and more focused.
In the description of the PR, it would be useful to have a very simple example that shows how the index is saved in both cases.
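Something along these lines could serve as that example. It is only a rough sketch: the `index_survives` helper and the sample frame are made up for illustration, it uses a named, non-default index so the result is easy to see, and it assumes at least one of the two engines is installed (an engine that is missing is reported as `None`).

```python
import os
import tempfile

import pandas as pd


def index_survives(engine):
    """Round-trip a small frame with index=None and check the index.

    Returns True/False depending on whether the index read back matches
    the original, or None if the requested engine is not installed.
    """
    # Hypothetical sample data: a named string index, not a default RangeIndex.
    df = pd.DataFrame(
        {"value": [10, 20, 30]},
        index=pd.Index(["a", "b", "c"], name="key"),
    )
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.parquet")
        try:
            # index=None is the documented "depends on the engine" case.
            df.to_parquet(path, engine=engine, index=None)
        except ImportError:
            return None  # engine not available in this environment
        result = pd.read_parquet(path, engine=engine)
    return list(result.index) == list(df.index)


if __name__ == "__main__":
    for engine in ("pyarrow", "fastparquet"):
        print(engine, index_survives(engine))
```

If both engines print `True`, that would back up the observation that the index is kept either way, which is the case for making `index=True` the default.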