lcrmorin opened this issue 1 year ago (status: Open)
If we added a Series.to_parquet, I think users would expect to be able to round-trip back to a Series. I'm not sure, but I don't think that's possible.
I personally use ser.to_frame(name).to_parquet(...).
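A minimal sketch of that workaround, including the read side, assuming a parquet engine such as pyarrow is installed. The helper names write_series and read_series are hypothetical. The point is that pd.read_parquet always hands back a DataFrame, and selecting the single column is what recovers the Series.

```python
import io

import pandas as pd


def write_series(ser: pd.Series, name: str) -> bytes:
    """Hypothetical helper: store a Series as a one-column parquet table."""
    # Parquet stores tables, so wrap the Series in a one-column DataFrame first.
    # With path=None, DataFrame.to_parquet returns the file contents as bytes.
    return ser.to_frame(name).to_parquet(path=None)


def read_series(raw: bytes, name: str) -> pd.Series:
    """Hypothetical helper: load the table back and pull out the Series."""
    # read_parquet always returns a DataFrame; selecting the column gives a Series again.
    return pd.read_parquet(io.BytesIO(raw))[name]
```

For example, read_series(write_series(ser, "price"), "price") should give back an equal Series; the name only survives because it doubles as the column label.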
cc @jorisvandenbossche
> If we added a Series.to_parquet, I think users would expect to be able to round trip back to Series.
We have other IO methods on Series that don't necessarily give you that guarantee. For example, when reading the result of Series.to_csv with pd.read_csv, you will also get a DataFrame, I think.
So from that point of view, I would personally be fine with such non-perfect round-tripping behaviour for Series.to_parquet as well.
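To illustrate the CSV comparison with existing pandas calls only: Series.to_csv writes the index as an unnamed first column, so a plain pd.read_csv returns a two-column DataFrame, and you need index_col=0 plus a squeeze to get a Series back.

```python
import io

import pandas as pd

ser = pd.Series([10, 20, 30], name="x")

# With no path, Series.to_csv returns the CSV text: an unnamed index column plus "x".
text = ser.to_csv()

# A plain read gives a DataFrame; the old index shows up as an "Unnamed: 0" column.
as_read = pd.read_csv(io.StringIO(text))

# Treating the first column as the index and squeezing the columns axis recovers a Series.
back = pd.read_csv(io.StringIO(text), index_col=0).squeeze("columns")
```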
The question is whether we want to add all of our IO methods to Series in general (given that the workaround is quite easy). Right now we seem to be a bit inconsistent.
take
Assigning this to myself, as it seems like a good first issue for me given that I use pandas with parquet files regularly. It seems there's still some ongoing discussion about whether this is appropriate, so I'll keep an eye out in case people decide it is no longer needed.
@jorisvandenbossche, this will need much more testing, but I got it working locally and wanted some initial feedback on the idea: https://github.com/pandas-dev/pandas/pull/54675/files
Alternatively, we could do what Series.to_markdown() does and simply cast the Series to a DataFrame and use the frame's method. I figured this wasn't as clean, or as easy to write unit tests for. Let me know whether I have the right idea whenever you have a chance. Thanks!
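A rough sketch of the to_markdown-style alternative, written as a hypothetical free function series_to_parquet rather than a real Series method (it is not the code from the linked PR). It assumes a parquet engine is installed, and it falls back to the string label "0" for unnamed Series on the assumption that string column labels are the safest choice across engines and pandas versions.

```python
import pandas as pd


def series_to_parquet(ser: pd.Series, path=None, **kwargs):
    """Hypothetical Series-level wrapper that delegates to DataFrame.to_parquet.

    Mirrors the Series.to_markdown pattern: convert to a one-column frame and
    forward all arguments. Like DataFrame.to_parquet, it returns bytes when
    path is None and None otherwise.
    """
    # Use a string column label; an unnamed Series would otherwise get the integer label 0.
    label = "0" if ser.name is None else str(ser.name)
    return ser.to_frame(name=label).to_parquet(path, **kwargs)
```

Usage would look like series_to_parquet(ser, "out.parquet"); reading the file back still goes through pd.read_parquet, which returns a DataFrame.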
> We have other IO methods on Series that don't necessarily give you that guarantee. For example, when reading the result of Series.to_csv with pd.read_csv, you will also get a DataFrame, I think.
> So from that point of view, I would personally be fine with such non-perfect round-tripping behaviour for Series.to_parquet as well.
I expect a lot more out of parquet than I do out of CSV/JSON/Excel, in particular round-tripping with dtypes. I'm not convinced that a comparison to CSV is warranted.
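A small sketch of the dtype difference in question, assuming pyarrow (or another parquet engine) is installed for the parquet half. The helper names csv_roundtrip and parquet_roundtrip are hypothetical. CSV stores only text, so a categorical or datetime Series comes back as plain strings, while parquet records the column types.

```python
import io

import pandas as pd


def csv_roundtrip(ser: pd.Series) -> pd.Series:
    """Hypothetical helper: write a named Series to CSV and read it back as a Series."""
    text = ser.to_csv()  # with no path, to_csv returns the CSV as a string
    return pd.read_csv(io.StringIO(text), index_col=0).squeeze("columns")


def parquet_roundtrip(ser: pd.Series) -> pd.Series:
    """Hypothetical helper: the same round trip through parquet (needs an engine)."""
    buf = io.BytesIO()
    ser.to_frame().to_parquet(buf)  # the Series name becomes the column label
    buf.seek(0)
    return pd.read_parquet(buf).squeeze("columns")


dates = pd.Series(pd.date_range("2023-01-01", periods=3), name="when")
colors = pd.Series(["red", "blue", "red"], dtype="category", name="color")

# csv_roundtrip(dates) and csv_roundtrip(colors) come back as plain strings;
# parquet_roundtrip keeps the datetime and categorical dtypes.
```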
Do all IO methods round-trip back as a DataFrame? If so, I don't think it's worth the maintenance burden to have these methods on Series when they are just a .to_frame() call away. But if there is good reason to keep some of them, I can see the value of having them all on Series for a consistent API.
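As a partial data point on that question: not every reader forces a DataFrame. As far as I know, pd.read_json has a typ parameter that can return a Series, and the "split" orient also stores the Series name. A minimal sketch using only existing pandas parameters:

```python
import io

import pandas as pd

ser = pd.Series([10, 20, 30], name="x")

# orient="split" writes the name, index and values as separate JSON fields.
text = ser.to_json(orient="split")

# typ="series" asks read_json for a Series instead of a DataFrame;
# convert_dates=False keeps the values as plain numbers.
back = pd.read_json(io.StringIO(text), typ="series", orient="split", convert_dates=False)
```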
Feature Type
[X] Adding new functionality to pandas
[ ] Changing existing functionality in pandas
[ ] Removing existing functionality in pandas
Problem Description
Currently the .to_parquet() method only works for DataFrames. It would be nicer if the method worked on Series too. At the moment we either have to save the Series in another format or go through pd.DataFrame(series), which seems a bit clunky.
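For reference, a small sketch of the DataFrame-wrapping route using plain pandas. For an unnamed Series, both pd.DataFrame(series) and series.to_frame() produce the integer column label 0, so passing an explicit string name to to_frame is the tidier option; I believe older pandas versions rejected non-string column names in to_parquet.

```python
import pandas as pd

unnamed = pd.Series([1, 2, 3])

# Both wrapping routes give a one-column DataFrame labelled with the integer 0.
via_constructor = pd.DataFrame(unnamed)
via_to_frame = unnamed.to_frame()

# Passing a name gives a string label, which is the safer choice for parquet.
named = unnamed.to_frame("value")

# Any of these frames can then be written with DataFrame.to_parquet, e.g.
# named.to_parquet("values.parquet")  # requires pyarrow or fastparquet
```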
Feature Description
For a given pandas Series, being able to write Series.to_parquet().
Alternative Solutions
Currently the two alternatives are:
Additional Context
No response