Open raayu83 opened 4 hours ago
It may be easier to see the difference with `.to_dict()` instead of CSV:
```python
from io import StringIO
import pandas as pd
import polars as pl

my_json = StringIO('[{ "mylist": ["a", "b", "c"] }]')
pl.read_json(my_json).to_pandas().to_dict()
# {'mylist': {0: array(['a', 'b', 'c'], dtype=object)}}
my_json = StringIO('[{ "mylist": ["a", "b", "c"] }]')
pd.read_json(my_json).to_dict()
# {'mylist': {0: ['a', 'b', 'c']}}
```
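The difference seems to be that `.to_pandas()` turns each list cell into a NumPy object array (`array([...], dtype=object)`), whereas `pd.read_json` keeps it as a plain Python list.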
Reproducible example
Result (first line: polars followed by `.to_pandas()`; second line: pandas only):
```
b",mylist\r\n0,['a' 'b' 'c']\r\n"
b',mylist\r\n0,"[\'a\', \'b\', \'c\']"\r\n'
```
Issue description
This issue can lead to errors when you switch from pandas to polars but still pass the data on elsewhere as a pandas DataFrame via `df.to_pandas()`: the DataFrame returned by `df.to_pandas()` is in a different format than one created by pandas in the first place. Calling `df.to_csv()` makes this visible: the commas separating the list elements are missing in the polars version (`['a' 'b' 'c']` instead of `"['a', 'b', 'c']"`).
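A pandas-only sketch of why the commas disappear, assuming (as the `.to_dict()` output above shows) that `.to_pandas()` stores each list cell as a NumPy object array. The frame named `from_polars` is hypothetical: it is built by hand to mimic that output rather than produced by polars. `to_csv()` effectively writes each object cell in its `str()` form, and NumPy prints an array with spaces between elements, while a Python list prints with commas (which in turn makes the CSV writer quote the field).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the result of polars' .to_pandas():
# an object column whose single cell is a NumPy array.
cell = np.empty(1, dtype=object)
cell[0] = np.array(["a", "b", "c"], dtype=object)
from_polars = pd.DataFrame({"mylist": cell})

# What pandas itself produces: the cell is a plain Python list.
from_pandas = pd.DataFrame({"mylist": [["a", "b", "c"]]})

# NumPy separates array elements with spaces, a list uses ", ".
print(str(from_polars["mylist"][0]))  # ['a' 'b' 'c']
print(str(from_pandas["mylist"][0]))  # ['a', 'b', 'c']

# The same str() forms end up in the CSV output.
print(from_polars.to_csv(index=False))
print(from_pandas.to_csv(index=False))
```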
I'm not 100% sure whether this is a bug or intentional. But switching from pandas to polars would be easier if the output of `to_pandas()` were the same as the original pandas DataFrame.
In my case, I had to write some additional logic to replicate the behavior of pandas.
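A minimal sketch of the kind of extra logic this can take, assuming the only difference to undo is NumPy arrays stored in object cells. The helper name `arrays_to_lists` is hypothetical and not part of pandas or polars; it replaces each top-level `numpy.ndarray` cell with `ndarray.tolist()` and leaves every other value untouched. The input frame is built by hand to mimic the `.to_pandas()` output shown earlier, so polars is not needed to run it.

```python
import numpy as np
import pandas as pd


def arrays_to_lists(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df in which every NumPy array
    stored in an object column is replaced by an equivalent Python list."""
    out = df.copy()
    for col in out.columns:
        if out[col].dtype == object:
            out[col] = out[col].map(
                lambda v: v.tolist() if isinstance(v, np.ndarray) else v
            )
    return out


# Stand-in for pl.read_json(...).to_pandas(): one object cell holding an array.
cell = np.empty(1, dtype=object)
cell[0] = np.array(["a", "b", "c"], dtype=object)
converted = arrays_to_lists(pd.DataFrame({"mylist": cell}))

print(converted.to_dict())  # {'mylist': {0: ['a', 'b', 'c']}}
```

Only top-level cells are converted: if the elements of an array are themselves arrays, `tolist()` on an object array leaves those inner arrays as they are.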
Expected behavior
`df.to_pandas()` returns a DataFrame identical to the one you would get by using pandas all along.
Installed versions