Open randolf-scholz opened 5 years ago
What precise changes to the output (or reader?) are you proposing?
FWIW, I don't think that faithful round-tripping is a high priority for to_string. If that's your goal, there are plenty of better options.
@TomAugspurger
My main goal is to save a multi-index DataFrame in human readable form and be able to load it again.
That's a difficult task :) What exact changes are you proposing?
To be honest, I think that a faithful reverse of to_string is probably the best option. The task should not be that difficult, because all the information needed to reconstruct the DataFrame is contained in the string representation. In fact, it seems to me that all one needs to do is
Which is essentially what is proposed in this answer: https://stackoverflow.com/a/55024872/9318372
Would #10415 fit your needs instead?
Some previous discussion buried in this thread here - idea of a read_repr
https://github.com/pandas-dev/pandas/issues/8323#issuecomment-56278302
Thanks @WillAyd and @chris-b1 for the suggestions. I am using the following script now to read the files:
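import pandas

# f is the path to a file written with DataFrame.to_string()
df = pandas.read_fwf(f, header=[0, 1])
# Top header row holds the column names, second row holds the index names
cols = [x for x, _ in df.columns if 'Unnamed' not in x]
idxs = [y for _, y in df.columns if 'Unnamed' not in y]
df.columns = idxs + cols
# to_string() blanks repeated index labels, so fill them back in
df[idxs] = df[idxs].ffill()
df.set_index(idxs, inplace=True)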
I do believe strongly though that the ability to read and write tables in human readable form should be a core-functionality of a module like pandas.
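For anyone wondering why the ffill step in the script above is needed: to_string leaves repeated outer index labels blank by default. The sketch below shows that, and also a possible alternative that is not from this thread: writing with sparsify=False so every row carries its full index, then reading the data rows back with read_csv. The frame, the names x, y, v and the hard-coded skiprows=2 are illustrative assumptions for this small example, not a general solution.

```python
import io

import pandas as pd

# Small illustrative frame with a two-level index (names are made up).
index = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=["x", "y"])
df = pd.DataFrame({"v": [10, 20, 30, 40]}, index=index)

# Default output: the repeated outer label "a" is left blank on its second row.
sparse_lines = df.to_string().splitlines()

# With sparsify=False every row repeats the full index.
dense_text = df.to_string(sparsify=False)
dense_lines = dense_text.splitlines()

# Skip the two header lines (column names, then index names) and supply
# the names by hand; this assumes we already know the layout of the file.
loaded = pd.read_csv(
    io.StringIO(dense_text),
    sep=r"\s+",
    skiprows=2,
    header=None,
    names=["x", "y", "v"],
    index_col=["x", "y"],
)
```

This only works when no value contains whitespace, which is one reason a general inverse of to_string is harder than it looks.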
Makes sense. You could also specify index_col=[0, 1] and swap things around thereafter (might be easier).
I would side with @TomAugspurger's assessment of the priority here, as I (perhaps mistakenly) can't see this being useful outside of very small DataFrames when compared to the slew of other methods that exist.
With that said, this is open source, so let's see what others think. You are of course always welcome to submit a PR if you see an easy and scalable way to make it work.
Code Sample, a copy-pastable example if possible
Problem description
I wanted to save a pandas DataFrame in human readable form, that is, as a text file with nice vertical alignment. The to_string function achieves precisely this, whereas to_csv does not.
I have saved data like this before, and it works just fine when one does not save the index. In this case it can be loaded via pandas.read_csv(file, sep=r'\s+'). I tried using the index_col and header parameters, but nothing seems to work properly.
I also made a StackExchange thread.
Expected Output
Output of pd.show_versions()