pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

Formatting data array as strings? #5985

Open ahuang11 opened 2 years ago

ahuang11 commented 2 years ago

https://github.com/pydata/xarray/discussions/5865#discussioncomment-1636647

I wonder if it's possible to implement a built-in function like da.str.format("%.2f") or xr.string_format(da, "%.2f").

To wrap:

import xarray as xr

da = xr.DataArray([5., 6., 7.])
das = xr.DataArray("%.2f")
das.str % da

<xarray.DataArray (dim_0: 3)>
array(['5.00', '6.00', '7.00'], dtype='<U4')
Dimensions without coordinates: dim_0

mathause commented 2 years ago

I think that sounds sensible. The idea would be to wrap xr.DataArray.str.__mod__ in xr.DataArray.str.format. Are you interested in providing a PR?
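
A minimal sketch of what such a wrapper could look like, based only on the snippet in the original post. The name `string_format` and its signature are hypothetical (nothing like it is confirmed to exist in xarray); the only xarray behaviour relied on is the `.str %` operator shown above.

```python
import xarray as xr


def string_format(da: xr.DataArray, fmt: str) -> xr.DataArray:
    """Hypothetical helper, not part of xarray: apply a %-style format to each element."""
    # Wrap the format string in a 0-d DataArray so that the .str accessor is
    # available, then apply the accessor's % operator, as in the original post.
    return xr.DataArray(fmt).str % da


da = xr.DataArray([5.0, 6.0, 7.0])
formatted = string_format(da, "%.2f")

# The original post shows ['5.00', '6.00', '7.00'] for these inputs.
print(formatted.values.tolist())
```

Keeping the format string as the left-hand operand mirrors Python's own `"%.2f" % value`, which is why the helper builds a DataArray from `fmt` rather than calling `.str` on the numeric array.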

ahuang11 commented 2 years ago

Actually would xr.DataArray.str.format work? e.g. xr.DataArray([0, 1, 2]).str would raise an error right since it's int type?

husainridwan commented 1 year ago

Hi @TomNicholas, how do I contribute to this as an Outreachy intern?

TomNicholas commented 1 year ago

Hi @alrho007 - thanks for your interest in contributing to xarray!

The code for the string accessor is in xarray/core/accessor_str.py. That's where you would need to make changes.

The idea would be to wrap xr.DataArray.str.__mod__ in xr.DataArray.str.format.

@mathause could you expand on what you mean exactly? I'm new to this part of the codebase. There is already a .format method, are you talking about changing its behaviour?

xr.DataArray([0, 1, 2]).str would raise an error right since it's int type?

Also @ahuang11 this seems fine? Because the .str accessor doesn't actually perform a check that the passed data is a str type immediately?

ahuang11 commented 1 year ago

Thanks for following up! It's been a while so I don't remember; feel free to disregard what I said.

mathause commented 1 year ago

I probably got this from a pandas issue or pull request, but I can't remember where exactly, sorry...

headtr1ck commented 1 year ago

Found as an answer to a proposal PR: https://github.com/pydata/xarray/pull/7628#discussion_r1140211918

The solution is to use .str.format the other way around from what you propose:

You can try das.str.format(da). Maybe you need to use {} instead of % in your format array.
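
The `{}` versus `%` remark can be checked in plain Python, independent of xarray. The sketch below only shows that the two format-spec styles produce the same string for a single float; whether `das.str.format(da)` accepts a numeric array in a given xarray version is not verified here.

```python
value = 5.0

# printf-style formatting, as used with `das.str % da` in the original post
percent_style = "%.2f" % value

# str.format-style formatting: the same spec goes inside braces after a colon
brace_style = "{:.2f}".format(value)

assert percent_style == brace_style == "5.00"
```

So a format array written for `.str.format` would hold "{:.2f}" rather than "%.2f".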