pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0
3.56k stars 1.07k forks source link

repr of class methods #1620

Closed fmaussion closed 6 years ago

fmaussion commented 6 years ago

Some live news from the classroom. A student (who is learning python and xarray at the same time) wanted to compute the minimum of an array and forgot the parenthesis (quite a common mistake).

The printout in the notebook in that case is:

>>> ds.toa_sw_all_mon.min
<bound method ImplementsArrayReduce._reduce_method.<locals>.wrapped_func of <xarray.DataArray 'toa_sw_all_mon' (month: 12, lat: 180, lon: 360)>
[777600 values with dtype=float32]
Coordinates:
  * month    (month) float32 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 12.0
  * lat      (lat) float32 -89.5 -88.5 -87.5 -86.5 -85.5 -84.5 -83.5 -82.5 ...
  * lon      (lon) float32 -179.5 -178.5 -177.5 -176.5 -175.5 -174.5 -173.5 ...
Attributes:
    long_name:      Top of The Atmosphere Shortwave Flux, Monthly Means, All-...
    standard_name:  TOA Shortwave Flux - All-Sky
    CF_name:        toa_outgoing_shortwave_flux
    IPCC_name:      none
    units:          W m-2
    valid_min:            0.00000
    valid_max:            600.000>

which, I had to agree, is hard to identify as being a function. It's a detail, but is it necessary to print the entire repr of the array/dataset in the function repr?

fmaussion commented 6 years ago

Some more live news from the classroom: it is indeed very confusing for more than one student. I think the repr should be simplified.

shoyer commented 6 years ago

I think we could probably change this by adding a custom wrapping using descriptors, but I'm not sure that's actually a good idea. This is the standard Python repr for methods, e.g., consider:

>>> df = pd.DataFrame({'x': range(1000)})
>>> df.min
<bound method DataFrame.min of        x
0      0
1      1
2      2
3      3
4      4
5      5
6      6
7      7
8      8
9      9
10    10
11    11
12    12
13    13
14    14
15    15
16    16
17    17
18    18
19    19
20    20
21    21
22    22
23    23
24    24
25    25
26    26
27    27
28    28
29    29
..   ...
970  970
971  971
972  972
973  973
974  974
975  975
976  976
977  977
978  978
979  979
980  980
981  981
982  982
983  983
984  984
985  985
986  986
987  987
988  988
989  989
990  990
991  991
992  992
993  993
994  994
995  995
996  996
997  997
998  998
999  999

[1000 rows x 1 columns]>
fmaussion commented 6 years ago

I guess that consistency is important too. Numpy is a bit less verbose though:

In [3]: a = np.linspace(1, 2)
In [4]: a.min
Out[4]: <function ndarray.min>

Maybe a compromise would be more beginner friendly (e.g. <bound method ImplementsArrayReduce._reduce_method.<locals>.wrapped_func of <xarray.DataArray 'toa_sw_all_mon' (month: 12, lat: 180, lon: 360)>>) but at the same time I'm not sure it's worth the trouble...

shoyer commented 6 years ago

My guess is that NumPy's method repr is somehow due to the fact that np.ndarray is implemented in C.

There might be some speed implication here to writing a wrapper (since it requires adding another layer of indirection) but I doubt that makes much of a difference for xarray users.

fmaussion commented 6 years ago

Closing this, as there are other more important things to deal with.

the major issue here is that the important information (bound method ImplementsArrayReduce._reduce_method.<locals>.wrapped_func of) is overwhelmed by what looks like a variable. But students should learn how to read carefully anyway ;-)