jhamman opened this issue 7 years ago (status: open).
I'd also suggest that a global option of always_keep_attrs=True would be useful. While I understand the logic of dropping units during certain operations, dropping all attrs makes them unusable for storing other miscellaneous metadata, e.g. data provenance. As a recent xarray convert, I have found this behavior frustrating.
This issue is very relevant for me too. I would also like to propose that users could provide a function that knows how to combine the attrs of different DataArrays.
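As an illustration of that idea, here is a hypothetical combiner. `combine_common_attrs` is not part of xarray, and it is applied by hand after the arithmetic rather than through any hook. It keeps only the keys whose values agree across all inputs, so conflicting provenance such as `source` is dropped while shared `units` survive.

```python
import xarray as xr


def combine_common_attrs(attrs_list):
    """Hypothetical combiner: keep only attrs whose values agree across all inputs."""
    if not attrs_list:
        return {}
    first, *rest = attrs_list
    return {
        key: value
        for key, value in first.items()
        if all(key in other and other[key] == value for other in rest)
    }


a = xr.DataArray([1.0, 2.0], dims="x", attrs={"units": "m", "source": "sensor A"})
b = xr.DataArray([3.0, 4.0], dims="x", attrs={"units": "m", "source": "sensor B"})

# Replace whatever attrs the arithmetic produced with the combined set.
total = a + b
total.attrs = combine_common_attrs([a.attrs, b.attrs])
print(total.attrs)  # {'units': 'm'}
```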
I am also interested. In principle, I am OK with the table from @jhamman. However, there could be an option to refer to the original attrs in order to provide provenance even for operations like reductions and arithmetic. The idea here is reproducibility and traceability. Maybe an 'origin' attribute?
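One way to sketch the 'origin' idea, assuming a hypothetical helper `with_origin` (not an xarray API) that stores a copy of the source attrs, plus a short description of the operation, under an `origin` key of the result:

```python
import xarray as xr


def with_origin(result, source, operation):
    """Hypothetical helper: record where `result` came from.

    Adds an 'origin' entry holding the operation name and a copy of the
    source's attrs, on top of whatever attrs `result` already has.
    """
    out = result.copy(deep=False)  # leave the caller's object untouched
    out.attrs = {
        **result.attrs,
        "origin": {"operation": operation, "attrs": dict(source.attrs)},
    }
    return out


temp = xr.DataArray(
    [280.0, 285.0, 290.0],
    dims="time",
    attrs={"units": "K", "source": "weather station"},
)
mean_temp = with_origin(temp.mean("time"), temp, "mean over time")
print(mean_temp.attrs["origin"])
```

A nested dict works in memory, but netCDF attributes cannot hold dicts, so such a value would need to be serialized (for example as JSON) before writing to disk.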
The challenge with a user-specified function is that there can potentially be weird conflicts if multiple libraries try to override it. Possibly it's worth it for the convenience, but subclasses allowing for explicit hooks (like numpy) is probably the cleanest solution.
Hi, this feature would be very relevant to the intended use case of a project I'd like to use xarray for. Is the behaviour discussed in the first post implemented anywhere, e.g. in the trunk, for me to play with?
Also, might I suggest you consider some kind of history tracker as part of the metadata propagation? Perhaps metadata could be saved from each step of a set of operations, so that there is a full paper trail of the operations that have been applied to the data. It could, however, get complicated to merge objects that have their own separate histories, especially if they ultimately descend from the same original data set.
This would be very relevant for scientific analyses.
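A minimal history tracker, in the spirit of the newline-separated `history` attribute used in netCDF/CF files, could look like the sketch below. `add_history` is hypothetical; it threads the history from a single input to the result and, as noted above, does not attempt to merge histories from several inputs.

```python
import datetime

import xarray as xr


def add_history(result, source, step):
    """Hypothetical helper: extend the source's 'history' attr onto `result`.

    Each entry is a UTC timestamp plus a description, one entry per line.
    Only a single source is tracked; merging histories is out of scope.
    """
    stamp = datetime.datetime.now(datetime.timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    entry = f"{stamp} {step}"
    previous = source.attrs.get("history")
    out = result.copy(deep=False)
    out.attrs = {**result.attrs, "history": f"{previous}\n{entry}" if previous else entry}
    return out


raw = xr.DataArray([1.0, 2.0, 3.0, 4.0], dims="x", attrs={"units": "m"})
doubled = add_history(raw * 2, raw, "multiplied by 2")
total = add_history(doubled.sum("x"), doubled, "summed over x")
print(total.attrs["history"])
```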
> Hi, this feature would be very relevant to the intended use case of a project I'd like to use xarray for. Is the behaviour discussed in the first post implemented anywhere, e.g. in the trunk, for me to play with?
Are you referring to a different issue? The first post only summarizes some simple proposed rules.
> Also, might I suggest you consider some kind of history tracker as part of the metadata propagation?
Certainly this would be out of scope for xarray itself, but it could perhaps be done with a library that wraps xarray's API. If I recall correctly, @pwolfram was also interested in this.
We did discuss customizable hooks for attribute handling in #988, but I'm no longer sure that is a good idea. These sorts of overloads are really hard to get right, as we've seen with NumPy's long history of different override protocols (the most recent example being __array_ufunc__).
> consider some kind of history tracker as part of the metadata propagation?
Data lineage is a big, hard, unsolved problem (internally, it ranks above both naming things and cache invalidation :) ).
To second @shoyer, I think it's big and difficult enough to be a separate library.
> Are you referring to a different issue? The first post only summarizes some simple proposed rules.
No, just the proposed feature to keep or delete metadata based on the various operations. Is this behaviour already part of the library, and this issue is just to clarify the intended behaviour, or is this a feature proposal?
> No, just the proposed feature to keep or delete metadata based on the various operations. Is this behaviour already part of the library, and this issue is just to clarify the intended behaviour, or is this a feature proposal?
We already have most of this behavior (matching what @jhamman lists in the first comment), though it isn't clearly documented. It should just work if you use xarray methods/functions.
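A quick check of the existing per-method behavior (the data and attrs are made up, and defaults may differ between xarray versions): reductions such as `mean` accept a `keep_attrs` argument, while positional indexing with `isel` carries attrs along.

```python
import xarray as xr

da = xr.DataArray(
    [[1.0, 2.0], [3.0, 4.0]],
    dims=("y", "x"),
    attrs={"units": "m", "source": "model run"},
)

# Reductions accept an explicit keep_attrs argument.
kept = da.mean("x", keep_attrs=True)
dropped = da.mean("x", keep_attrs=False)

# Positional indexing carries attrs along without any extra argument.
subset = da.isel(x=0)

print(kept.attrs)     # {'units': 'm', 'source': 'model run'}
print(dropped.attrs)  # {}
```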
@shoyer, I assume you are referring to the keep_attrs option. Is there a way to persist attrs during arithmetic operations? I find myself writing a bunch of boilerplate to transfer the wealth of metadata included with most netCDF files.
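The kind of boilerplate described here usually amounts to copying attrs onto each result by hand, for instance with the existing `DataArray.assign_attrs` method (the values below are made up). The second operation shows why blanket copying is risky: after squaring, the `units` entry has to be corrected manually or it goes stale.

```python
import xarray as xr

temp = xr.DataArray(
    [280.0, 290.0],
    dims="time",
    attrs={"long_name": "air temperature", "units": "K"},
)

# Copy the metadata onto the result of the arithmetic by hand.
anomaly = (temp - temp.mean()).assign_attrs(temp.attrs)

# Blanket copying can leave stale metadata, so some entries need fixing.
squared = (temp**2).assign_attrs({**temp.attrs, "units": "K**2"})
```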
I realize that adding a module-level or DataArray instance-specific maintain_attrs configuration flag (as discussed in #131, #988, #1271) could be problematic, but this strikes me as complexity worth adding. The current approach of dropping all metadata (not just units) seems heavy-handed and unintuitive for new/casual users. As you mentioned in #1271, better to have stale metadata than no metadata at all.
I would be happy to add a global keep_attrs option to xarray.set_options(), which we could use for controlling arithmetic. I'm not planning on working on it personally, but I would be happy to review a PR.
Another one to decide on is xarray.zeros_like(...) and friends.
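For reference, in recent xarray releases `xr.zeros_like` and `xr.full_like` appear to copy attrs from the template object, so these functions already sit on the "keep" side; the check below is a sketch worth re-running against a given version.

```python
import xarray as xr

da = xr.DataArray([1.0, 2.0, 3.0], dims="x", attrs={"units": "m"})

# Both functions build a new array shaped like `da`.
zeros = xr.zeros_like(da)
sevens = xr.full_like(da, 7.0)

print(zeros.attrs, sevens.attrs)
```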
> I would be happy to add a global keep_attrs option to xarray.set_options(), which we could use for controlling arithmetic. I'm not planning on working on it personally, but I would be happy to review a PR.
Note that this was implemented by @TomNicholas in https://github.com/pydata/xarray/pull/2482
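A sketch of the option in use: `xr.set_options` works as a context manager, so `keep_attrs` can be scoped to a block instead of being set globally. The assertions assume a recent release in which the option also applies to binary arithmetic; older releases may have honored it only for reductions and similar methods.

```python
import xarray as xr

da = xr.DataArray([1.0, 2.0, 3.0], dims="x", attrs={"units": "m", "source": "lidar"})

# Keep attrs for everything inside this block only.
with xr.set_options(keep_attrs=True):
    scaled = da * 2
    total = da.sum()

# Explicitly turning the option off drops attrs on arithmetic.
with xr.set_options(keep_attrs=False):
    scaled_plain = da * 2

print(scaled.attrs, total.attrs, scaled_plain.attrs)
```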
We need to come up with some clear rules for when and how xarray should propagate metadata (attrs/encoding). This has come up routinely (e.g. #25, #138, #442, #688, #828, #988, #1009, #1271, #1297, #1586) and we don't have a clear direction as to when to keep/drop metadata.
I'll take a first cut:
cc @shoyer (following up on https://github.com/pydata/xarray/issues/1586#issuecomment-334954046)