Open Kirill888 opened 5 years ago
Can I instruct `group_datasets` to group by solar day, but order by `gqa`? No. Order within a group is tightly coupled to axis value.
Don't understand this part. `group_datasets` will return an xarray of datasets indexed by time. Can the data of the xarray be an ordered dictionary indexed by particular metadata, say `gqa` or `granule_id`?
2\. Configure dataset load precedence at `load_data` time rather than `group_datasets` time
Can live with this.
@emmaai xarray axis ordering will remain as is, so your time axis will be ordered by time. What changes is the interpretation of the "value" within this `xarray.DataArray`. Currently the value is a tuple of dataset objects where order within the tuple has meaning as far as data loading goes. I want it to be just a tuple of dataset objects where order is meaningless, or we can change it to be a `set` of dataset objects to clearly communicate that this is an unordered collection of dataset objects.
Look at, say, the pandas docs for `groupby`, which I assume `group_datasets` is based on originally:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html
Groupby preserves the order of rows within each group.
Order within a group is the order on input, so if you had `A0, B1, A1, B2` and grouped by letter, you will end up with `A0, A1` and `B1, B2`, not because `0 < 1` but because `A0` was before `A1` in the original list. We should probably copy that behaviour for `group_datasets`, and re-order items within a group at load time.
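A minimal plain-Python sketch of that order-preserving grouping, for illustration only. `group_stable` is a hypothetical helper, not a pandas or ODC function. The second call swaps `A0` and `A1` on input to show that input order, not value order, decides the order within a group:

```python
from collections import defaultdict


def group_stable(items, key):
    """Group items by key(item), keeping input order inside each group."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return dict(groups)


def by_letter(name):
    # Grouping key: the leading letter of the label.
    return name[0]


grouped = group_stable(["A0", "B1", "A1", "B2"], by_letter)
# {'A': ['A0', 'A1'], 'B': ['B1', 'B2']}

swapped = group_stable(["A1", "B1", "A0", "B2"], by_letter)
# {'A': ['A1', 'A0'], 'B': ['B1', 'B2']}  (A1 stays first: input order wins)
```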
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Better control over data load precedence would still be a valuable feature to have in ODC core, particularly when combining multiple products using a list of datasets.
Introduction
Question:
Say I have two datasets with some area of overlap and I load them into one raster (using `group_by='solar_day'` for example).

Answer:
Currently this depends on:

- `group_datasets` behaviour, parameterised by the `GroupBy` object
- `load_data` behaviour, parameterised by `fuser`
First, `group_datasets` doesn't just group datasets, it also orders datasets within the group. This order is then used by `load_data` to "fuse" one dataset at a time into a final raster. The default fuser behaviour is to never change an output raster pixel value once a "valid" pixel has been observed. So in effect dataset order within a group is pixel precedence order. But one can just as easily implement a fuser that overwrites previous pixels with the new ones (as long as they are valid), in which case dataset order will be the reverse of precedence order.

So what is the order of datasets as returned by `group_datasets`, and can I control it? Well, kinda, but not really. Say you wanted to order datasets by some metadata like `gqa` score or `granule_id`.

Can I instruct `group_datasets` to group by solar day, but order by `gqa`? No. Order within a group is tightly coupled to axis value.

Can I write a custom fuser that prefers pixels from one dataset over another? No. The fuser only sees pixels, it doesn't have access to dataset metadata.
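To make the link between dataset order and pixel precedence concrete, here is a toy sketch using plain lists instead of rasters. Everything in it is an assumption made for illustration: the `NODATA` sentinel, the helper names, and the in-place `(dest, src)` fuser shape. It is not the ODC implementation.

```python
NODATA = -1  # assumed sentinel meaning "no valid pixel here"


def fuse_first_valid(dest, src):
    """Keep dest where it is already valid; fill remaining gaps from src."""
    for i, value in enumerate(src):
        if dest[i] == NODATA and value != NODATA:
            dest[i] = value


def fuse_last_valid(dest, src):
    """Overwrite dest with every valid src pixel."""
    for i, value in enumerate(src):
        if value != NODATA:
            dest[i] = value


def load_group(datasets, fuser):
    """Fuse the pixel rows of one group, one dataset at a time, in order."""
    out = [NODATA] * len(datasets[0])
    for pixels in datasets:
        fuser(out, pixels)
    return out


ds_a = [1, 1, NODATA]  # covers the first two pixels
ds_b = [NODATA, 2, 2]  # covers the last two pixels; the middle pixel overlaps

first_wins = load_group([ds_a, ds_b], fuse_first_valid)  # [1, 1, 2]
last_wins = load_group([ds_a, ds_b], fuse_last_valid)    # [1, 2, 2]
```

With the first fuser the earlier dataset wins the overlapping pixel; with the second the later one does. Either way the fuser receives only pixel values, so it has no way to prefer a dataset by its metadata.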
You can write code that modifies the order of datasets after `group_datasets` was called, that's about it. This forces you to use `find_datasets -> group_datasets -> custom step -> load_data` instead of just parameterising `.load`.

Change Proposal
1. Take dataset ordering out of the `group_datasets` area of responsibility
2. Configure dataset load precedence at `load_data` time rather than `group_datasets` time

This allows `fuser` and dataset order to be configured together, which is important since they are related.

Related issues: #643 #646 #615
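As a rough illustration of the proposal, the sketch below treats a group as an unordered collection and chooses the fuse order at load time from dataset metadata. All names here are hypothetical: the `Dataset` tuple, the `precedence` parameter and this `load_group` helper are not existing ODC API.

```python
from collections import namedtuple

# Hypothetical stand-in for a dataset: some metadata plus one row of pixels.
Dataset = namedtuple("Dataset", ["granule_id", "gqa", "pixels"])
NODATA = -1  # assumed sentinel meaning "no valid pixel here"


def fuse_first_valid(dest, src):
    """Keep dest where it is already valid; fill remaining gaps from src."""
    for i, value in enumerate(src):
        if dest[i] == NODATA and value != NODATA:
            dest[i] = value


def load_group(datasets, fuser, precedence):
    """Order an unordered group with `precedence`, then fuse in that order."""
    ordered = sorted(datasets, key=precedence)
    out = [NODATA] * len(ordered[0].pixels)
    for ds in ordered:
        fuser(out, ds.pixels)
    return out


# The group itself carries no order, so a frozenset is enough.
group = frozenset([
    Dataset("g1", gqa=0.9, pixels=(1, 1, NODATA)),
    Dataset("g2", gqa=0.3, pixels=(NODATA, 2, 2)),
])

# The dataset with the smaller gqa value is fused first and wins the overlap.
by_gqa = load_group(group, fuse_first_valid, precedence=lambda ds: ds.gqa)
# [1, 2, 2]

# Ordering by granule_id instead makes "g1" win the overlap.
by_granule = load_group(group, fuse_first_valid, precedence=lambda ds: ds.granule_id)
# [1, 1, 2]
```

Because the ordering key and the fuser are passed to the same call, they can be chosen as a matching pair.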