Closed itaiad200 closed 1 year ago
AFAIR this has always been a bit suspect. (I think) I initially added it in some misguided attempt at feature parity with Git. Later it became too hard to support and we removed it; I think one issue is that whole-range operations become tough (does the size of a range appear on the metarange?), counting double deletes is surprisingly hard to do correctly, staging deletes are (or used to be) hard to count, etc.
I think it should be safe to get rid of it. It's been all-zeroes for a long while now, and I have seen ZERO complaints. Adding the dreaded "1.0" label as this is technically a breaking change :-)
@arielshaqed I have seen > ZERO complaints :) I don't think there's a technical reason for this not to work (meta-ranges contain key counts for each range). Staging is less relevant since we're talking about a merge operation..
@arielshaqed I have seen > ZERO complaints :)
I don't think there's a technical reason for this not to work (meta-ranges contain key counts for each range). Staging is less relevant since we're talking about a merge operation..
You're right -- I was worried that we were not filling them in, but actually they are! The only issue that remains is that sometimes (hopefully usually) we replace an entire range, and then we don't know how many items it changed. We could overestimate the counts, but this would be wildly inaccurate.
Perhaps make it an option in the API...
I think we can suggest calling 'diff' after successful merge to get a merge results and removing the results from the merge API. This way we can keep the current merge optimizations.
I think we can suggest calling 'diff' after successful merge to get a merge results and removing the results from the merge API. This way we can keep the current merge optimizations.
Thinking about it over the weekend, I think that this is exactly right. There is no significant advantage to returning an exact count. Meanwhile, for many (most?) programmatic application you would want to diff in order to do any of the following efficiently:
(If need be, even add a separate counting-diff operation.)
lakeFS v0.91.0 release will return summary, but the API was updated to have this field (and members) as optional. Separate PR will be merged later when we would like to drop the summary information from API.
https://github.com/treeverse/airflow-provider-lakeFS/pull/46 Upgrade AirFlow integration
@nopcoder please add a reminder for x weeks to delete it from the API.
@nopcoder @itaiad200 when do we plan on introducing this breaking change?
Waiting for more clients to upgrade to a version where the Summary field is Optional. It will probably take months.
We don't calculate the statistics of the merge operation and return a zero list regardless of the real result:
If we want to continue supporting that response body we ought to return the real numbers.