ncw / swift

Go language interface to Swift / Openstack Object Storage / Rackspace cloud files (golang)
MIT License

add workarounds for servers that don't paginate correctly #167

Closed · faebd7 closed this 2 years ago

faebd7 commented 3 years ago

Some Swift API implementations (I've observed this with Ceph RADOSGW) can return fewer results than specified by the "limit" parameter, even when we have not reached the end of the listing.

It's unclear to me from reading the API docs whether this is a violation of the API specification, but since it happens in the wild, it's best to be able to handle it.

One way of handling this is to simply keep fetching pages until we receive an empty page. Another is to assume that any page whose length is within a certain percentage of the limit is not the last page, and fetch one more. Given the tradeoffs involved, let's support both.
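To make the first option concrete, here is a rough Go sketch of the "fetch until empty" loop. The fetchPage function is a stand-in for whatever actually issues the listing request; none of these names are this library's real API.

```go
package example

// listAll keeps requesting pages until the server returns an empty one,
// rather than stopping at the first page that is shorter than the limit.
// fetchPage is a placeholder for the call that actually lists one page,
// starting after the given marker.
func listAll(limit int, fetchPage func(marker string, limit int) ([]string, error)) ([]string, error) {
	var all []string
	marker := ""
	for {
		page, err := fetchPage(marker, limit)
		if err != nil {
			return nil, err
		}
		if len(page) == 0 {
			// An empty page is the only end-of-listing signal we trust on
			// servers that may under-fill intermediate pages.
			break
		}
		all = append(all, page...)
		marker = page[len(page)-1] // next request starts after the last name seen
	}
	return all, nil
}
```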

faebd7 commented 3 years ago

I discovered this problem while trying to use rclone to copy data from an OpenStack Swift cluster into a Ceph RADOSGW.

Listings of some RADOSGW containers would terminate after 1999 entries even though there were over 10,000 objects in them, and rclone copy would copy the same objects again and again on multiple runs.

With this change applied to a local build of rclone I'm now getting full listings and clean syncs with no spurious copies.

faebd7 commented 3 years ago

It will indeed result in an extra transaction, and unfortunately I can't see another way around it. (The number of objects in a container can be looked up by issuing a HEAD request, but of course that's still an extra request, and racy to boot.)
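For anyone curious, the count in question comes back in the X-Container-Object-Count header of a HEAD on the container. A minimal illustration using plain net/http, with a made-up URL and token rather than this library's API:

```go
package example

import (
	"fmt"
	"net/http"
)

// containerObjectCount issues a HEAD request against a container URL and
// returns the X-Container-Object-Count header. Both url and token are
// placeholders; a real client would obtain them from authentication.
func containerObjectCount(url, token string) (string, error) {
	req, err := http.NewRequest(http.MethodHead, url, nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("X-Auth-Token", token)
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusNoContent && resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s", resp.Status)
	}
	return resp.Header.Get("X-Container-Object-Count"), nil
}
```

As noted above, it is still an extra request, and the count can change between the HEAD and the listing.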

Correcting myself, I found a separate page describing the Swift API's pagination and RADOSGW is clearly in the wrong, at least in terms of that document.

However, I also discovered that Swift's own Python client code (which I think we could probably call the reference client) doesn't implement pagination as described in that document when fetching full listings; instead, it simply fetches new pages until it receives an empty one. (Link is into get_container; get_account does the same.)

The code dates from 2012, so perhaps this was implemented before API pagination was nailed down. Either way, swift list and get_container(..., full_listing=True, limit=1000) do not trip over this RADOSGW bug.

ncw commented 3 years ago

One thing we could do is make a feature flag for this and only do the new behaviour if the feature flag is set.

So make a new flag in the Connection struct and check it to enable the new behaviour.
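Something along these lines, with the field name made up for illustration rather than taken from the final change:

```go
package example

// Connection is shown in outline only; the real struct has many more fields.
type Connection struct {
	// FetchUntilEmptyPage is a hypothetical opt-in flag: when set, listings
	// keep requesting pages until an empty page is returned, instead of
	// stopping at the first page shorter than the limit.
	FetchUntilEmptyPage bool
}
```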

I have a feeling that this has already been reported as a radosgw bug - it would be worth searching their issue tracker to see.

I hate the idea of doubling the number of transactions for directory traversals - I can see that being very bad for performance.

ncw commented 3 years ago

...or we could use some kind of heuristic - if we got more than 90% of the max listing then do an extra transaction just to check.

This would probably work quite well but has the potential to go wrong.
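A minimal sketch of that check, assuming the threshold is expressed as a percentage of the requested limit (function and parameter names are made up for illustration):

```go
package example

// treatAsLastPage reports whether a page of pageLen entries can be trusted
// as the final page. A page at or above thresholdPercent of the limit
// (e.g. 90%) would trigger one more fetch in case the server under-filled it.
func treatAsLastPage(pageLen, limit, thresholdPercent int) bool {
	return pageLen*100 < limit*thresholdPercent
}
```

With a 90% threshold and a limit of 1000, a page of 935 entries would trigger another fetch, while a page of 257 would end the listing.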

faebd7 commented 3 years ago

The 90% heuristic should work nicely, based on the lengths of the replies I'm getting from RADOSGW for my problematic buckets:

reply lengths: 1000 999 1000 1000 1000 1000 1000 1000 1000 1000 119
reply lengths: 1000 992 1000 1000 1000 1000 1000 935 1000 1000 257
reply lengths: 1000 1000 1000 1000 1000 975 1000 948
reply lengths: 953 1000 1000 1000 1000 1000 954 1000 1000 70
reply lengths: 1000 1000 1000 1000 998 15
reply lengths: 1000 1000 1000 1000 974 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 939 1000 1000 1000 1000 949 1000 1000 1000 644
reply lengths: 1000 1000 1000 1000 999 1000 1000 937 1000 1000 538
reply lengths: 1000 998 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 551
reply lengths: 1000 1000 1000 1000 1000 1000 1000 931 1000 986 1000 1000 1000 975 1000 989 1000 1000 1000 966 1000 998 921 994 1000 1000 973 58
reply lengths: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 976 1000 366
reply lengths: 1000 1000 1000 1000 1000 983 1000 1000 1000 1000 1000 1000 1000 517
reply lengths: 1000 1000 1000 984 1000 1000 971 1000 1000 401
reply lengths: 949 1000 1000 1000 1000 1000 1000 403
reply lengths: 1000 998 532
reply lengths: 951 1000 1000 1000 1000 1000 976 1000 877

I was not able to find a matching bug in the Ceph tracker, but I'll start a thread on ceph-users to confirm.

And I'll also take a look at implementing the flag and heuristic.

ncw commented 3 years ago

> The 90% heuristic should work nicely, based on the lengths of the replies I'm getting from RADOSGW for my problematic buckets:

useful data - thanks

Those missing items are probably filtered out items (eg deleted items) or something like that.

> I was not able to find a matching bug in the Ceph tracker, but I'll start a thread on ceph-users to confirm.

If you find something out can you link it here?

> And I'll also take a look at implementing the flag and heuristic.

:-)

faebd7 commented 3 years ago

> Those missing items are probably filtered out items (eg deleted items) or something like that.

I'm reasonably certain that it's not deleted items, at least -- it's a fairly new cluster, and these buckets have only ever been written to by rclone copy.

> I was not able to find a matching bug in the Ceph tracker, but I'll start a thread on ceph-users to confirm.

> If you find something out can you link it here?

Most definitely!

> And I'll also take a look at implementing the flag and heuristic.

> :-)

Implemented, and from my testing with hacked rclone builds, the workarounds seem to perform as expected when enabled.

faebd7 commented 3 years ago

I forgot to mention I made a draft rclone PR that makes this change much easier to try out: https://github.com/rclone/rclone/pull/5224