pingidentity / ldapsdk

UnboundID LDAP SDK for Java

Is there a way to get page number N during a pagination request without iterating over previous pages? #160

Closed gredwhite closed 6 months ago

gredwhite commented 6 months ago

Currently I use SimplePagedResultsControl, and my code looks similar to the example on this page: https://docs.ldap.com/ldap-sdk/docs/javadoc/com/unboundid/ldap/sdk/controls/SimplePagedResultsControl.html

The following example demonstrates the use of the simple paged results control. It will iterate through all users, retrieving up to 10 entries at a time:
 // Perform a search to retrieve all users in the server, but only retrieving
 // ten at a time.
 int numSearches = 0;
 int totalEntriesReturned = 0;
 SearchRequest searchRequest = new SearchRequest("dc=example,dc=com",
      SearchScope.SUB, Filter.createEqualityFilter("objectClass", "person"));
 ASN1OctetString resumeCookie = null;
 while (true)
 {
   searchRequest.setControls(
        new SimplePagedResultsControl(10, resumeCookie));
   SearchResult searchResult = connection.search(searchRequest);
   numSearches++;
   totalEntriesReturned += searchResult.getEntryCount();
   for (SearchResultEntry e : searchResult.getSearchEntries())
   {
     // Do something with each entry...
   }

   LDAPTestUtils.assertHasControl(searchResult,
        SimplePagedResultsControl.PAGED_RESULTS_OID);
   SimplePagedResultsControl responseControl =
        SimplePagedResultsControl.get(searchResult);
   if (responseControl.moreResultsToReturn())
   {
     // The resume cookie can be included in the simple paged results
     // control included in the next search to get the next page of results.
     resumeCookie = responseControl.getCookie();
   }
   else
   {
     break;
   }
 }

Is there a way to get page number N without iterating over the previous pages? Is there a way to get the total number of pages/entries?

P.S. I use Samba (AD)

dirmgr commented 6 months ago

No, there isn't. The simple paged results control is only designed to iterate through pages in sequential order. It doesn't let you start at an arbitrary point in the results, skip around, or revisit pages you've already retrieved.

The other major control that allows you to iterate through results in pages is the virtual list view request control. It does let you do all those things, but I don't believe that Active Directory supports it, so it's also unlikely that Samba supports it. However, you can always check the root DSE to see if its OID is present.
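Checking the root DSE comes down to looking for the control's OID among the values of the supportedControl attribute. The following is a minimal sketch of that check using only the Java standard library. The class name `VlvSupportCheck` and the sample values in `main` are hypothetical; the OID `2.16.840.1.113730.3.4.9` is the one assigned to the virtual list view request control. In the LDAP SDK itself, the `RootDSE` object returned by `connection.getRootDSE()` should offer an equivalent `supportsControl` check, so treat this as an illustration of the idea rather than the recommended call.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

final class VlvSupportCheck {
  // OID of the virtual list view request control.
  static final String VLV_REQUEST_OID = "2.16.840.1.113730.3.4.9";

  // Returns true if the given supportedControl values (as read from the
  // root DSE) include the VLV request control OID.
  static boolean advertisesVlv(Collection<String> supportedControlOids) {
    for (String oid : supportedControlOids) {
      if (oid != null && VLV_REQUEST_OID.equals(oid.trim())) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // Hypothetical supportedControl values: simple paged results and VLV.
    List<String> sample =
        Arrays.asList("1.2.840.113556.1.4.319", "2.16.840.1.113730.3.4.9");
    System.out.println(advertisesVlv(sample));
  }
}
```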

gredwhite commented 6 months ago

Thank you for your hint!

Based on this, it looks like it is supported.

[image]

I will try to use it.

gredwhite commented 6 months ago

I found this in the documentation:

The virtual list view control can retrieve pages out of order, can retrieve overlapping pages, and can re-request pages that it had already retrieved

Where can I read about it in more detail?

dirmgr commented 6 months ago

The Javadoc for the VirtualListViewRequestControl class provides a pretty decent description, including a code example. That example just iterates through the pages ten entries at a time, but just changing the initial vlvOffset should be enough to start retrieving pages in the middle of the result set. And if you want to start retrieving results at the first entry with a primary sort value greater than or equal to a specified value, then you can use the alternative constructor that uses an assertion value instead of a target offset.

The official specification for the control can be found at https://docs.ldap.com/specs/draft-ietf-ldapext-ldapv3-vlv-09.txt.

Note that the virtual list view request control is only allowed in search requests that also include the server-side sort request control, and some servers require special indexing to be able to use it.
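The pairing requirement can be checked on the client before a request is sent. This is a minimal sketch that works on plain OID strings rather than on SDK control objects. The class name `VlvRequestCheck` is hypothetical; `1.2.840.113556.1.4.473` is the OID of the server-side sort request control.

```java
import java.util.Collection;

final class VlvRequestCheck {
  static final String VLV_REQUEST_OID = "2.16.840.1.113730.3.4.9";
  static final String SERVER_SIDE_SORT_REQUEST_OID = "1.2.840.113556.1.4.473";

  // Returns false only when a VLV request control is present without an
  // accompanying server-side sort request control.
  static boolean hasRequiredSortControl(Collection<String> requestControlOids) {
    boolean hasVlv = requestControlOids.contains(VLV_REQUEST_OID);
    boolean hasSort = requestControlOids.contains(SERVER_SIDE_SORT_REQUEST_OID);
    return !hasVlv || hasSort;
  }
}
```

A check like this only catches the missing control; whether the server also needs a special index for the sort attribute still has to be confirmed on the server side.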

gredwhite commented 6 months ago

I just tried it and it works with Samba.

The virtual list view control can retrieve pages out of order, can retrieve overlapping pages, and can re-request pages that it had already retrieved

Is that a list of advantages or disadvantages? Initially I took it as a list of disadvantages, but I changed my mind and now they look like advantages.

Currently I can't understand how to avoid providing an assertionValue. I just want to start from the beginning but get page number 15.

========== let me rephrase my question:

When the start of the result set is to be specified using an offset, then the virtual list view request control should include the following elements:

  targetOffset -- The position in the result set of the entry to target for the next page of results to return. Note that the offset is one-based (so the first entry has offset 1, the second entry has offset 2, etc.).
  beforeCount -- The number of entries before the entry specified as the target offset that should be retrieved.
  afterCount -- The number of entries after the entry specified as the target offset that should be retrieved.
  contentCount -- The estimated total number of entries that are in the total result set. This should be zero for the first request in a VLV search sequence, but should be the value returned by the server in the corresponding response control for subsequent searches as part of the VLV sequence.
  contextID -- This is an optional cookie that may be used to help the server resume processing on a VLV search. It should be absent from the initial request, but for subsequent requests should be the value returned in the previous VLV response control.

Could I request page number 12 on the first request? (contextID will be null.) What is the disadvantage if I don't pass the contextID in subsequent requests?

dirmgr commented 6 months ago

I would say that in general, the ability to jump around the result set is an advantage, but there are costs associated with it. It requires that the results be sorted, and the server must support it (which might require special configuration in the server to be able to handle it at all, or at least to handle it efficiently, like needing to have a special index defined).

It's definitely possible to start on an arbitrary page. Assuming that you're using fixed-size pages (e.g., ten entries per page), then just adjust the offset accordingly (e.g., because offsets are one-based, page 12 starts at offset (12 - 1) * 10 + 1 = 111 if pages are numbered from 1). Whenever the server returns a page of results, the VLV response control may include a context ID, and if it does, then you should include that context ID in the next request in the series. While it would be possible for you to omit the context ID in subsequent requests, it's possible that could make the operation more expensive and therefore take longer to complete. For example, the server might need to precompute the entire result set in order to be able to return a given page of results, and it might be able to cache that to make it faster when retrieving other pages in that same search. In such cases, the context ID would likely include something that helps the server match the request to the cached result set.
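The offset arithmetic for fixed-size pages can be sketched as follows. This is a minimal sketch, assuming page numbers start at 1 and relying on the one-based targetOffset described in the Javadoc quoted earlier; under those assumptions, page 12 with ten entries per page starts at offset 111. The class `VlvPageWindow` and its field names are hypothetical and not part of the LDAP SDK.

```java
final class VlvPageWindow {
  final int targetOffset; // one-based position of the first entry on the page
  final int beforeCount;  // entries to return before the target entry
  final int afterCount;   // entries to return after the target entry

  private VlvPageWindow(int targetOffset, int beforeCount, int afterCount) {
    this.targetOffset = targetOffset;
    this.beforeCount = beforeCount;
    this.afterCount = afterCount;
  }

  // Computes the window for a one-based page number and a fixed page size.
  static VlvPageWindow forPage(int pageNumber, int pageSize) {
    if (pageNumber < 1 || pageSize < 1) {
      throw new IllegalArgumentException("pageNumber and pageSize must be >= 1");
    }
    long offset = (long) (pageNumber - 1) * pageSize + 1;
    if (offset > Integer.MAX_VALUE) {
      throw new IllegalArgumentException("page is beyond the supported range");
    }
    // Target the first entry of the page and ask for the rest after it.
    return new VlvPageWindow((int) offset, 0, pageSize - 1);
  }

  public static void main(String[] args) {
    VlvPageWindow w = forPage(12, 10);
    System.out.println(w.targetOffset + " " + w.beforeCount + " " + w.afterCount);
  }
}
```

The three values would be passed as the target offset, before count, and after count of the VLV request control, with a content count of zero and no context ID on the first request of a sequence.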

gredwhite commented 6 months ago

I just tried the approach you suggested for calculating the offset to get page number N, and it works perfectly. Thank you.

I have 2 additional questions:

  1. When I call vlvResponseControl.contentCount I get

estimated total number of entries in the result set

What does "estimated" mean here? I can't find any information about it.

  2. A question about the optional cookie:

@param contextID The context ID that may be used to help the server continue in the same result set for subsequent searches. For the first request in a series of searches with the VLV control, it should be {@code null}. For subsequent searches in the VLV sequence, it should be the (possibly {@code null}) context ID included in the response control from the previous search.

How long does this cookie live? I am trying to implement a UI with clickable pages, so the user will be able to click on any page. What will happen if I come back with the same cookie 5/10/30 minutes after I got it?

dirmgr commented 6 months ago
  1. It can be an estimate for several reasons. The specification document that I referenced earlier suggests that it might be because the server may not know exactly how many entries match the criteria, or that number of matching entries might change as other operations are processed in the server between requests for different pages.

  2. Anything related to the contextID is implementation-specific and not explicitly defined in the specification. You can't rely on its format, or even that one will be returned. If one was returned with the last page that you received, then it's a good idea to include it in the request for the next page, especially if you want to ensure contiguous results in the midst of changes to server data between pages. The length of time that it lives is undefined, but the specification does state that if the server receives a contextID that is invalid (which could presumably mean one that used to be valid but isn't any longer), then it should return a protocolError result code in the VLV response control (which is NOT the same as the result code for the search operation), and if the client provides a valid contextID but the server can't honor it for some reason, then it should return an unwillingToPerform result code in the VLV response control.
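One way a client could react to those two result codes is sketched below. This is a minimal sketch under stated assumptions: the enum `VlvResult` is a hypothetical stand-in for the result code carried in the VLV response control, and retrying once without the contextID is a client-side policy chosen for illustration, not something the specification mandates.

```java
final class VlvContextIdPolicy {
  // Hypothetical stand-in for the result code in the VLV response control
  // (not the result code of the search operation itself).
  enum VlvResult { SUCCESS, PROTOCOL_ERROR, UNWILLING_TO_PERFORM, OTHER }

  enum NextStep { CONTINUE, RETRY_WITHOUT_CONTEXT_ID, FAIL }

  // Decides what to do after a VLV response, given whether the request
  // that produced it carried a contextID.
  static NextStep decide(VlvResult result, boolean requestCarriedContextId) {
    if (result == VlvResult.SUCCESS) {
      return NextStep.CONTINUE;
    }
    boolean contextIdProblem = result == VlvResult.PROTOCOL_ERROR
        || result == VlvResult.UNWILLING_TO_PERFORM;
    if (contextIdProblem && requestCarriedContextId) {
      // Assumption: the stale or unusable contextID is the likely cause,
      // so drop it and start a fresh VLV sequence at the same offset.
      return NextStep.RETRY_WITHOUT_CONTEXT_ID;
    }
    return NextStep.FAIL;
  }
}
```

For a UI where a user may return to a page list after many minutes, a fallback like this keeps an expired contextID from surfacing as a hard error, at the cost of a possibly more expensive request.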

Ultimately, any questions you have about how Samba handles specifics related to the VLV control are better asked of those involved with the Samba project.
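For a UI with clickable pages, the estimated content count from point 1 can still drive the page links, as long as it is treated as provisional and refreshed from every response. The following is a minimal sketch, assuming fixed-size pages and one-based page numbers; the class `PageLinks` and its methods are hypothetical helpers, not part of the LDAP SDK.

```java
final class PageLinks {
  // Number of pages implied by the (estimated) content count.
  static int pageCount(int contentCount, int pageSize) {
    if (contentCount < 0 || pageSize < 1) {
      throw new IllegalArgumentException("invalid contentCount or pageSize");
    }
    return (int) ((contentCount + pageSize - 1L) / pageSize);
  }

  // Clamps a requested one-based page number to the range implied by the
  // latest estimate, since the count may have changed since the UI was drawn.
  static int clampPage(int requestedPage, int contentCount, int pageSize) {
    int last = Math.max(1, pageCount(contentCount, pageSize));
    return Math.min(Math.max(1, requestedPage), last);
  }

  public static void main(String[] args) {
    System.out.println(pageCount(95, 10));
    System.out.println(clampPage(15, 95, 10));
  }
}
```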

gredwhite commented 6 months ago

[dirmgr](https://github.com/dirmgr) Thank you for your answers; they are really helpful.

Regarding contextID: let's imagine that two different users (each of them binding with their own credentials) scroll the same list (so the same query is used, but the pages are different). Is it a good idea to use a shared contextID for both of them?

dirmgr commented 6 months ago

Please read the specification document for the control, which I've referenced twice here already. I'm trying to be polite and helpful, but please try to answer these questions for yourself before asking someone else to do it for you.

In answer to this latest question, the specification states "contextID values have no validity outside the connection and query with which they were received. A client MUST NOT submit a contextID which it received from a different connection, a different query, or a different server."
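That scoping rule suggests keeping each contextID filed under both the connection and the query it came from. The following is a minimal sketch of such a store using only the standard library. The class `ContextIdStore` is hypothetical, the connection is identified by a numeric ID supplied by the caller (the SDK's `LDAPConnection.getConnectionID()` would be one candidate), and `queryKey` is assumed to be any string that uniquely identifies the query.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class ContextIdStore {
  private final Map<String, byte[]> contextIds = new HashMap<>();

  private static String key(long connectionId, String queryKey) {
    return connectionId + "|" + Objects.requireNonNull(queryKey, "queryKey");
  }

  // Remembers the contextID from the latest response, or clears the slot
  // when the server did not return one.
  void remember(long connectionId, String queryKey, byte[] contextId) {
    String k = key(connectionId, queryKey);
    if (contextId == null) {
      contextIds.remove(k);
    } else {
      contextIds.put(k, contextId.clone());
    }
  }

  // Returns the contextID for exactly this connection and query, or null.
  byte[] lookup(long connectionId, String queryKey) {
    byte[] stored = contextIds.get(key(connectionId, queryKey));
    return stored == null ? null : stored.clone();
  }

  // Drops everything tied to a connection, e.g. when it is closed.
  void forgetConnection(long connectionId) {
    String prefix = connectionId + "|";
    contextIds.keySet().removeIf(k -> k.startsWith(prefix));
  }
}
```

With this layout, two users on separate connections never see each other's contextID even when they run the same query.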

gredwhite commented 6 months ago

To be honest I've already read it. I just realized that different user = different connection. Thank you one more time.

gredwhite commented 6 months ago

@dirmgr

"A client MUST NOT submit a contextID which it received from a different connection, a different query, or a different server."

public SearchRequest(@NotNull final String baseDN,
                     @NotNull final SearchScope scope,
                     @NotNull final Filter filter,
                     @Nullable final String... attributes)

If we have two SearchRequests that differ only in their attributes, does that mean the queries are different?

dirmgr commented 6 months ago

The set of attributes to return shouldn't affect the set of entries that match the request, or the order in which they are returned. However, the specification doesn't explicitly address that, so an individual server implementation is free to consider it either significant or insignificant. I am not involved with the Samba project in any way, so I can't comment on how they may have chosen to implement things.

It seems unlikely that merely changing the set of requested attributes would make a difference, but it also seems unlikely that a client would want to change the set of requested attributes in the course of a single sequence of paging. It's not out of the realm of possibility that a server implementation might generate some key that identifies the paging sequence based on elements of the request, and that may or may not include the set of requested attributes.

If you want a definitive answer about how the Samba server deals with this, then ask the Samba people. The LDAP SDK is responsible for encoding and sending the request, and receiving and decoding the response. It doesn't have any control over what the server does in the course of processing that request or generating the response.
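Because the specification leaves this open, a cautious client can simply treat the requested attributes as part of the query's identity when deciding whether a stored contextID may be reused. The following is a minimal sketch of that conservative choice; the class `QueryFingerprint` and its parameters are hypothetical, and the values are compared as plain strings without any DN or filter normalization.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class QueryFingerprint {
  // Builds a key from every element of the request, including the
  // requested attributes. Attribute names are lowercased and sorted so
  // that only a real change in the set produces a different key.
  static String of(String baseDn, String scope, String filter,
                   List<String> sortKeys, Collection<String> attributes) {
    List<String> attrs = new ArrayList<>();
    for (String a : attributes) {
      attrs.add(a.toLowerCase(Locale.ROOT));
    }
    Collections.sort(attrs);
    return String.join("\n", baseDn, scope, filter,
        String.join(",", sortKeys), String.join(",", attrs));
  }
}
```

The cost of being conservative is at most one paging sequence restarted without a contextID, whereas reusing a contextID that the server considers foreign could be rejected.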