gunjandatta / sprest

An easy way to develop against the SharePoint 2013/Online REST api.
https://dattabase.com
MIT License
77 stars 16 forks source link

Parallel GetAllItems query #29

Closed johannes-z closed 7 years ago

johannes-z commented 7 years ago

Right now, GetAllItems and Top is used to query all items from a list/library serially. Meaning, the next batch of items is only retrieved, once you get the answer from the one bevor, that has the next URL.

With IListItem.ItemCount you know beforehand how many items there are in a list. What I've been doing before using this library is to build the requests for the batches manually. You know how many items there are, and how large a batch is (via the Top property).

gunjandatta commented 7 years ago

I'm not really following the logic, since I don't see the "ItemCount" property from the output of the query. Did you mean the IList.ItemCount property? The GetAllItems feature came from this post from Marc Anderson.

If there is an easier way to execute this logic, let me know :)

johannes-z commented 7 years ago

Yes, I meant the IList.ItemCount property.

List.getByEntityName(listName, list => {
  console.log(list.ItemCount)
})

In my case, 7 is printed to console on my list, even before I query the items. Because of this, you know beforehand how many elements are in a list/library, and don't have to wait for subsequent responses to be processed. The reason why this is a problem is, that on my test tenant, 2000+ items take about 2-5 seconds to be loaded from the server. If you decrease the batch size from 5000 to 500, you have more throughput, at the cost of more requests. When you have to wait for the next property, the throughput stays the same or even worsens, since the requests happen serially.

What I've been doing for the Search API is basically the same. In this case I query the first 500 elements, which also returns the TotalRows property. All items that exceed the 500 elements limit are queried in parallel, like so:

for (var i = rowLimit; i < totalRows; i += rowLimit) {
  // queryItems returns a promise
  var chunk = queryItems(query, columns, rowLimit, offset + i)
  // ...
}

That way I can have n - 1 parallel request (in my case the first one is needed to get the TotalRows property. In your case, there already is a ItemCount property).

gunjandatta commented 7 years ago

So you can use the search api to get around the 5k threshold? Would it be worth to create helper functions in the Search class to help w/ list queries, using your approach?

johannes-z commented 7 years ago

I'd advise against it, because the search server causes additional drain on the system, and the data isn't live - you'll have to wait for the crawl to pick up the changes. For the Search API this would be a nice addition though (also, on online tenants the threshold for the Search API is 500 items, not 5000).

How do you populate the IList.ItemCount property?

gunjandatta commented 7 years ago

You'll see the "ItemCount" property, since you are using the List.getByEntityName() method which executes a query against to the server to get the list. The list has the property "ItemCount", so it'll be populated. If you use the other methods:

(new List("[name]")) or (new Web()).Lists("[name]")

The "ItemCount" property won't be available, since it doesn't execute a request until you specify it.

johannes-z commented 7 years ago

Gotcha. So this would only work for querying all items from a list retrieved via getByEntityName.

gunjandatta commented 7 years ago

Yes, but I do like it. The problem is I need to wait for the request to complete in order to execute the "_next" property which contains the url of the next batch.

johannes-z commented 7 years ago

If you query the list with getByEntityName you dont have to wait for the request to complete. You know the ItemCount and thus can create the url of the batches yourself, without having to wait. A code sketch:

var rowCount = 500
List.getByEntityName(listName, list => {
  var itemCount = list.ItemCount
  var chunks = []
  for (var i = 0; i < Math.floor(itemCount / rowCount) + 1; i++) {
    var chunk = list.query({
      Top: rowCount,
      Offset: i * rowCount,
      GetAllItems: true,
      Parallel: true // Maybe as an optional flag you can define to use multiple, concurrent requests instead of using the `_next` property.
    }).execute(...)
    chunks.push(chunk)
  }
})

I'm not sure how the chunk retrieval would be implemented in your case. My simple wrapper for the Search API is Promise based, so I can use Promise.all(chunks).then(...).

gunjandatta commented 7 years ago

Closing this issue out, since the ItemCount is not available unless you use the static getByEntityName method.