tzmfreedom / rustforce

Salesforce REST API Client in Rust
MIT License

Query Next Records URL #27

Closed · 0cv closed this 1 year ago

0cv commented 1 year ago

Typically, when more than 2000 records are returned, Salesforce provides a nextRecordsUrl field that must be used to retrieve the next batch. The query and queryAll functions now recursively query until Salesforce reports that it is done.

https://developer.salesforce.com/docs/atlas.en-us.api_rest.meta/api_rest/dome_query.htm
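
For reference, the documented response shape maps onto a struct along these lines (a sketch only; rustforce's actual QueryResponse appears in a patch further down this thread):

use serde::Deserialize;

// Sketch of the documented query response. `nextRecordsUrl` is only
// present while `done` is false; `totalSize` counts the full result
// set, not just the current batch.
#[derive(Deserialize)]
#[serde(rename_all = "camelCase")]
struct QueryResponse<T> {
    total_size: i32,
    done: bool,
    next_records_url: Option<String>,
    records: Vec<T>,
}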

query and query_all have been refactored into single-line functions that call query_with, which uses either the query or the queryAll endpoint.
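
Sketched out (the exact signature of query_with is a guess from the description above, not the PR's code):

// Inside impl Client: both entry points become one-liners over the
// shared helper; the endpoint segment is the only difference.
pub async fn query<T: DeserializeOwned + Send>(&self, soql: &str) -> Result<QueryResponse<T>, Error> {
    self.query_with("query", soql).await
}

pub async fn query_all<T: DeserializeOwned + Send>(&self, soql: &str) -> Result<QueryResponse<T>, Error> {
    self.query_with("queryAll", soql).await
}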

async-recursion has been added as a dependency; it hides the implementation details of recursive async functions (boxing the returned future as a dyn Future, etc.).

However, alongside DeserializeOwned, the deserialization target must now also implement the Send trait to ensure thread safety when querying, so this is a breaking change, albeit a mostly transparent one.
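
A minimal, self-contained illustration of both points; the types and helpers here are stand-ins, not the PR's code:

use async_recursion::async_recursion;

struct Page {
    records: Vec<String>,
    next_records_url: Option<String>,
}

// Stand-in for the real HTTP GET + JSON deserialization: serves two
// batches, the first carrying a continuation URL.
async fn fetch(url: &str) -> Page {
    if url.ends_with("-2000") {
        Page { records: vec!["batch 2".into()], next_records_url: None }
    } else {
        Page {
            records: vec!["batch 1".into()],
            next_records_url: Some(format!("{url}-2000")),
        }
    }
}

// A plain recursive `async fn` does not compile because its future type
// would be infinitely sized; #[async_recursion] boxes the future behind
// the scenes. The boxed future requires Send by default, which is why
// callers now see a `Send` bound alongside `DeserializeOwned`.
#[async_recursion]
async fn collect(url: &str, acc: &mut Vec<String>) {
    let page = fetch(url).await;
    acc.extend(page.records);
    if let Some(next) = page.next_records_url {
        collect(&next, acc).await;
    }
}

#[tokio::main]
async fn main() {
    let mut all = Vec::new();
    collect("/services/data/v42.0/query/01gXX", &mut all).await;
    assert_eq!(all, ["batch 1", "batch 2"]);
}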

tzmfreedom commented 1 year ago

@0cv Thank you for your contribution!!

The automatic recursive query call is a good idea. But I think this change has a disadvantage in memory usage, because query and query_all now accumulate the entire result set in memory. I want rustforce to let users choose between a memory-efficient way (recursing through batches manually) and an easy way (fetching everything automatically).

What are your thoughts on that?

0cv commented 1 year ago

@tzmfreedom I don't have such a use case; I use this library (or similar ones) for ETL tools and data loading. Memory is not a concern for me, but keeping the logic simple and clear is.

If you really want to retrieve records granularly, one batch at a time, you could just use the current get method and then call the nextRecordsUrl whenever it fits, in a similar way to what this PR is doing, but adding custom logic between each iteration.
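
For instance (a sketch only: fetch_batch is a hypothetical helper performing the authenticated GET on nextRecordsUrl, and next_records_url assumes the response struct exposes that field, as in the patch shared further down):

// Page through batches manually, with room for custom logic between
// iterations. `fetch_batch` is hypothetical; the released client has
// no public method for following nextRecordsUrl yet.
let mut page = client.query::<Account>("SELECT Id FROM Account").await?;
let mut records = Vec::new();
records.append(&mut page.records);
while let Some(url) = page.next_records_url.take() {
    // Custom logic between iterations: progress, retries, throttling...
    eprintln!("{} of {} records so far", records.len(), page.total_size);
    page = fetch_batch(&client, &url).await?;
    records.append(&mut page.records);
}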

That being said, at the moment querying "select id from account" will return at most 2000 records, and that is definitely a bug, which this PR addresses.

This PR has an issue, though: batches are retrieved in the wrong order. I can push a commit fixing this.

couchand commented 1 year ago

Thanks @0cv and @tzmfreedom for working on this.

I'll chime in with a data point that I'd like the direct access to the batches. Coming from other uses of the Salesforce API I'm expecting to need to handle it anyway. And I'd rather a library like this not eagerly make an unbounded number of serial requests before reporting any status. In particular, I don't want a spurious network error in a later batch to prevent me from making progress on earlier batches.

In simple cases I can see it being helpful, so I support the helper being available as an option.

couchand commented 1 year ago

FWIW this is the patch I've been using myself:

diff --git a/src/client.rs b/src/client.rs
index 7a4056b..11a630f 100644
--- a/src/client.rs
+++ b/src/client.rs
@@ -242,6 +242,18 @@ impl Client {
         }
     }

+    /// Fetch additional records from a previous query
+    pub async fn query_more<T: DeserializeOwned>(&self, next_records_url: &str) -> Result<QueryResponse<T>, Error> {
+        let url = format!("{}{}", self.instance_url.as_ref().unwrap(), next_records_url);
+        let res = self.get(url, vec![]).await?;
+
+        if res.status().is_success() {
+            Ok(res.json().await?)
+        } else {
+            Err(Error::ErrorResponses(res.json().await?))
+        }
+    }
+
     /// Find records using SOSL
     pub async fn search(&self, query: &str) -> Result<SearchResponse, Error> {
         let query_url = format!("{}/search/", self.base_path());
diff --git a/src/response.rs b/src/response.rs
index 83d3767..a39764f 100644
--- a/src/response.rs
+++ b/src/response.rs
@@ -8,6 +8,7 @@ use std::collections::HashMap;
 pub struct QueryResponse<T> {
     pub total_size: i32,
     pub done: bool,
+    pub next_records_url: Option<String>,
     pub records: Vec<T>,
 }
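
With that patch applied, fetching the next batch is one extra call (a sketch; Account stands for whatever type you deserialize into):

let first: QueryResponse<Account> = client.query("SELECT Id FROM Account").await?;
if let Some(url) = &first.next_records_url {
    let next: QueryResponse<Account> = client.query_more(url).await?;
    // `next.done` / `next.next_records_url` indicate whether more batches remain.
}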

0cv commented 1 year ago

> Thanks @0cv and @tzmfreedom for working on this.
>
> I'll chime in with a data point that I'd like the direct access to the batches. Coming from other uses of the Salesforce API I'm expecting to need to handle it anyway. And I'd rather a library like this not eagerly make an unbounded number of serial requests before reporting any status. In particular, I don't want a spurious network error in a later batch to prevent me from making progress on earlier batches.
>
> In simple cases I can see it being helpful, so I support the helper being available as an option.

If you want to count the number of records your query would return, you can do something like select count(id) from account; it will return a single batch. And if you want only the first X records, you can use a limit, e.g. select id from account limit 200. In any case, the library currently imposes an implicit limit of one batch, which to me is unexpected behaviour.

Closing this issue as there is no progress on it.