jpetrucciani / hubspot3

python3.6+ hubspot client based on hapipy, but modified to use the newer endpoints and non-legacy python
MIT License
146 stars 72 forks source link

Result window too large when querying recently modified companies #88

Open zmrl010 opened 4 years ago

zmrl010 commented 4 years ago

I was experiencing an issue while using the companies client to query for recently modified companies. The client sends the request 250 at a time, per the default limit parameter and then increments the offset by that amount for the next query. Then when it gets to the upper limit of requested records (10000) it continues and sends another request with a 10250 offset, which throws a bad request error. Since the finished boolean value is only ever made to true when batch["hasMore"] is set to false, even if it has reached the upper limit, then it continues to make requests.

I have no idea why my company has over 10000 modified company records or if hubspot is returning more than just the recently modified records, but it seems to me like the client should stop looping at that upper limit to avoid the error. An additional check at the bottom of the loop should be enough to solve this:

finished = not batch["hasMore"] or offset + limit > 10000

I'd also probably move that line to be after the new offset is set, so we can get the value of what it would be for the next call.

It's also a tough one to get around because limit only limits the size of each call in the loop. I would also suggest adding some sort of way to limit the total number of records returned from a call, which isn't a necessity but would help to work around something like this.

I can probably submit a fix for this if you agree on the solution I proposed or if you have any other ideas I'd love to hear them.

jpetrucciani commented 4 years ago

Hmm, its hard to tell from the docs - is it limited to 10000 records or the last 30 days, whichever it hits first?

If 10000 is the absolute upper limit (it appears to be on a few different APIs too, like deals), I'd be on board with adding that to the globals file and adding the logic in these methods to stop at that limit!

zmrl010 commented 4 years ago

I agree with you there, the docs make it a little ambiguous, but I did find out the true answer back when I posted this. It's whichever hits first. I needed a way to search through companies by name and since the api didn't exist at the time, I resolved to store all the data on our severs and refresh it every day.

I like your proposed solution to solving it for this. I'd bet the same limit is imposed on all the similar api endpoints and should be treated as such. I might have some time over the next week to work on it unless you've already got something in the works.

jpetrucciani commented 4 years ago

I don't yet have anything in the works for this - but if you have some time, feel free!