parkerhancock / patent_client

A collection of ORM-style clients to public patent data

USPTO repeated results? #92

Closed (mustberuss closed 1 year ago)

mustberuss commented 1 year ago

Hey Parker, man you've been busy! I tried one of the uspto examples and am not sure if I'm doing something wrong or if there's something up in the new code. It looks like the results are the same patent repeated, using v3.2.1. I tried looking at the code but didn't see anything obvious. Thanks again for taking this on!

# Fetch US Patents with the word "tennis" in their title issued in 2010
>>> pats = PatentBiblio.objects.filter(title="tennis", issue_date="2010-01-01->2010-12-31")
>>> pats[0]
PublicationBiblio(publication_number=7841958, publication_date=2010-11-30, patent_title=Modular table tennis game)
>>> pats[1]
PublicationBiblio(publication_number=7841958, publication_date=2010-11-30, patent_title=Modular table tennis game)
>>> pats[1] == pats[2]
True

parkerhancock commented 1 year ago

Found it!

So, slicing a manager is equivalent to using limit and offset, with a single value being equivalent to offset(n).first().

For example:

manager[1:5] == manager.offset(1).limit(4)
manager[1] == manager.offset(1).first()
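
To make that equivalence concrete, here is a minimal sketch of how a manager's `__getitem__` could map slices and integers onto offset/limit. `ToyManager` is a hypothetical stand-in written for illustration, not patent_client's actual manager class, and it ignores slice steps and negative indices.

```python
from itertools import islice


class ToyManager:
    """Hypothetical stand-in for a manager; not patent_client's real class."""

    def __init__(self, items, offset=0, limit=None):
        self._items = list(items)
        self._offset = offset
        self._limit = limit

    def offset(self, n):
        # Skipping n more records also shrinks any existing limit window.
        new_limit = None if self._limit is None else max(self._limit - n, 0)
        return ToyManager(self._items, self._offset + n, new_limit)

    def limit(self, n):
        # Keep the tighter of the existing limit and the new one.
        new_limit = n if self._limit is None else min(self._limit, n)
        return ToyManager(self._items, self._offset, new_limit)

    def __iter__(self):
        stop = None if self._limit is None else self._offset + self._limit
        return islice(self._items, self._offset, stop)

    def first(self):
        for item in self:
            return item
        raise IndexError("manager is empty")

    def __getitem__(self, key):
        if isinstance(key, slice):
            # manager[a:b] -> offset(a).limit(b - a); step is ignored here.
            start = key.start or 0
            manager = self.offset(start)
            if key.stop is not None:
                manager = manager.limit(key.stop - start)
            return manager
        # manager[n] -> offset(n).first()
        return self.offset(key).first()


manager = ToyManager(range(10))
sliced = list(manager[1:5])
chained = list(manager.offset(1).limit(4))
single = manager[1]
```

Here `sliced` and `chained` hold the same four records, and `single` is the record at index 1.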

It looks like there were two issues. First, PublicSearch only supports a page size of 500, and offsets in steps of 500. That is, I can't pass an arbitrary limit and offset to the API. Instead, I have to fetch everything and handle it on the app side. Second, in the PublicSearch manager, I was using a break statement inside a double loop, so only the inner loop exited and the outer loop wasn't breaking.
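
Both points can be illustrated with a rough sketch, assuming a hypothetical `fetch_page` helper that stands in for one API call; it is not the actual PublicSearch manager code. The generator walks fixed pages of 500 and applies offset and limit on the client side. It stops with `return`, which leaves both loops at once, whereas a bare `break` in the inner loop would only leave that loop.

```python
PAGE_SIZE = 500  # per the comment above: pages of 500, in steps of 500


def fetch_page(records, start):
    """Hypothetical stand-in for one API call returning up to PAGE_SIZE records."""
    return records[start:start + PAGE_SIZE]


def iter_results(records, offset=0, limit=None):
    """Yield records from fixed-size pages, applying offset/limit client-side."""
    if limit is not None and limit <= 0:
        return
    yielded = 0
    # Start at the page boundary at or before the requested offset.
    page_start = (offset // PAGE_SIZE) * PAGE_SIZE
    while True:
        page = fetch_page(records, page_start)
        if not page:
            return  # no more results
        for index, record in enumerate(page, start=page_start):
            if index < offset:
                continue  # still before the requested offset
            yield record
            yielded += 1
            if limit is not None and yielded >= limit:
                # `return` exits the inner and outer loop together;
                # `break` here would let the outer loop fetch another page.
                return
        page_start += PAGE_SIZE


records = list(range(1200))
across_pages = list(iter_results(records, offset=498, limit=5))
```

In this sketch `across_pages` spans the boundary between the first and second page, which is the case a server-side offset in steps of 500 cannot express directly.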

All that is fixed in what I'll release in a few moments as v3.2.2.

Thanks!

Parker

mustberuss commented 1 year ago

Awesome, thanks! You should start charging for the education :smile: