Thanks Rahul, this sounds perfectly reasonable. However, I will have to immerse myself in the topic again to find out how to make this possible.
If you see a way to solve this, please let me know; pull requests are welcome as well. Otherwise, please stay tuned, or ping me again if you don't hear back from me.
With kind regards, Andreas.
Dear Rahul,
after taking a quick look at this, I noticed that the Patent Number field is actually already able to accept such a comma-separated list of numbers:
I have to admit that I haven't been aware of that, so thanks for letting me know.
On the other hand, there's a remark in the footer area of the results page like
The Patent Examination Data System (PEDS) shows the first 20 results in the dataset. To see more results, click the "Request Download" link.
Based on this statement, I conclude that submitting 300 numbers there at once would not be possible and that the process would have to be chunked appropriately?
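For illustration, the chunking could look like the following minimal Python sketch, assuming batches of 20 numbers per request:

numbers = ['6583088', '6875727', '8697602']  # imagine up to 300 entries here
batch_size = 20  # assumption derived from the "first 20 results" remark above

# Split the full list into batches and build one comma-separated
# value string per batch; each string would go into its own request.
for index in range(0, len(numbers), batch_size):
    batch = numbers[index:index + batch_size]
    print(', '.join(batch))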
Thanks in advance for your feedback.
With kind regards, Andreas.
Dear Rahul,
the small and quick fix f07ebefa makes the search feature work again, which might be just what you wanted to achieve. So, you might want to upgrade to uspto-opendata-python 0.8.3 and check one of the following examples.
uspto-peds search 'patentNumber:(6583088 6875727 8697602)'
from uspto.peds.client import UsptoPatentExaminationDataSystemClient

# Create a client instance and run the same query programmatically.
client = UsptoPatentExaminationDataSystemClient()
client.search('patentNumber:(6583088 6875727 8697602)')
Thanks again for reporting this issue, which was surprisingly easy to resolve. Please let us know whether this fulfills your needs.
With kind regards, Andreas.
Sure, I will do that. Thanks for the quick fix.
I have tested the new update. The data given by the manual search is a zip with year-wise JSON files, which is different from the data given by client.search, but it's very fast and extensive. So, for now it's fine. I will update the details soon.
I tried it and think that this is a quick fix rather than the solution to the actual issue. The quick fix only solves the first step, that is, getting the first 20 results. We should work towards getting all the data, i.e. the package result, which is similar to download_document.
The data given by the manual search is a zip with year-wise JSON files, which is different from the data given by client.search, but it's very fast and extensive.
Yeah, the canonical download variant is "packaging", a.k.a. "Zip Download", which is where most of the automation work of this library went. As this happens asynchronously from the perspective of the client, the library has to poll the readymade archive resource for availability. Also, baking the archive takes some time on the server side.
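For illustration, the client-side workflow roughly follows the sketch below; the endpoint paths and payload shapes are assumptions for illustration only, as the library encapsulates the actual details:

import time
import requests

API = 'https://ped.uspto.gov/api/queries'  # assumed endpoint, for illustration

# 1. Submit the query; the server responds with a query identifier.
response = requests.post(API, json={'searchText': 'patentNumber:(6583088)'})
query_id = response.json()['queryId']

# 2. Ask the server to start baking a zip archive for this query.
requests.put('{}/{}/package'.format(API, query_id), params={'format': 'JSON'})

# 3. Poll the archive resource until it becomes available.
while requests.get('{}/{}/package'.format(API, query_id),
                   params={'format': 'JSON'}).status_code != 200:
    time.sleep(10)  # archive baking takes some time on the server side

# 4. Download the readymade archive.
archive = requests.get('{}/{}/download'.format(API, query_id),
                       params={'format': 'JSON'})
with open('package.zip', 'wb') as f:
    f.write(archive.content)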
The other method, unlocked again through "search", is the direct JSON response the API offers when searching with criteria. This is probably the same data that is displayed in the inline results list (probably limited to 20 hits).
The two JSON output formats are completely different in structure. Also, the "direct access" via the search response JSON is obviously not available in XML format.
We should work towards getting all the data, i.e. the package result.
This has always been implemented, as it was the main purpose of this library.
not the solution to the actual issue
I totally see your point. So, a) on the one hand, it is good that we fixed the issue with "search". But b) I don't see any obvious difference between what a human does when clicking on "Download package" and the same thing implemented in Python when going down the "packaging" route.
Now, I'm feeling a bit lost here, and also a bit sad that it's not obvious to me what your expectations are. Maybe you can help me clarify what this library does and how it could do better?
Thanks in advance, Andreas.
I think this is great and sufficient. Thanks for the help. I really appreciate it.
Dear Rahul,
thanks for your feedback.
If you think this will be fine, then let's close this. Otherwise, I would really be interested in improving the performance of the downloading process. However, I currently don't see where exactly this could happen, i.e. how manual interaction could be faster at any point compared to the automated packaging and download process this library already implements.
With kind regards, Andreas.
This can be closed, as I can now do the searching and packaging after the update.
For my limited use, I have trimmed this library in a fork; see https://github.com/rahul-gj/uspto-peds-python. Please check that I have not breached the license or anything.
Thanks
Dear Rahul,
I see what you have been aiming at. I think it is possible to have both variants implemented through code from the same repository and Python package, and I might dedicate some time to merging your changes back into mainline in one way or another.
Until then, it is perfectly reasonable to have forks around like you did with your derivative uspto-peds-python. So, let's close this and track the note about the reintegration using a different ticket.
Thanks again for your valuable input. It is good to see that the barebones implementation of the PEDS Search API client wrapper, based purely on the requests and BeautifulSoup packages, is exactly what you have been aiming at, and that you have been able to build it from parts of this library.
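For reference, a minimal sketch of such a barebones search call using requests alone; the endpoint path and payload shape are assumptions for illustration, and BeautifulSoup is presumably only needed for scraping auxiliary HTML pages, not for the JSON call itself:

import requests

# Minimal sketch of a direct search request against the PEDS API.
# The endpoint path and payload shape are illustrative assumptions.
response = requests.post(
    'https://ped.uspto.gov/api/queries',
    json={
        'searchText': 'patentNumber:(6583088 6875727 8697602)',
        'start': 0,
        'rows': 20,  # the inline result list appears to be capped at 20 hits
    },
)
print(response.json())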
All the best and with kind regards, Andreas.
So, let's close this and track the note about the reintegration using a different ticket.
I just created #9 to be able to follow up on this later. Thanks again!
I would like to know whether I can download a list of patent numbers or application numbers in synchronous mode. I can do that on https://ped.uspto.gov/peds/ by giving comma-separated values like '6583088, 6875727, 8697602, 6331531, 6274350, 10112906, 9491944, 9504251, 9137998'.
This is because I think, and have also tested to confirm, that it's a constant-time operation: whether you request one number or 300, it will take almost the same time to complete the request.
Something like:
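(a hypothetical sketch of the intended call, for illustration only)

from uspto.peds.client import UsptoPatentExaminationDataSystemClient

# Hypothetical sketch of the desired synchronous bulk lookup; the exact
# query shape is illustrative, not a confirmed feature of the library.
client = UsptoPatentExaminationDataSystemClient()
numbers = '6583088, 6875727, 8697602, 6331531, 6274350'
results = client.search('patentNumber:({})'.format(numbers.replace(',', '')))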