ip-tools / patzilla

PatZilla is a modular patent information research platform and data integration toolkit with a modern user interface and access to multiple data sources.
https://docs.ip-tools.org/patzilla/
GNU Affero General Public License v3.0

Problem getting drawings from USPTO #48

Closed: amotl closed this issue 1 year ago

amotl commented 2 years ago

I just discovered these messages in the log files.

WARNING  [patzilla.access.epo.ops.api             ][MainThread] No image information for document=US2022110447A1
INFO     [patzilla.access.uspto.image             ][MainThread] USPTO: Fetching first drawing of "US20220110447A1"
INFO     [patzilla.access.uspto.image             ][MainThread] USPTO: Searching for TIFF document "US20220110447A1" at "http://aiw1.uspto.gov/.aiw?Docid=20220110447&idkey=NONE"
ERROR    [patzilla.access.uspto.image             ][MainThread] We failed to open url "http://aiw1.uspto.gov/.aiw?Docid=20220110447&idkey=NONE". reason=[Errno -2] Name or service not known, code=None
WARNING  [patzilla.access.uspto.image             ][MainThread] No content in main document page 'US20220110447A1' (url: http://aiw1.uspto.gov/.aiw?Docid=20220110447&idkey=NONE)
INFO     [patzilla.access.uspto.image             ][MainThread] USPTO: Searching for TIFF document "US20220110447A1" at "http://aiw2.uspto.gov/.aiw?Docid=20220110447&idkey=NONE"
ERROR    [patzilla.access.uspto.image             ][MainThread] We failed to open url "http://aiw2.uspto.gov/.aiw?Docid=20220110447&idkey=NONE". reason=[Errno -2] Name or service not known, code=None
WARNING  [patzilla.access.uspto.image             ][MainThread] No content in main document page 'US20220110447A1' (url: http://aiw2.uspto.gov/.aiw?Docid=20220110447&idkey=NONE)

It looks like all of patimg1.uspto.gov, patimg2.uspto.gov, aiw1.uspto.gov and aiw2.uspto.gov have been decommissioned.

aghster commented 2 years ago

It seems that the USPTO no longer provides TIFF images, only PDF files: https://patft.uspto.gov/netahtml/PTO/help/images.htm (see "Notices")

Instead of loading a TIFF from http://aiw1.uspto.gov/.aiw?Docid=20220110447&idkey=NONE, you will probably have to load a PDF from https://pdfaiw.uspto.gov/47/2022/04/011/1.pdf.

Apparently, for a Docid "abcd0efghij" the URL to access a PDF of the n-th page is: https://pdfaiw.uspto.gov/ij/abcd/gh/0ef/n.pdf
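The pattern above can be sketched in Python as follows. This is only a minimal illustration, assuming the Docid is always an 11-digit string as in the sample; the function name `application_pdf_url` is hypothetical and not part of PatZilla, and the URL layout is inferred from the single example in this thread.

```python
def application_pdf_url(docid: str, page: int = 1) -> str:
    """Build the direct PDF URL for one page of a published application.

    Hypothetical helper: the layout ij/abcd/gh/0ef/n.pdf is inferred from
    one observed sample and is not confirmed by USPTO documentation.
    """
    if len(docid) != 11 or not docid.isdigit():
        raise ValueError(f"Expected an 11-digit Docid, got {docid!r}")

    year = docid[0:4]    # "abcd", e.g. "2022"
    block = docid[4:7]   # "0ef",  e.g. "011"
    middle = docid[7:9]  # "gh",   e.g. "04"
    last = docid[9:11]   # "ij",   e.g. "47"

    return f"https://pdfaiw.uspto.gov/{last}/{year}/{middle}/{block}/{page}.pdf"


print(application_pdf_url("20220110447", page=1))
```

For the Docid from the log output, this yields the same URL as quoted above.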

amotl commented 2 years ago

Hi @aghster,

Thanks, and nice to see you again. You are absolutely right. Currently, I am trying to figure out whether I can trust the observation that "Drawings" are always in section 2 / on page 2. Can you spot any contradicting samples?

With kind regards, Andreas.

Summary

I am picking two arbitrary samples here. The application is fairly new.

Application

Previous URL: http://aiw1.uspto.gov/.aiw?Docid=20220110447&idkey=NONE
New URL: https://pdfaiw.uspto.gov/.aiw?docid=20220110447&SectionNum=2
Direct access: https://pdfaiw.uspto.gov/47/2022/04/011/2.pdf

Publication

Previous URL: http://patimg1.uspto.gov/.piw?Docid=05123456&idkey=NONE
New URL: https://pdfpiw.uspto.gov/.piw?docid=05123456&SectionNum=2
Direct access: https://pdfpiw.uspto.gov/56/234/051/2.pdf
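The direct-access URL for the publication sample appears to follow a similar digit-slicing scheme. The sketch below assumes an 8-digit, zero-padded Docid, as in "05123456"; the function name `patent_pdf_url` is hypothetical, and the layout is inferred from this one sample only.

```python
def patent_pdf_url(docid: str, page: int = 2) -> str:
    """Build the direct PDF URL for one page of a granted patent.

    Hypothetical helper: the layout <last 2>/<middle 3>/<first 3>/<page>.pdf
    is an assumption derived from the single sample shown above.
    """
    if len(docid) != 8 or not docid.isdigit():
        raise ValueError(f"Expected an 8-digit Docid, got {docid!r}")

    head = docid[0:3]    # e.g. "051"
    middle = docid[3:6]  # e.g. "234"
    tail = docid[6:8]    # e.g. "56"

    return f"https://pdfpiw.uspto.gov/{tail}/{middle}/{head}/{page}.pdf"


print(patent_pdf_url("05123456", page=2))
```

The default of `page=2` only mirrors the sample URL; it does not imply that drawings are always found on that page.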

amotl commented 2 years ago

I found a contradicting example. In the document US10194689B2, the drawings of section 2 only start on page 5.

amotl commented 2 years ago

Hi again.

Commit a49eeae34d fixes this issue, and b0d8825cb adds corresponding test cases. Both are part of #49. Thank you again!

With kind regards, Andreas.

amotl commented 1 year ago

Dear @aghster,

USPTO PatFT and AppFT servers have been decommissioned recently, see #61.

With kind regards, Andreas.