parkerhancock / patent_client

A collection of ORM-style clients to public patent data

Docs example fix requested - Finding all the patents for a given company (or organization) #176

Open MatthewLoffredo opened 2 months ago

MatthewLoffredo commented 2 months ago

Hi, I'm trying to follow the page here (https://patent-client.readthedocs.io/en/latest/examples/3%20-%20Company%20Ownership.html) to get a company's patent portfolio, but the first step doesn't seem to be working:

Calling the first step:

applicant_apps = USApplication.objects.filter(first_named_applicant='University of California, Berkeley').values_list('appl_id', flat=True).to_list()

results in:

/usr/local/lib/python3.10/dist-packages/pydantic/main.py in model_validate(cls, obj, strict, from_attributes, context)
    566         # `__tracebackhide__` tells pytest and some other tools to omit this function from tracebacks
    567         __tracebackhide__ = True
--> 568         return cls.__pydantic_validator__.validate_python(
    569             obj, strict=strict, from_attributes=from_attributes, context=context
    570         )

ValidationError: 1 validation error for PedsPage
queryResults.searchResponse.response.docs.1.appFilingDate
  Field required [type=missing, input_value={'corrAddrCountryName': '... 02818 (UNITED STATES)'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.8/v/missing

Additionally, do you have any tricks for finding all the assignee names a company might appear under? Some companies file under several different names, so filtering on a single name, as in the example above, would miss those variations. Any advice on how to handle that? The end goal is to get a company's complete patent portfolio.

Hobly commented 2 months ago

On the first point, it's worth noting that the USPTO will be deprecating PEDS very soon, so if you want to use patent_client for landscaping, your best bet is the Open Data Portal module (patent_client.odp).

Secondly, if you only want to retrieve a list of applications and don't need all the patent_client bells and whistles, then it's pretty trivial to make a Python requests call to the ODP search API described here: https://beta-data.uspto.gov/apis/getting-started. Construct a suitable query string (e.g. something like q = "applicationMetaData.applicantBag.applicantNameText: Google").
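To make that concrete, here is a rough sketch of building such a query and sending it. Only the field name comes from the comment above; the endpoint URL, the `X-API-KEY` header, the `q`/`limit`/`offset` parameter names, and the `OR` syntax are my assumptions, so check them against the getting-started page before relying on this. It uses the standard library so it runs without extra installs; `requests.get(SEARCH_URL, params=..., headers=...)` would do the same job.

```python
import json
import urllib.parse
import urllib.request

# Assumption: endpoint and header name are not confirmed in this thread;
# verify both on the ODP getting-started page.
SEARCH_URL = "https://api.uspto.gov/api/v1/patent/applications/search"
APPLICANT_FIELD = "applicationMetaData.applicantBag.applicantNameText"


def build_query(name_variants):
    """OR together one field:term clause per applicant-name variant.

    Multi-word names are wrapped in double quotes; single tokens are left
    bare so a trailing wildcard such as 'Alphabet*' stays usable.
    Assumption: the API accepts this Lucene-style syntax.
    """
    clauses = []
    for name in name_variants:
        term = f'"{name}"' if " " in name else name
        clauses.append(f"{APPLICANT_FIELD}:{term}")
    return " OR ".join(clauses)


def search(query, api_key, limit=25, offset=0):
    """Send one search request and return the decoded JSON response."""
    params = urllib.parse.urlencode({"q": query, "limit": limit, "offset": offset})
    request = urllib.request.Request(
        f"{SEARCH_URL}?{params}",
        headers={"X-API-KEY": api_key},  # assumption: key is sent in this header
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    # No network call here; this only prints the query string that would be sent.
    print(build_query(["Google", "University of California"]))
```

Calling `search(build_query([...]), api_key="...")` would then return one page of results; you would loop over `offset` to collect the rest.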

On normalising different company names, there is no universal solution to this problem, so you'll almost always have to do it manually. Patent applicant name data is notoriously fractured because attorneys type these names in slightly differently, so you'll need to build a custom query to capture as many variants as you can. Calling the USPTO ODP API directly also lets you use wildcards in your search query, which helps with this. It's pretty interesting to do a broad subject-matter search and then list all the unique ways a company has been named. I once did one where a well-known Korean research institute was named in 15 different ways...
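As a starting point for that manual clean-up, something like the following can bucket the raw applicant strings from a broad search so the variants sit side by side for review. This is only a sketch: the suffix list and the `normalize` and `group_variants` helpers are my own illustration (not part of patent_client or the USPTO API), and the company names are made up.

```python
import re
from collections import defaultdict

# Assumption: a small, illustrative list of corporate suffixes to ignore.
SUFFIXES = {"inc", "corp", "corporation", "co", "ltd", "llc", "company"}


def normalize(name):
    """Lowercase, replace punctuation with spaces, and drop trailing suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    while tokens and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def group_variants(names):
    """Map each normalized key to the sorted raw spellings that share it."""
    groups = defaultdict(set)
    for name in names:
        groups[normalize(name)].add(name)
    return {key: sorted(variants) for key, variants in groups.items()}


if __name__ == "__main__":
    # Hypothetical applicant strings, as they might come back from a search.
    raw = ["Acme Corp.", "ACME CORPORATION", "Acme Co., Ltd.", "Widget LLC"]
    for key, variants in group_variants(raw).items():
        print(key, "->", variants)
```

This only catches trivial differences (case, punctuation, suffixes). Translated names, abbreviations, and subsidiaries still need a manual pass, but the grouped output gives you the list of spellings to feed back into the search query.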