Closed by kepiej 5 months ago
Hi @kepiej! Thanks for the question, and thanks for your interest in the library!
So, the USPTO's Public Search system is based on Apache Solr, which only returns accurate total result numbers if the quantity is less than 500. What that means is that if the len function returns 500, it should be interpreted as "There are 500 or more results," rather than "There are 500 results." It should, however, give accurate counts for any query that returns fewer than 500 results.
I'll keep this open, and in a future version, have it raise a warning if you use the Public Search API and the result is >= 500.
Thanks!
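To illustrate the interpretation described above, here is a minimal sketch of how a capped count could be reported honestly. It is not the library's code: `RESULT_CAP` and `describe_count` are hypothetical names, and the value 500 is taken only from the explanation in this thread.

```python
import warnings

# Assumption: 500 is the cap mentioned in this thread; the name is hypothetical.
RESULT_CAP = 500


def describe_count(reported: int, cap: int = RESULT_CAP) -> str:
    """Turn a reported result count into a description that respects the cap."""
    if reported >= cap:
        # At or above the cap, the number is only a lower bound.
        warnings.warn(
            f"Reported count {reported} has reached the cap of {cap}; "
            "the true number of results may be larger.",
            stacklevel=2,
        )
        return f"{cap} or more results"
    return f"{reported} results"


print(describe_count(291))
print(describe_count(500))
```

In practice the argument would be whatever `len(result)` returns for a Public Search query, so a value of 500 is read as a lower bound rather than an exact total.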
Thanks @parkerhancock for the clarification!
I also noticed that the len function is not accurate for results < 500 either. Consider this example:
from patent_client import PublicSearchBiblio
result = PublicSearchBiblio.objects.filter(query="(solvent NEAR1 recovery).TTL.").order_by("-app_filing_date")
print(len(result))
This yields 291. However, when actually fetching the data as a pandas dataframe the length is much larger:
# Wrap the chained calls in parentheses so they can span multiple lines
df = (
    PublicSearchBiblio.objects.filter(query="(solvent NEAR1 recovery).TTL.")
    .order_by("-app_filing_date")
    .values(
        "publication_number",
        "patent_title",
        "applicant_names",
        "assignee_names",
        "publication_date",
        "app_filing_date",
        "type",
    )
    .to_pandas()
)
print(df.shape[0])
This returns a pandas dataframe with 445 rows!
Any idea what's going on here?
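One way to narrow down a mismatch like this is to compare the reported count with the number of rows actually fetched and with the number of distinct publication numbers among them. The sketch below is only a diagnostic aid: `compare_counts` is a hypothetical helper, the sample rows are made up, and duplicate rows are just one possible explanation that nothing in this thread confirms.

```python
from collections import Counter
from typing import Iterable, Mapping


def compare_counts(
    reported: int,
    rows: Iterable[Mapping],
    key: str = "publication_number",
) -> dict:
    """Summarize the reported count against the rows that were actually fetched."""
    rows = list(rows)
    key_counts = Counter(row[key] for row in rows)
    return {
        "reported": reported,        # e.g. the value returned by len(result)
        "fetched": len(rows),        # rows actually retrieved
        "unique": len(key_counts),   # distinct values of `key`
        "duplicated_keys": sorted(k for k, n in key_counts.items() if n > 1),
    }


# Made-up rows, for illustration only.
sample = [
    {"publication_number": "A1"},
    {"publication_number": "A2"},
    {"publication_number": "A1"},
]
summary = compare_counts(reported=2, rows=sample)
print(summary)
```

With a real dataframe, the rows could be passed as `df.to_dict("records")`. If `unique` matches `reported` while `fetched` is larger, repeated records would account for the gap; if not, the cause lies elsewhere.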
Hi, when I run the same code, it throws the following error:
An error occurred: Client error '401 Unauthorized' for url 'https://ppubs.uspto.gov/dirsearch-public/searches/searchWithBeFamily'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/401
Is there any other configuration that needs to be in place? @parkerhancock
This should be fixed in v5. If you keep having issues, let me know!
Thanks for developing this nice and super useful library! :) I'm trying to use the Patent Public Search Basic to retrieve results and find the total number of results as follows:
This prints 500. However, when I try to collect all results using
then df contains 2249 rows! The same happens for different search terms.
Is this a bug or am I doing something wrong here?