lukasschwab / arxiv.py

Python wrapper for the arXiv API
MIT License

How to use 'full text search' in arxiv.Search query? #92

Closed: hydai99 closed this issue 2 years ago

hydai99 commented 2 years ago

Hi, I want to collect all articles from a specific institution. The only way I know of to do this on arxiv.org right now is full-text search. Is there any way to do this with the Python API? Thanks!

lukasschwab commented 2 years ago

Hello! No, unfortunately this API doesn't expose full-text search. Check out the "Details of Query Construction" section of the arXiv API User's Manual for a full list of supported query features.
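Although full-text search isn't available, the query syntax does support field prefixes such as `ti:` (title), `au:` (author), `abs:` (abstract), `co:` (comment), `jr:` (journal reference), `cat:` (category), `rn:` (report number) and `all:`, combined with `AND`, `OR` and `ANDNOT`. Here is a minimal sketch, assuming you are willing to search only the metadata fields those prefixes cover. `build_query` is a hypothetical helper, not part of arxiv.py, and an institution name showing up in an abstract or comment is only a weak proxy for affiliation.

```python
# Hypothetical helper (not part of arxiv.py): assemble an arXiv API query
# string from (field prefix, phrase) pairs.
VALID_PREFIXES = {"ti", "au", "abs", "co", "jr", "cat", "rn", "all"}
VALID_OPERATORS = {"AND", "OR", "ANDNOT"}


def build_query(terms, operator="OR"):
    """Join (prefix, phrase) pairs with a single boolean operator.

    Multi-word phrases are wrapped in double quotes so the API treats
    them as one phrase rather than as separate words.
    """
    if operator not in VALID_OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    parts = []
    for prefix, phrase in terms:
        if prefix not in VALID_PREFIXES:
            raise ValueError(f"unsupported field prefix: {prefix}")
        quoted = f'"{phrase}"' if " " in phrase else phrase
        parts.append(f"{prefix}:{quoted}")
    return f" {operator} ".join(parts)


query = build_query([("abs", "Fraunhofer FOKUS"), ("co", "Fraunhofer FOKUS")])
print(query)  # abs:"Fraunhofer FOKUS" OR co:"Fraunhofer FOKUS"
```

You would then pass the string to the client, e.g. `arxiv.Search(query=query, max_results=100)`. Whether an institution is actually mentioned in those fields varies from paper to paper, so expect gaps.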

There is some author affiliation data available from the API (<arxiv:affiliation>, discussed here) that you could use to filter, but unfortunately this client library doesn't expose it because of an issue in the underlying parser: https://github.com/lukasschwab/arxiv.py/issues/62
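One way around the parser issue is to fetch the Atom feed yourself and read every `<arxiv:affiliation>` element with the standard library. This is a minimal sketch, assuming the feed nests `<arxiv:affiliation>` inside each `<author>` element under the `http://arxiv.org/schemas/atom` namespace. The function names are hypothetical, and many entries carry no affiliation data at all.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Namespaces used by the arXiv API's Atom feed.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
}


def fetch_feed(query, max_results=50):
    """Download one page of raw Atom XML (bytes) from the arXiv API."""
    params = urllib.parse.urlencode(
        {"search_query": query, "start": 0, "max_results": max_results}
    )
    url = f"http://export.arxiv.org/api/query?{params}"
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def affiliations_by_entry(feed_xml):
    """Map each entry's id to every affiliation listed for any of its authors."""
    root = ET.fromstring(feed_xml)
    result = {}
    for entry in root.findall("atom:entry", NS):
        entry_id = entry.findtext("atom:id", default="", namespaces=NS)
        result[entry_id] = [
            aff.text.strip()
            for author in entry.findall("atom:author", NS)
            for aff in author.findall("arxiv:affiliation", NS)
            if aff.text
        ]
    return result


def entries_from(feed_xml, institution):
    """Return ids of entries where any affiliation contains `institution`
    (case-insensitive substring match)."""
    needle = institution.lower()
    return [
        entry_id
        for entry_id, affs in affiliations_by_entry(feed_xml).items()
        if any(needle in a.lower() for a in affs)
    ]
```

Usage would look like `entries_from(fetch_feed("all:testing"), "Fraunhofer")`. If you page through many results, space out your requests; the arXiv API documentation asks clients to wait a few seconds between calls.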

You might try working with (Result)._raw.arxiv_affiliation, but it reflects at most one affiliation per result:

>>> import arxiv
>>> for r in arxiv.Search("testing").results():
...     if "arxiv_affiliation" in r._raw:
...             r._raw.arxiv_affiliation
...
'Giesecke and Devrient GmbH'
'University at Buffalo, SUNY'
'Nanyang Technological University'
'G-SCOP\\_CPP, G-SCOP'
'Fraunhofer FOKUS'
'NIT, Rourkela'
'UTT'
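To turn the loop above into the filter the question asks for, you could keep only the results whose single exposed affiliation matches the institution. This is a minimal sketch: `filter_by_affiliation` is a hypothetical helper. It relies on the private `_raw` attribute, which may change between arxiv.py versions, and it will miss papers whose matching affiliation isn't the one the parser kept.

```python
from types import SimpleNamespace


def filter_by_affiliation(results, institution):
    """Yield results whose raw `arxiv_affiliation` contains `institution`.

    Uses the private `_raw` feedparser entry, which exposes at most one
    affiliation per result; matching is a case-insensitive substring test.
    """
    needle = institution.lower()
    for r in results:
        affiliation = r._raw.get("arxiv_affiliation", "")
        if needle in affiliation.lower():
            yield r


# Stand-ins for arxiv.Result objects so the sketch runs without network access.
fake_results = [
    SimpleNamespace(title="Paper A", _raw={"arxiv_affiliation": "Nanyang Technological University"}),
    SimpleNamespace(title="Paper B", _raw={}),
]
print([r.title for r in filter_by_affiliation(fake_results, "nanyang")])  # ['Paper A']
```

With the real client you would pass something like `arxiv.Search("testing", max_results=200).results()` instead of `fake_results`.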

If you have the resources to run your own full-text indexing, you can look into arxiv-miner and Sci-Genie.
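For a sense of what self-hosted full-text search involves, here is a toy sketch of an inverted index over text you have already extracted from papers, queried for an institution phrase. It is only an illustration of the general idea, not how arxiv-miner or Sci-Genie work internally, and the `documents` mapping of ids to text is a hypothetical placeholder.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def phrase_search(index, documents, phrase):
    """Return sorted ids of documents containing `phrase` as consecutive tokens.

    Candidates come from intersecting the index postings, then each one
    is verified against its token sequence to enforce word order.
    """
    tokens = tokenize(phrase)
    if not tokens:
        return []
    candidates = set.intersection(*(index.get(t, set()) for t in tokens))
    needle = f" {' '.join(tokens)} "
    return sorted(
        doc_id
        for doc_id in candidates
        if needle in f" {' '.join(tokenize(documents[doc_id]))} "
    )
```

The index narrows the search to documents containing every word, and the final check keeps only those where the words appear in order, which is what matching an institution name needs.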