Python wrapper for the arXiv API. Related libraries and repositories: arxiv.py, python_arXiv_parsing_example.py and arxiv-sanity-preserver. arXiv is an open-access archive of more than 1M e-prints in Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance and Statistics.
Here is an example of how to use arxivpy.
import arxivpy
articles = arxivpy.query(search_query=['cs.CV', 'cs.LG', 'cs.CL', 'cs.NE', 'stat.ML'],
                         start_index=0, max_index=200, results_per_iteration=100,
                         wait_time=5.0, sort_by='lastUpdatedDate')  # grab 200 articles
The search_query input can be a list of categories or a string containing an arXiv-formatted query. The output is a list of dictionaries parsed from the arXiv XML response.
This example fetches the 200 most recently updated papers (index 0 to 200), 100 at a time, waiting about 5 seconds between requests (see the note below if scraping many papers).
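To picture how the paging parameters fit together, here is a minimal sketch that splits an index range into slices and builds one request URL per slice. It is only an illustration, not arxivpy's actual implementation: `page_slices` and `build_url` are hypothetical helpers, and the URL assumes the arXiv API's `start`, `max_results` and `sortBy` parameters. Nothing is fetched here.

```python
API_URL = "http://export.arxiv.org/api/query"


def page_slices(start_index, max_index, results_per_iteration):
    """Yield (start, size) pairs covering the range [start_index, max_index)."""
    for start in range(start_index, max_index, results_per_iteration):
        # The last slice may be shorter than results_per_iteration.
        yield start, min(results_per_iteration, max_index - start)


def build_url(search_query, start, size, sort_by="lastUpdatedDate"):
    """Build one request URL (hypothetical helper, assumed parameter names)."""
    return (f"{API_URL}?search_query={search_query}"
            f"&start={start}&max_results={size}&sortBy={sort_by}")


slices = list(page_slices(start_index=0, max_index=200, results_per_iteration=100))
urls = [build_url("cat:cs.CV+OR+cat:cs.LG", start, size) for start, size in slices]

for url in urls:
    print(url)
    # A real client would fetch `url` here and then pause for wait_time
    # seconds (e.g. time.sleep(5.0)) before requesting the next slice.
```

With the values from the example above this gives two slices of 100 results each, which is why a larger `wait_time` matters when the range grows.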
You can use other search queries, for example:
search_query=['cs.DB', 'cs.IR']
search_query='cs.DB' # select only Databases papers
search_query='au:kording' # author name includes Kording
search_query='ti:deep+AND+ti:learning' # title with `deep` and `learning`
search_query='abs:%22deep+learning%22' # deep learning as a phrase
Or you can build a simple search query using arxivpy.generate_query:
search_query = arxivpy.generate_query(terms=['cs.CV', 'cs.LG', 'cs.CL', 'cs.NE', 'stat.ML'],
                                     prefix='category', boolean='OR')
Or convert plain text to an arXiv query using arxivpy.generate_query_from_text:
query = arxivpy.generate_query_from_text("author k kording & author achakulvisut & title science & abstract recommendation") # awesome paper
articles = arxivpy.query(search_query=query)
More search query prefixes, booleans and categories are listed on the wiki page. More example queries can be found in the arXiv user manual.
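If you want to see how such query strings are put together, the sketch below reproduces the formats shown in the examples above using only the standard library. `join_terms` and `phrase` are hypothetical helpers written for illustration; they are not part of arxivpy and may differ from what arxivpy.generate_query does internally. The `cat`, `ti` and `abs` prefixes are assumed to be the arXiv field prefixes.

```python
from urllib.parse import quote_plus


def join_terms(terms, prefix="cat", boolean="OR"):
    """Join terms as 'prefix:term', separated by '+BOOLEAN+'."""
    return f"+{boolean}+".join(f"{prefix}:{term}" for term in terms)


def phrase(prefix, text):
    """Search for text as an exact phrase.

    quote_plus encodes the double quotes as %22 and spaces as '+'.
    Only the phrase is encoded, so the ':' after the prefix stays as-is.
    """
    return f"{prefix}:" + quote_plus(f'"{text}"')


category_query = join_terms(['cs.DB', 'cs.IR'])
title_query = join_terms(['deep', 'learning'], prefix='ti', boolean='AND')
phrase_query = phrase('abs', 'deep learning')

print(category_query)
print(title_query)
print(phrase_query)
```

Any of these strings could then be passed as `search_query`, in the same way as the hand-written examples.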
You can also use arxivpy.download to download the articles to a given directory. Here is a snippet that does that:
arxivpy.download(articles, path='arxiv_pdf')
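As a rough idea of what downloading involves, the sketch below maps a paper id to a PDF URL and a local file name. It is an assumption-laden illustration, not how arxivpy.download is implemented: `pdf_target` is a hypothetical helper, the id `1703.00001` is just a placeholder, and the `https://arxiv.org/pdf/<id>` pattern is assumed for the PDF location. Nothing is downloaded here.

```python
import os


def pdf_target(arxiv_id, path="arxiv_pdf"):
    """Return (url, local_file) for one paper id (hypothetical helper)."""
    url = f"https://arxiv.org/pdf/{arxiv_id}"
    # Old-style ids contain a slash (e.g. 'hep-th/9901001'),
    # so replace it to get a safe file name.
    filename = arxiv_id.replace("/", "_") + ".pdf"
    return url, os.path.join(path, filename)


url, local_file = pdf_target("1703.00001")
print(url)
print(local_file)
```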
Note from the arXiv API: the maximum number of results returned from a single call (max_index) is limited to 30000, in slices of at most 2000 at a time.
The easiest way to install arxivpy is to use pip.
pip install git+https://github.com/titipata/arxivpy
You can also install it manually by cloning the repository and running setup.py:
git clone https://github.com/titipata/arxivpy
cd arxivpy
python setup.py install