Arxivpy

Python wrapper for the arXiv API. Related libraries and repositories: arxiv.py, python_arXiv_parsing_example.py and arxiv-sanity-preserver. arXiv is an open-access e-print archive with 1M+ e-prints in Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance and Statistics.

Example

Here is an example of how to use arxivpy.

import arxivpy
articles = arxivpy.query(search_query=['cs.CV', 'cs.LG', 'cs.CL', 'cs.NE', 'stat.ML'],
                         start_index=0, max_index=200, results_per_iteration=100,
                         wait_time=5.0, sort_by='lastUpdatedDate') # grab 200 articles

The search_query input can be a list of categories or a string in arXiv query format. The output is a list of dictionaries parsed from the arXiv XML response. This example fetches the 200 most recently updated papers (indices 0 to 200), 100 at a time, waiting about 5 seconds between requests (see the note below if scraping many papers).
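To make the paging parameters concrete, here is a minimal sketch of how start_index, max_index and results_per_iteration could translate into request slices. The helper name batch_ranges is hypothetical and this is not arxivpy's actual implementation, only an illustration of the arithmetic.

```python
def batch_ranges(start_index, max_index, results_per_iteration):
    """Yield (start, count) pairs covering indices start_index..max_index-1.

    Hypothetical helper for illustration; not part of arxivpy.
    """
    for start in range(start_index, max_index, results_per_iteration):
        # The last slice may be shorter than results_per_iteration.
        count = min(results_per_iteration, max_index - start)
        yield start, count


batches = list(batch_ranges(0, 200, 100))
print(batches)  # [(0, 100), (100, 100)]
```

With the values from the example above, this gives two requests of 100 results each.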

Queries

You can use other search queries, for example:

search_query=['cs.DB', 'cs.IR']
search_query='cs.DB' # select only Databases papers
search_query='au:kording' # author name includes Kording
search_query='ti:deep+AND+ti:learning' # title with `deep` and `learning`
search_query='abs:%22deep+learning%22' # deep learning as a phrase

You can also build a simple search query using arxivpy.generate_query:

search_query = arxivpy.generate_query(terms=['cs.CV', 'cs.LG', 'cs.CL', 'cs.NE', 'stat.ML'],
                                      prefix='category', boolean='OR')
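For intuition, the sketch below shows what a combined category query looks like in arXiv query syntax, where terms carry a field prefix such as cat: and are joined with a boolean operator. The function join_terms is a hypothetical stand-in written for this README, not arxivpy.generate_query itself, and the assumption that prefix='category' corresponds to the cat: field is ours.

```python
def join_terms(terms, field="cat", boolean="OR"):
    """Join terms into an arXiv-style query string.

    Hypothetical helper for illustration; arxivpy.generate_query may
    behave differently.
    """
    if boolean not in {"AND", "OR", "ANDNOT"}:
        raise ValueError("boolean must be AND, OR or ANDNOT")
    # '+' stands for a space in the URL-encoded query, as in the examples above.
    return f"+{boolean}+".join(f"{field}:{term}" for term in terms)


query = join_terms(["cs.CV", "cs.LG"])
print(query)  # cat:cs.CV+OR+cat:cs.LG
```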

You can also convert plain text to an arXiv query using arxivpy.generate_query_from_text:

query = arxivpy.generate_query_from_text("author k kording & author achakulvisut & title science & abstract recommendation") # awesome paper
articles = arxivpy.query(search_query=query)

More search query prefixes, booleans and categories are listed on the wiki page. More example queries can be found in the arXiv user manual.
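If you want to check a query by hand, it can help to see where the query string ends up in an arXiv API request. The sketch below assembles a request URL from the public API parameters search_query, start, max_results, sortBy and sortOrder. The helper build_url is hypothetical, and we do not claim this is how arxivpy builds its requests. The query string is assumed to be URL-encoded already, as in the examples above.

```python
ARXIV_API = "http://export.arxiv.org/api/query"


def build_url(search_query, start=0, max_results=10, sort_by="lastUpdatedDate"):
    """Assemble an arXiv API URL for an already URL-encoded query string.

    Hypothetical helper for illustration; not part of arxivpy.
    """
    return (
        f"{ARXIV_API}?search_query={search_query}"
        f"&start={start}&max_results={max_results}"
        f"&sortBy={sort_by}&sortOrder=descending"
    )


url = build_url("au:kording", start=0, max_results=5)
print(url)
```

Opening the printed URL in a browser returns the Atom XML feed for that query.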

Download PDF

You can also use arxivpy.download to download the articles to a given directory. Here is a snippet to do that.

arxivpy.download(articles, path='arxiv_pdf')

Note from API

The maximum number of results returned from a single call (max_index) is limited to 30000 in slices of at most 2000 at a time.
In cases where the API needs to be called multiple times in a row, we encourage you to play nice and incorporate a 3-second delay in your code.
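If you write your own request loop, the delay can be sketched as follows. The names fetch_politely and fake_fetch are hypothetical; fake_fetch stands in for a real API call so the snippet runs without network access, and the delay is set to 0 here only to keep the demonstration fast.

```python
import time


def fetch_politely(fetch, batches, delay=3.0, sleep=time.sleep):
    """Call fetch(start, count) for each batch, pausing between calls.

    Hypothetical helper for illustration; not part of arxivpy.
    """
    results = []
    for i, (start, count) in enumerate(batches):
        if i > 0:
            sleep(delay)  # wait between consecutive requests, not before the first
        results.extend(fetch(start, count))
    return results


def fake_fetch(start, count):
    # Stand-in for a real request: returns the indices it was asked for.
    return list(range(start, start + count))


articles = fetch_politely(fake_fetch, [(0, 2), (2, 2)], delay=0.0)
print(articles)  # [0, 1, 2, 3]
```

In real use, keep delay at 3 seconds or more, as the note above recommends.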

Installation

The easiest way is to use pip.

pip install git+https://github.com/titipata/arxivpy

You can also install it manually by cloning the repository and running setup.py.

git clone https://github.com/titipata/arxivpy
cd arxivpy
python setup.py install
