..
    This file is part of hepcrawl.
    Copyright (C) 2015, 2016, 2017 CERN.

    hepcrawl is a free software; you can redistribute it and/or modify it
    under the terms of the Revised BSD License; see LICENSE file for
    more details.

==========
 HEPcrawl
==========

.. image:: https://img.shields.io/travis/inspirehep/hepcrawl.svg
    :target: https://travis-ci.org/inspirehep/hepcrawl

.. image:: https://img.shields.io/github/tag/inspirehep/hepcrawl.svg
    :target: https://github.com/inspirehep/hepcrawl/releases

.. image:: https://img.shields.io/pypi/dm/hepcrawl.svg
    :target: https://pypi.python.org/pypi/hepcrawl

.. image:: https://img.shields.io/github/license/inspirehep/hepcrawl.svg
    :target: https://github.com/inspirehep/hepcrawl/blob/master/LICENSE

HEPcrawl is a harvesting library based on Scrapy (http://scrapy.org) for INSPIRE-HEP (http://inspirehep.net). It focuses on the automatic and semi-automatic retrieval of new content from all the sources the site aggregates, in particular from major and minor publishers in the field of High-Energy Physics.

The project is currently in an early stage of development.

See the full documentation at http://pythonhosted.org/hepcrawl