OpenAlboPretorio

OpenAlboPretorio is a scraper library that extracts data from the Albo Pretorio Web archive published on the institutional websites of Italian cities. The Albo Pretorio is the public archive containing all the administrative acts that concern the administrative life of the Municipality. Extracted data can be exported in JSON (default) and Feed (rss2, atom) formats.

Installation

Clone the GitHub repository to your machine:

$ git clone https://github.com/gpirrotta/OpenAlboPretorio.git
$ cd OpenAlboPretorio

Then run these two commands to download Composer and install the dependencies:

$ wget http://getcomposer.org/composer.phar
$ php composer.phar install

Now you can add the autoloader, and you will have access to the library:

<?php

require 'vendor/autoload.php';

You're done.

Usage

The OpenAlboPretorio class is the entry point of the library.

<?php

    $albo = new OpenAlboPretorio();
    // No semicolon after city(): the call chain continues on the next line
    $results = $albo->city(AlboPretorioScraperFactory::TERME_VIGLIATORE)
                    ->open();

    print $results;  // JSON format as default

You can also customize the scraper manually:

<?php

    $scraper = new BarcellonaPGScraper(new BarcellonaPGMasterPageScraper(), new BarcellonaPGDetailPageScraper());
    // you can also customize the Master and Detail scraper objects, e.g. by using different HttpAdapter objects

    $scraper->setItemType(BarcellonaPGScraper::TIPOLOGIA_DETERMINAZIONE_DEL_SINDACO);
    $formatter = new FeedFormatter(FeedFormatter::ATOM_FEED_TYPE);  // RSS2 default

    $albo = new OpenAlboPretorio();
    $results = $albo->scrapeUsing($scraper)
                    ->formatUsing($formatter)
                    ->maxNumberItems(10)
                    ->open();

    print $results;

The OpenAlboPretorio API

As an alternative to the city method, you can set your own customized scraper using the scrapeUsing method.

You can customize the scraped results with the formatUsing and maxNumberItems methods.
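The chained calls shown in the Usage section suggest a fluent interface. The following is only a minimal sketch of how such an interface could be structured, not the library's actual code: the class name FluentAlboSketch is hypothetical, and for brevity it accepts plain callables where the real library takes scraper and formatter objects. It assumes PHP 7.4 or later.

```php
<?php

// Hypothetical sketch of a fluent client; NOT the OpenAlboPretorio implementation.
final class FluentAlboSketch
{
    /** @var callable returns a list of items */
    private $scraper;

    /** @var callable turns a list of items into a string */
    private $formatter;

    private ?int $maxItems = null;

    public function __construct()
    {
        // Assumed defaults: no items, JSON output.
        $this->scraper = fn(): array => [];
        $this->formatter = fn(array $items): string => json_encode($items);
    }

    public function scrapeUsing(callable $scraper): self
    {
        $this->scraper = $scraper;
        return $this; // returning $this is what makes chaining possible
    }

    public function formatUsing(callable $formatter): self
    {
        $this->formatter = $formatter;
        return $this;
    }

    public function maxNumberItems(int $max): self
    {
        $this->maxItems = $max;
        return $this;
    }

    public function open(): string
    {
        $items = ($this->scraper)();
        if ($this->maxItems !== null) {
            // Keep only the first $maxItems items.
            $items = array_slice($items, 0, $this->maxItems);
        }
        return ($this->formatter)($items);
    }
}

$result = (new FluentAlboSketch())
    ->scrapeUsing(fn(): array => ['a', 'b', 'c'])
    ->maxNumberItems(2)
    ->open();

print $result;
```

Each setter only records a choice; the scraping and formatting work is deferred until open() is called, which is why the options can be chained in any order.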

Scraper

Currently the following scrapers are implemented:

Formatter

Formatters available: JSON (the default) and Feed (RSS2 or Atom, through the FeedFormatter class).
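To illustrate what a formatter does, here is a minimal sketch that serializes the same list of items to JSON and to an RSS 2.0 feed using only PHP's standard extensions. The function names toJson and toRss2, the item fields, and the example.org URLs are assumptions made for this example; they are not part of the OpenAlboPretorio API.

```php
<?php

// Hypothetical sample data: the field names are assumptions, not the library's schema.
$items = [
    ['title' => 'Act 1', 'link' => 'http://example.org/albo/1', 'description' => 'Mayor decree'],
    ['title' => 'Act 2', 'link' => 'http://example.org/albo/2', 'description' => 'Council resolution'],
];

// Serialize the items as a JSON array of objects.
function toJson(array $items): string
{
    return json_encode($items, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES);
}

// Append <name>text</name> to $parent; text nodes are escaped on output.
function appendTextElement(DOMDocument $doc, DOMNode $parent, string $name, string $text): void
{
    $element = $parent->appendChild($doc->createElement($name));
    $element->appendChild($doc->createTextNode($text));
}

// Serialize the items as a minimal RSS 2.0 document.
function toRss2(array $items, string $title, string $link, string $description): string
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;

    $rss = $doc->appendChild($doc->createElement('rss'));
    $rss->setAttribute('version', '2.0');
    $channel = $rss->appendChild($doc->createElement('channel'));

    // RSS 2.0 requires these three channel elements.
    appendTextElement($doc, $channel, 'title', $title);
    appendTextElement($doc, $channel, 'link', $link);
    appendTextElement($doc, $channel, 'description', $description);

    foreach ($items as $item) {
        $node = $channel->appendChild($doc->createElement('item'));
        foreach (['title', 'link', 'description'] as $field) {
            appendTextElement($doc, $node, $field, $item[$field]);
        }
    }

    return $doc->saveXML();
}

print toJson($items) . "\n";
print toRss2($items, 'Albo Pretorio', 'http://example.org/albo', 'Latest administrative acts');
```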

Extending the OpenAlboPretorio project

If you want to extend the OpenAlboPretorio project to support your city, you have to implement the AlboPretorioScraperInterface interface.

Generally, scraping an Albo Pretorio website means extracting data from two kinds of pages:

1) the Master page - the Web page containing the list of all Albo Pretorio items, i.e. all administrative acts of the Municipality, where you can find a summary of the latest items published, including the URL of each item;

2) the Detail page - the Web page of a single item, where you can find the details of that administrative act.
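The two-step logic above can be sketched with PHP's DOM extension. This is an illustration only: every function name, URL, and HTML snippet below is hypothetical, the pages are canned strings so that the example runs offline, and the code does not use the library's own interfaces, whose signatures are not shown in this document.

```php
<?php

// Stand-in for an HTTP client: maps URLs to canned HTML (hypothetical data).
function fetchPage(string $url): string
{
    $pages = [
        'http://example.org/albo' =>
            '<ul><li><a href="http://example.org/albo/1">Act 1</a></li>'
            . '<li><a href="http://example.org/albo/2">Act 2</a></li></ul>',
        'http://example.org/albo/1' => '<h1>Act 1</h1><p class="body">Mayor decree</p>',
        'http://example.org/albo/2' => '<h1>Act 2</h1><p class="body">Council resolution</p>',
    ];

    return $pages[$url];
}

// Parse an HTML string and return an XPath helper for querying it.
function xpathFor(string $html): DOMXPath
{
    $doc = new DOMDocument();
    // Real-world HTML is often malformed; suppress the parser warnings.
    @$doc->loadHTML($html);

    return new DOMXPath($doc);
}

// Step 1: the Master page yields the URL of each item.
function scrapeMaster(string $masterUrl): array
{
    $urls = [];
    foreach (xpathFor(fetchPage($masterUrl))->query('//li/a') as $link) {
        $urls[] = $link->getAttribute('href');
    }

    return $urls;
}

// Step 2: each Detail page yields the data of one administrative act.
function scrapeDetail(string $detailUrl): array
{
    $xpath = xpathFor(fetchPage($detailUrl));

    return [
        'url'   => $detailUrl,
        'title' => trim($xpath->query('//h1')->item(0)->textContent),
        'body'  => trim($xpath->query('//p[@class="body"]')->item(0)->textContent),
    ];
}

// Visit the Master page once, then one Detail page per item.
$items = array_map('scrapeDetail', scrapeMaster('http://example.org/albo'));

print json_encode($items, JSON_PRETTY_PRINT);
```

Keeping the page fetching in a separate function mirrors the idea of swapping HTTP adapters: the parsing code stays the same whether the HTML comes from the network or from a test fixture.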

To handle the scraping logic described above, the OpenAlboPretorio library provides the AbstractMasterDetailTemplateScraper abstract class, which implements the AlboPretorioScraperInterface interface.

The abstract class uses the following scraper interfaces:

If the extraction logic of your Albo Pretorio website does not follow the Master-Detail pattern, you are free to implement AlboPretorioScraperInterface in whatever way meets your needs.

Requirements

Running the Tests

$ phpunit

Demo

TODO

Credits

License

OpenAlboPretorio is released under the MIT License. See the bundled LICENSE file for details.