pkp / ots

PKP XML Parsing Service
GNU General Public License v3.0


Module Description

 

Requirements

 

Installation

# git clone https://github.com/pkp/xmlps.git
# cd xmlps

# php composer.phar self-update
# php composer.phar install

# chmod -R go+w var
# rm var/cache/zfcache-ea/*

# vendor/doctrine/doctrine-module/bin/doctrine-module orm:schema-tool:update --force

# ./start_queues.sh

 

Sample sites-available/httpd.conf:

 

<VirtualHost *:80>
    ServerAdmin webmaster@localhost

    DocumentRoot /var/www/html/public
    <Directory />
        Options FollowSymLinks
        AllowOverride All
    </Directory>
    <Directory /var/www/html/public>
        Options -Indexes +FollowSymLinks +MultiViews
        AllowOverride All
        Order allow,deny
        allow from all
    </Directory>

    ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/
    <Directory "/usr/lib/cgi-bin">
        AllowOverride None
        Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
        Order allow,deny
        Allow from all
    </Directory>

    ErrorLog ${APACHE_LOG_DIR}/error.log

    # Possible values include: debug, info, notice, warn, error, crit,
    # alert, emerg.
    LogLevel warn

    CustomLog ${APACHE_LOG_DIR}/access.log combined

    Alias /doc/ "/usr/share/doc/"
    <Directory "/usr/share/doc/">
        Options Indexes MultiViews FollowSymLinks
        AllowOverride None
        Order deny,allow
        Deny from all
        Allow from 127.0.0.0/255.0.0.0 ::1/128
    </Directory>

</VirtualHost>

 

You will probably want to block port 8080 for all non-localhost connections so that external clients cannot submit jobs directly to the Grobid module service.

 

Unit tests

After a successful installation, the unit tests should complete without errors:

# ./unittest.sh

If your web server runs as a different user than you, empty the cache directories again afterward, as in the installation steps.

 

Developer notes

# guard

 

API

A simple REST API is available for submitting jobs to the server, checking their status, and retrieving the converted results.

Submit

Submit a job to the server. The citationStyleHash is an internal identifier for the requested citation style; a list of hashes can be retrieved through the citationStyleList API. The API returns a job id, which can later be used to query the job status or to retrieve the completed job.

E.g.:

http://example.com/api/job/submit
POST parameters:
    'email' => 'user@example.com'
    'access_token' => 'access_token'
    'fileName' => 'document.docx'
    'citationStyleHash' => 'c6de5efe3294b26391ea343053c19a84'
    'fileContent' => '...'
    'fileMetadata' => *OPTIONAL* known good metadata like https://raw.githubusercontent.com/pkp/xmlps/master/module/MergeXMLOutputs/test/assets/metadata.json

Example response:

{"status":"success","id":123}
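A minimal Python client sketch for the submit call, using only the standard library. The host is the placeholder from the example, and the function names (build_submit_body, parse_submit_response, submit_file) are hypothetical. Two points are assumptions rather than documented behavior: that fileContent carries the raw file bytes in an ordinary form-encoded POST field, and that fileMetadata is sent as a JSON string shaped like the linked metadata.json. Any reply whose status is not "success" is treated as an error.

```python
import json
import urllib.parse
import urllib.request

# Placeholder host taken from the examples; replace with your server.
BASE_URL = "http://example.com/api/job"


def build_submit_body(email, access_token, file_name, file_content,
                      citation_style_hash, file_metadata=None):
    """Form-encode the documented submit parameters into a POST body."""
    params = {
        "email": email,
        "access_token": access_token,
        "fileName": file_name,
        "citationStyleHash": citation_style_hash,
        # Assumption: the raw file bytes go into the form field unchanged.
        "fileContent": file_content,
    }
    if file_metadata is not None:
        # Optional; assumed to be a JSON document shaped like metadata.json.
        params["fileMetadata"] = json.dumps(file_metadata)
    # urlencode percent-encodes both str and bytes values.
    return urllib.parse.urlencode(params).encode("ascii")


def parse_submit_response(text):
    """Return the job id from a reply such as {"status":"success","id":123}."""
    data = json.loads(text)
    if data.get("status") != "success":
        raise RuntimeError(f"submit failed: {data}")
    return data["id"]


def submit_file(email, access_token, path, citation_style_hash):
    """Upload one file and return the job id (performs a network request)."""
    with open(path, "rb") as f:
        content = f.read()
    file_name = path.rsplit("/", 1)[-1]
    body = build_submit_body(email, access_token, file_name, content,
                             citation_style_hash)
    # Passing data= makes urllib send a POST with a form-urlencoded body.
    request = urllib.request.Request(f"{BASE_URL}/submit", data=body)
    with urllib.request.urlopen(request) as response:
        return parse_submit_response(response.read().decode("utf-8"))
```

Keeping the body construction and response parsing separate from the network call makes both parts easy to check without a running server.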

Status

Returns the current status for a job. Only completed jobs can be retrieved from the server. A full list of statuses can be found here.

E.g.:

http://example.com/api/job/status?email=user@example.com&access_token=access_token&id=123

Example response:

{"status":"success","jobStatus":0,"jobStatusDescription":"Pending"}
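A small Python sketch of a status query, assuming the same placeholder host. The helper names are hypothetical. Letting urlencode build the query string avoids hand-escaping: the "@" in the email address becomes "%40", which the server decodes back to the same value. Only jobStatus 0 ("Pending") is shown in the example, so the sketch returns the code and description as-is instead of guessing which code means "completed".

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://example.com/api/job"  # placeholder host from the examples


def status_url(email, access_token, job_id):
    """Build the GET URL for the status endpoint, percent-encoding each value."""
    query = urllib.parse.urlencode(
        {"email": email, "access_token": access_token, "id": job_id})
    return f"{BASE_URL}/status?{query}"


def parse_status(text):
    """Return (jobStatus, jobStatusDescription) from a status reply."""
    data = json.loads(text)
    if data.get("status") != "success":
        raise RuntimeError(f"status request failed: {data}")
    return data["jobStatus"], data["jobStatusDescription"]


def get_status(email, access_token, job_id):
    """Query the server once (network request) and return the parsed status."""
    with urllib.request.urlopen(status_url(email, access_token, job_id)) as response:
        return parse_status(response.read().decode("utf-8"))
```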

Citation Style List

Returns a list of available citation styles and their internal ids. We support all citation styles from citationstyles.org.

E.g.:

http://example.com/api/job/citationStyleList

Example response:

{"status":"success","citationStyles":{"c6de5efe3294b26391ea343053c19a84":"ACM SIG Proceedings (\u0022et al.\u0022 for 15+ authors)"...
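The citationStyles object maps each hash to a human-readable style name, so a client normally searches the names and then passes the matching hash to the submit call. The following is a hedged Python sketch with hypothetical helper names and the placeholder host. The \u0022 sequences in the sample reply are JSON-escaped double quotes, and a JSON parser decodes them automatically.

```python
import json
import urllib.request

BASE_URL = "http://example.com/api/job"  # placeholder host from the examples


def find_style_hashes(text, name_fragment):
    """Return the hashes whose style name contains name_fragment (case-insensitive)."""
    data = json.loads(text)  # also turns escapes like \u0022 into '"'
    if data.get("status") != "success":
        raise RuntimeError(f"citationStyleList failed: {data}")
    needle = name_fragment.lower()
    return [style_hash for style_hash, name in data["citationStyles"].items()
            if needle in name.lower()]


def fetch_style_list():
    """Download the raw citationStyleList reply (network request)."""
    with urllib.request.urlopen(f"{BASE_URL}/citationStyleList") as response:
        return response.read().decode("utf-8")
```

For example, find_style_hashes(fetch_style_list(), "ACM SIG") would return the hash(es) of the ACM SIG Proceedings styles.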

Retrieve

Retrieve a converted document. The conversionStage parameter specifies which conversion result is returned. A full list of conversion stages can be found here. The "final" XML produced by the pipeline is stage 20.

E.g.:

http://example.com/api/job/retrieve?email=user@example.com&access_token=access_token&id=123&conversionStage=10

Example response:

The requested document or a JSON string with an error message.
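Because retrieve returns either the document itself or a JSON error string, a client has to tell the two apart. The Python sketch below uses a heuristic that is an assumption, not documented behavior: a body that decodes as a JSON object is treated as an error, and anything else (XML, binary data) is saved as the document. This would misclassify a conversion stage whose output is itself a JSON object. The helper names and placeholder host are hypothetical; the error object's fields are not documented, so it is passed through unchanged.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://example.com/api/job"  # placeholder host from the examples


def retrieve_url(email, access_token, job_id, conversion_stage):
    """Build the GET URL for the retrieve endpoint."""
    query = urllib.parse.urlencode({
        "email": email,
        "access_token": access_token,
        "id": job_id,
        "conversionStage": conversion_stage,
    })
    return f"{BASE_URL}/retrieve?{query}"


def split_retrieve_body(body):
    """Heuristic: a body that parses as a JSON object is an error reply.

    Returns (document_bytes, None) or (None, error_dict).
    """
    try:
        data = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return body, None  # not JSON text, so treat it as the document
    if isinstance(data, dict):
        return None, data
    return body, None


def retrieve_to_file(email, access_token, job_id, conversion_stage, out_path):
    """Download one conversion stage and save it (network request)."""
    url = retrieve_url(email, access_token, job_id, conversion_stage)
    with urllib.request.urlopen(url) as response:
        document, error = split_retrieve_body(response.read())
    if error is not None:
        raise RuntimeError(f"retrieve failed: {error}")
    with open(out_path, "wb") as f:
        f.write(document)
```

To fetch the final XML, pass conversion_stage=20.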