pkp / pkp-lib

The library used by PKP's applications OJS, OMP and OPS, open source software for scholarly publishing.
https://pkp.sfu.ca
GNU General Public License v3.0
307 stars 447 forks source link

Introduce REST API #1503

Closed asmecher closed 7 years ago

asmecher commented 8 years ago

Introduce a REST API for OJS 3.0/OMP 1.2+ (at the earliest).

asmecher commented 8 years ago

Some early experimentation over at https://github.com/asmecher/pkp-lib/commit/e54010a7b7718004685b9fb6ceeb986d5ba1d83f:

ctgraham commented 8 years ago

+1, watching.

mjordan commented 8 years ago

@asmecher about authen/z, would using Slim's middleware be an option? The example I pointed to uses it at https://github.com/mjordan/ocr_rest/blob/master/index.php#L23 to do basic IP whitelisting.

Not to complicate matters, but another PHP microframework that might be worth evaluating is Silex, which is a Symphony component.

asmecher commented 8 years ago

@mjordan, we have our own rulesets for authorization/authentication, and porting these over to another framework will be a colossal pain. I want to decrease the amount of homebrew infrastructure we maintain but would advocate for baby steps -- introducing something like slim that doesn't dupe a lot of it, but uses modern/standard approaches, is a step in the right direction.

I'll assess this as I go -- we might discover a fundamental impedance mismatch when we try to use the auth framework from within the microframework -- but for now it seems like an easy win.

NateWr commented 8 years ago
  1. In terms of auth, what are we able to re-use when hitting the API from a third-party client (say, a different domain or from a mobile app)? We'll probably need an oauth implementation before we can support any POST/PATCH/PUT/DELETE requests, and oauth2 is far simpler than oauth1. I think it's worth requiring SSL (oauth2 requirement) since we can impose some up-front standards on this thing.
  2. I know it's early, but can we version that endpoint (/api/v1)? ;)
  3. What about [index.php]/api and [index.php]/api/context/{contextPath}?
asmecher commented 8 years ago

What about [index.php]/api and [index.php]/api/context/{contextPath}?

I'm not sure that will play well with URL rewriting.

kaschioudi commented 7 years ago

@asmecher : I have done some refactoring to move authorization to a Middleware. Then add a generic getParameter to APIHandler to replace getEntityId. cf. https://github.com/kaschioudi/pkp-lib/commit/cc49926aace45c5c37650c6556126ce8f00f5bc3

kaschioudi commented 7 years ago

rest api post/put/delete endpoints work PR: https://github.com/pkp/pkp-lib/pull/2622

asmecher commented 7 years ago

@kaschioudi, is there one also for the OJS repo?

kaschioudi commented 7 years ago

@asmecher : not at the moment. Once I do some refactoring based on your comments and move IssueRepository out of pkp-lib into OJS, I will create a pull request.

asmecher commented 7 years ago

OK, sure. It's just a lot easier to post comments on a branch when there's a PR to attach them to, even if the PR isn't ready for merging.

NateWr commented 7 years ago

We've now (just about) got an API in place for OJS 3.1. @kaschioudi has done amazing work wrestling the API infrastructure into place. I want to give a brief overview of where we're at, then raise issues for further feedback and refinement before 3.1 ships.

Where we're at

Several endpoints are available at /index.php/api/v1/. Each endpoint is serviced by an APIHandler class which you'll find under /api/v1/ in the OJS directory.

We're documenting the API using a swagger file. The most up-to-date copy is this gist, but we'll be keeping it updated in master under /docs/dev/swagger.yaml. To read the documentation, dump the contents of the gist file into the swagger editor.

Any endpoint prefixed with _ is a private endpoint, which we use to service OJS 3 app components. It's recommended that you do not rely on these for custom code, as they may change without warning. (These endpoints are not fully documented yet.)

A comment below introduces authorization with the API.

Discussion

I'd like to get some discussion going on the following questions around the API's design, so that we can get things as refined as possible by the OJS 3.1 release. This will reduce the amount of breakage we have to do in the early days of the API.

API design questions (consumption)

1. Should objects associated with the primary request be embedded or linked?

Should a request for a submission (/submissions/{submissionId}) render out the associated authors, galleys, files, sections and participants in the response? Or should it provide a link to an API endpoint where it can be retrieved.

Example of embedding authors:

{
    "title": "Magna vivamus sagittis eget dictum nisl elit nam non iaculis",
    "authors": [
        {
            "name": "Carlo Corino",
            "affiliation": "University of Bologna",
            "orcid": "http://orcid.org/0000-0002-1825-0097"
        },
    ],
}

Example of linking authors:

{
    "title": "Magna vivamus sagittis eget dictum nisl elit nam non iaculis",
    "authors": [
        'http://localhost/api/v1/authors/1'
    ],
}

Embedding authors provides a richer set of information in a single API request, but will require more database hits per request. We're also more likely to get bogged down in questions about how deep the embedding should go and what should be contained in a summary presentation of an object.

2. What use-cases should we aim to support for the OJS 3.1 release?

Are there any client applications for the API that people want to start working on right now? Can we devise a set of minimum viable endpoints that we want to be polished and finalised by 3.1's release?

The answer to this will also bear on the question of linking vs embedding (1). If we presume someone wants to use the API to render a user-facing website, embedding might be more important. But if we think client apps are more likely to want to import the data into their own systems, linking will probably be easier to build, document, and maintain with long-term compatibility.

API design questions (code)

3. How can we service the official and private endpoints with as much shared code as possible?

We'll use the API for backend UI components in the future. How can we keep the code duplication to a minimum? The /_submissions endpoint is the first private endpoint that's had serious attention. It uses the new Service class to provide data to the endpoint.

It would be good to investigate the pattern used there, to see if it works for servicing the public /submissions endpoint. And whether that pattern can be extended to the other public endpoints. If not, what other patterns could we use to cut down on the code we're maintaining?

4. How can we make our API endpoints open to extension from plugins?

Where should plugins get to intervene? And can we devise a pattern that works across the board for all of our endpoints?

The SubmissionsService class uses a series of hooks to make it's getSubmissionList method extensible. We use this to add details specific to OJS or OMP to the submission objects that are returned. Can this pattern be extended to other object service classes and incorporated into the API? Are there better approaches we could use that generalise the intervention in the API a bit more?

Existing issues

The following are a list of issues that we should try to address in time for the 3.1 release.

NateWr commented 7 years ago

Discussion

1. I'm leaning towards linking over embedding. Although it means more work on the client side, I think it will be significantly easier to design and maintain the API if we keep the responses limited. How we do linking and reduce code duplication with the private endpoints will be a tricky problem to solve, though.

2. I'd love to hear more input from our dev community here. My guess is that there won't be big demand for building a UI on top of the API. I think the API's most common use-cases will be for importing data into a different app/database, so requiring multiple requests won't be as much of a concern.

Additional concerns/comments

The following is a list of suggestions/thoughts that came out of my first detailed look at the public endpoints.

General:

Submissions:

Issues:

Files:

Galleys:

asmecher commented 7 years ago

Thanks, @NateWr and @kaschioudi. On use cases and embedding vs. linking: an alternate front-end is probably the single most common use case. While that often arose because of limitations in the OJS 2.x UI, it won't go away entirely -- we still have users who have an existing website/toolset they want to use, pulling OJS data into it rather than trying to get OJS to match.

@NateWr, I usually wouldn't like this kind of approach, but what about a hybrid? Something like:

{
    "title": "Magna vivamus sagittis eget dictum nisl elit nam non iaculis",
    "authors": [
        {
            "name": "Carlo Corino",
            "affiliation": "University of Bologna",
            "orcid": "http://orcid.org/0000-0002-1825-0097"
            "href": "http://localhost/api/v1/authors/1"
        },
    ],
}

...where name, affiliation, and orcid are simple, unstructured, and presumably just enough to provide a basic summary alongside the article; and href provides an access point for richer metadata, e.g. in some cases tools to query metadata in a variety of scholar-friendly formats.

We would of course be subject to the usual demands for richer data in the simple query (thinking e.g. structured author names), which I'd suggest we fairly strictly decline.

kaschioudi commented 7 years ago

Thank you very much @NateWr for your valuable inputs.

Regarding object linking and embedding, I see them as additional features provided to ease hyperlink between resources (linking) and reduce the number of HTTP requests required (embedding). Ideally the API should support both features. Linking when supported is usually available by default whilst Embedding can be enabled for each request.

We could define a light and a full representation. When the entity is explicitly requested (e.g /authors/{authorId}) we send the full representation otherwise a light representation (e.g submission's author from /submissions/{submissionId}).

Let’s consider the following representations of author object:

Light

{
    "id": 1,
    "name": "John Doe"
}

Full

{
    "id": 1,
    "firstName": "John",
    "lastName": "Doe",
    "country": "Canada",
    "suffix": "",
    "email": "john@doe.com",
    "primarycontact": "555-456-9635",
    "affiliation": "University of Bologna",
    "orcid": "http://orcid.org/0000-0002-1825-0097"
}

Requesting /submissions/123 could be:

{
    "id": 123,
    "title": "Praesent semper ultrices magna, in ultrices ipsum commodo quis",
    "authors": [
        {
            "id": 1,
            "name": "John Doe",
            "href": "http://example.com/api/v1/authors/1"
        },
    ],
}

and /submissions/123?_embed would return

{
    "id": 123,
    "title": "Praesent semper ultrices magna, in ultrices ipsum commodo quis",
    "authors": [
        {
            "id": 1,
            "firstName": "John",
            "lastName": "Doe",
            "country": "Canada",
            "suffix": "",
            "email": "john@doe.com",
            "primarycontact": "555-456-9635",
            "affiliation": "University of Bologna",
            "orcid": "http://orcid.org/0000-0002-1825-0097"
        }
    ],
}
kaschioudi commented 7 years ago

@asmecher : I saw your comment after replying to @NateWr . I believe we both suggested the same hybrid approach. I am also suggesting to be able to embed full entity object if requested.

NateWr commented 7 years ago

an alternate front-end is probably the single most common use case... we still have users who have an existing website/toolset they want to use, pulling OJS data into it rather than trying to get OJS to match.

I think most will want to import to their system, rather than fetch from the API on-the-fly in their custom frontend. For instance, if I were building a WP integration, I'd want to pull the data and store it in the WP database. In my mind, this would still favor a linking over embedding approach.

I believe we both suggested the same hybrid approach. I am also suggesting to be able to embed full entity object if requested.

I had the same thought as both of you. I know the WP API uses an _embed parameter like you've suggested.

I'm a little nervous about code duplication in the process of converting objects into the JSON responses. Ideally, we'd have a pattern in which we can serve light and full data, as well as the backend endpoints, without necessarily writing separate code to handle each case. The toArray method in the SubmissionService class (here) was an attempt at such a pattern. Could this be re-used for the public API?

kaschioudi commented 7 years ago

API authentication and authorization inner working

There are 2 ways to authenticate requests:

Cookie based authentication is useful when an API endpoint is requested from a browser with an active/valid OJS session. For instance the new submissions list grids take advantage of that feature to fetch data from API backend endpoints. Other clients would lean on token based authentication using a valid access token. Please note that OJS users are able to obtain access token via the “API Key” tab from their profile page.

Behind the scene API Authentication’s is built on top of OJS authorization framework. After API’s handler bootstraps, there’s a middleware (ApiAuthorizationMiddleware) which intercepts all requests and performs authorization by calling the authorize function on the loaded handler. This means that each handler is in charge of making sure that requests are properly authorized. Handlers authorize requests using OJS policies.

kaschioudi commented 7 years ago

I would like to add that at the moment, there's a little bit of rules duplication when it comes to request authorization. It would be good if we could centralize these rules for reusability.

kaschioudi commented 7 years ago

I'm a little nervous about code duplication in the process of converting objects into the JSON responses. Ideally, we'd have a pattern in which we can serve light and full data, as well as the backend endpoints, without necessarily writing separate code to handle each case. The toArray method in the SubmissionService class (here) was an attempt at such a pattern. Could this be re-used for the public API?

I looked at toArray() implementation and it's a good attempt. Nonetheless my concern is that such helper function encapsulates all the logic in one place and does not provide flexibility. Since most entities in OJS are objects, we could look into decorator pattern.

We could have an abstract class like

abstract class EntityApiPropertyList {

    protected $fullList = null;

    public function __construct($fullList = false) {
        $this->fullList = $fullList;
    }

    public function propertyList() {
        if ($this->fullList)
            return $this->fullPropertyList();

        return $this->lightPropertyList();
    }

    abstract public function lightPropertyList();
    abstract public function fullPropertyList();
}

and have an implementation for each entity. for instance, Author's could be like

class AuthorEntityApiPropertyList extends EntityApiPropertyList {

    protected $author = null;

    public function __construct(\Author $author, $fullList = false) {
        parent::__construct($fullList);
        $this->author = $author;
    }

    public function lightPropertyList() {
        return [
            'id'    => $this->author->id,
            'name'  => $this->author->getFullName(),
        ];
    }

    public function fullPropertyList() {
        return [
            'id'    => $this->author->id,
            'name'  => $this->author->getFullName(),
            'orcid' => $this->author->getOrcid(),
            'email' => $this->author->getEmail(),
            ...
        ];
    }

}

This is how we would get an object representation

$props = with(new AuthorEntityApiPropertyList(
    $author, 
    $this->getQueryParam('_embed', false)
))->propertyList();

note: with() is helper function introduced by illuminate/database useful for chaining

Therefore entities linked to other entities could reuse their linked entities' formatter like

class SubmissionEntityApiPropertyList implements EntityApiPropertyList {

    protected $submission = null;

    public function __construct(\Submission $submission, $fullList = false) {
        parent::__construct($fullList);
        $this->submission = $submission;
    }

    public function lightPropertyList() {
        $props = [
            'id'    => $this->submission->getId(),
            'title' => $this->submission->getTitle(null),
        ];

        // first author
        $authors = $this->submission->getAuthors();
        $author_props = new AuthorEntityApiPropertyList($authors[0], $this->fullList);

        $props = array_merge($props, $author_props);
        return $props;
    }

    public function fullPropertyList() {
        // to be implemented
    }
}

This way modifying entities will cascade to other linked entities.

@asmecher @NateWr Let me know what you think.

NateWr commented 7 years ago

It may just be personal preference, but I'm hesitant to introduce a new tree of classes. I thought the point of introducing the Services pattern was to consolidate the object handling in a centralised location.

I'd be inclined to perform your approach above using the Services classes like this:

class SubmissionService extends Service {
    function getPropertyList($submission) {
        $props = [
            'id' => $submission->getId(),
            'authors' => \Services\AuthorService->getSubmissionAuthorPropertyList($submission),
            'reviewRounds' => \Services\ReviewRoundsService->getSubmissionRoundPropertyList($submission)
        ];
    }
}

Is that not how you see the Services classes being used?

Also, I'd rather turn getPropertyList into a more flexible method that can be used for light/full lists, as well as other configurations a third-party may desire. So that would look something like this:

class SubmissionService extends Service {
    function getPropertyList($submission, $params = array()) {
        $props = [
            'id' => $submission->getId(),
        ];

        if (!empty($params['authors'])) {
            $props['authors'] = \Services\AuthorService->getSubmissionAuthorPropertyList($submission);
        }

        if (!empty($params['reviewRounds'])) {
            $props['reviewRounds'] = \Services\ReviewRoundsService->getSubmissionRoundPropertyList($submission);
        }
    }
    // If we wanted shorthand functons, we can add those as an additional layer
    function getSummaryPropertyList($submission) {
        return $this->getPropertyList($submission, array('id' => true));
    }
    function getFullPropertyList($submission) {
        $params = array(
            'id' => true,
            'authors' => true,
            'reviewRounds' => true,
        );
        return $this->getPropertyList($submission, $params);
    }
}

This way, getPropertyList is it's own little API that isn't tied directly to the use-cases we're addressing right now, but instead just provides an interface for transposing associated information as needed.

Is there maybe some common ground here?

kaschioudi commented 7 years ago

I see the service layer as a place to define complex business logic or functions with interact with multitple entities. Nonetheless under the hood it's still a big class with a bunch of global functions. Therefore there is a risk of overuse with the consequence of ending up with very large unmanageable classes.

The other aspect I dislike with service methods is that they make it hard to extend functionnalities. For instance, once we define let's say SubmissionService::getPropertyList() and we start using it, if for whatever reason we need to modify implementation in a very specific scenario, we will have to modify this method (with the risk of affecting existing code) and add new rules whilst things will get more complicated for plugins.

But that being said, I am comfortable with either implementations.

NateWr commented 7 years ago

During the pkp2017 sprint, we discussed a way forward for generating properties for the API. Kassim is going to work on implementing that. I'm going to try to draw up a solid spec for the endpoints we want to ship in OJS 3.1 so that Kassim has a good idea of the end-goal and can start building out the endpoints.

Dulip did a lot of work drawing up OMP specs, which will come in handy when we get to that.

NateWr commented 7 years ago

I've drawn up a proposal for the API specification to ship OJS 3.1. I decided to work towards the minimum endpoints needed to port published articles to a third-party platform. I then added on some things that we'll need to eventually synchronise the backend and public endpoints.

A copy of the swagger file for this specification can be found in this pull request: https://github.com/pkp/ojs/pull/1484

The description of each endpoint outlines what features to include for 3.1, and additional notes indicating what kind of data is expected for specific keys.

I've also got the following questions:

asmecher commented 7 years ago

@NateWr, I've just merged more of the subscription stuff, moving from the back end towards the front end. Our access model looks kind of like this (carried over from OJS 2.x):

We support delayed open access, i.e. an issue can become open access x number of months after publication.

Subscriptions are individual or institutional. Individual subscriptions typically require login; institutional subscriptions are typically by domain or IP address.

I would suggest exposing the access status (attached to either issue or article) via the API for requests of both article and issue metadata.

Is any of that helpful?

NateWr commented 7 years ago

@asmecher Thanks yeah, so my understanding is:

Does the access status contain the information about whether a login is required? ie - is returning the access status sufficient to "understand" access?

Finally, should any of this information be "protected" from an unauthorized user if a submission is not open access?

asmecher commented 7 years ago

None of that should be protected, I would say. It's nothing more/less than we're already exposing via the web interface, with the exception of when that's turned off -- maybe we'll need to consider a second switch for the API, as some users will turn off the publishing front end in order to use the API in its place.

Your summary of the access information is accurate. You can almost get away with just fetching those using getters on the article/issue objects and just passing it along -- except that you'll also have to respect the journal's override (i.e. the "everything is open access" setting) as it'll override any subscription settings on individual objects.

asmecher commented 7 years ago

@kaschioudi, is this one ready to be closed, or are there still parts needing work before OJS 3.1 is released?

kaschioudi commented 7 years ago

@asmecher this is not ready yet. Before switching back to OTS plugin I was in the process of integrating the architecture patterns that we came out with during the sprint. https://github.com/kaschioudi/ojs/tree/api-property-list https://github.com/kaschioudi/pkp-lib/tree/api-property-list

asmecher commented 7 years ago

I'm getting a minor warning that could use some attention:

PHP Warning:  Declaration of APIRouter::url($request, $endpoint, $params) should be compatible with PKPRouter::url($request, $newContext = NULL, $handler = NULL, $op = NULL, $path = NULL, $params = NULL, $anchor = NULL, $escape = false) in /home/travis/build/pkp/omp/lib/pkp/classes/core/APIRouter.inc.php on line 24
[27-Sep-2017 21:01:09 UTC] PHP Stack trace:
[27-Sep-2017 21:01:09 UTC] PHP   1. {main}() /home/travis/build/pkp/omp/index.php:0
[27-Sep-2017 21:01:09 UTC] PHP   2. Application->execute() /home/travis/build/pkp/omp/index.php:64
[27-Sep-2017 21:01:09 UTC] PHP   3. Dispatcher->dispatch() /home/travis/build/pkp/omp/lib/pkp/classes/core/PKPApplication.inc.php:239
[27-Sep-2017 21:01:09 UTC] PHP   4. Dispatcher->_instantiateRouter() /home/travis/build/pkp/omp/lib/pkp/classes/core/Dispatcher.inc.php:96
[27-Sep-2017 21:01:09 UTC] PHP   5. instantiate() /home/travis/build/pkp/omp/lib/pkp/classes/core/Dispatcher.inc.php:179
[27-Sep-2017 21:01:09 UTC] PHP   6. import() /home/travis/build/pkp/omp/lib/pkp/includes/functions.inc.php:203
[27-Sep-2017 21:01:09 UTC] PHP   7. require_once() /home/travis/build/pkp/omp/lib/pkp/includes/functions.inc.php:25
NateWr commented 7 years ago

@kaschioudi It looks like APIRouter::url() is not actually used. Can we remove it?

https://github.com/pkp/pkp-lib/blob/master/classes/core/APIRouter.inc.php#L121-L130

kaschioudi commented 7 years ago

@NateWr : Sure.

NateWr commented 7 years ago

PR: https://github.com/pkp/pkp-lib/pull/2835

Tests: https://github.com/pkp/ojs/pull/1558

NateWr commented 7 years ago

I think we can now close this issue and any future API development can go into new issues. Congrats @kaschioudi this is a big one!