Closed asmecher closed 7 years ago
Some early experimentation over at https://github.com/asmecher/pkp-lib/commit/e54010a7b7718004685b9fb6ceeb986d5ba1d83f:
index.php
wrapper is still used for all requests. Maintains use of our session management, authorization framework, etc. toolsets.[index.php]/api
for site-wide API requests (e.g. "list all journals on this site")[index.php]/{contextPath}/api
for journal- or press-specific requests (most functionality will be here)api
will be superseded by the API router.+1, watching.
@asmecher about authen/z, would using Slim's middleware be an option? The example I pointed to uses it at https://github.com/mjordan/ocr_rest/blob/master/index.php#L23 to do basic IP whitelisting.
Not to complicate matters, but another PHP microframework that might be worth evaluating is Silex, which is a Symphony component.
@mjordan, we have our own rulesets for authorization/authentication, and porting these over to another framework will be a colossal pain. I want to decrease the amount of homebrew infrastructure we maintain but would advocate for baby steps -- introducing something like slim that doesn't dupe a lot of it, but uses modern/standard approaches, is a step in the right direction.
I'll assess this as I go -- we might discover a fundamental impedance mismatch when we try to use the auth framework from within the microframework -- but for now it seems like an easy win.
/api/v1
)? ;)[index.php]/api
and [index.php]/api/context/{contextPath}
?What about [index.php]/api and [index.php]/api/context/{contextPath}?
I'm not sure that will play well with URL rewriting.
@asmecher : I have done some refactoring to move authorization to a Middleware. Then add a generic getParameter to APIHandler to replace getEntityId. cf. https://github.com/kaschioudi/pkp-lib/commit/cc49926aace45c5c37650c6556126ce8f00f5bc3
rest api post/put/delete endpoints work PR: https://github.com/pkp/pkp-lib/pull/2622
@kaschioudi, is there one also for the OJS repo?
@asmecher : not at the moment. Once I do some refactoring based on your comments and move IssueRepository out of pkp-lib into OJS, I will create a pull request.
OK, sure. It's just a lot easier to post comments on a branch when there's a PR to attach them to, even if the PR isn't ready for merging.
We've now (just about) got an API in place for OJS 3.1. @kaschioudi has done amazing work wrestling the API infrastructure into place. I want to give a brief overview of where we're at, then raise issues for further feedback and refinement before 3.1 ships.
Several endpoints are available at /index.php/api/v1/
. Each endpoint is serviced by an APIHandler
class which you'll find under /api/v1/
in the OJS directory.
We're documenting the API using a swagger file. The most up-to-date copy is this gist, but we'll be keeping it updated in master
under /docs/dev/swagger.yaml
. To read the documentation, dump the contents of the gist file into the swagger editor.
Any endpoint prefixed with _
is a private endpoint, which we use to service OJS 3 app components. It's recommended that you do not rely on these for custom code, as they may change without warning. (These endpoints are not fully documented yet.)
A comment below introduces authorization with the API.
I'd like to get some discussion going on the following questions around the API's design, so that we can get things as refined as possible by the OJS 3.1 release. This will reduce the amount of breakage we have to do in the early days of the API.
1. Should objects associated with the primary request be embedded or linked?
Should a request for a submission (/submissions/{submissionId}
) render out the associated authors, galleys, files, sections and participants in the response? Or should it provide a link to an API endpoint where it can be retrieved.
Example of embedding authors:
{
"title": "Magna vivamus sagittis eget dictum nisl elit nam non iaculis",
"authors": [
{
"name": "Carlo Corino",
"affiliation": "University of Bologna",
"orcid": "http://orcid.org/0000-0002-1825-0097"
},
],
}
Example of linking authors:
{
"title": "Magna vivamus sagittis eget dictum nisl elit nam non iaculis",
"authors": [
'http://localhost/api/v1/authors/1'
],
}
Embedding authors provides a richer set of information in a single API request, but will require more database hits per request. We're also more likely to get bogged down in questions about how deep the embedding should go and what should be contained in a summary presentation of an object.
2. What use-cases should we aim to support for the OJS 3.1 release?
Are there any client applications for the API that people want to start working on right now? Can we devise a set of minimum viable endpoints that we want to be polished and finalised by 3.1's release?
The answer to this will also bear on the question of linking vs embedding (1). If we presume someone wants to use the API to render a user-facing website, embedding might be more important. But if we think client apps are more likely to want to import the data into their own systems, linking will probably be easier to build, document, and maintain with long-term compatibility.
3. How can we service the official and private endpoints with as much shared code as possible?
We'll use the API for backend UI components in the future. How can we keep the code duplication to a minimum? The /_submissions
endpoint is the first private endpoint that's had serious attention. It uses the new Service class to provide data to the endpoint.
It would be good to investigate the pattern used there, to see if it works for servicing the public /submissions
endpoint. And whether that pattern can be extended to the other public endpoints. If not, what other patterns could we use to cut down on the code we're maintaining?
4. How can we make our API endpoints open to extension from plugins?
Where should plugins get to intervene? And can we devise a pattern that works across the board for all of our endpoints?
The SubmissionsService
class uses a series of hooks to make it's getSubmissionList
method extensible. We use this to add details specific to OJS or OMP to the submission objects that are returned. Can this pattern be extended to other object service classes and incorporated into the API? Are there better approaches we could use that generalise the intervention in the API a bit more?
The following are a list of issues that we should try to address in time for the 3.1 release.
getData()
calls often pull from a database where they're stored as strings. We should fix these in the getter methods of objects everywhere possible.1. I'm leaning towards linking over embedding. Although it means more work on the client side, I think it will be significantly easier to design and maintain the API if we keep the responses limited. How we do linking and reduce code duplication with the private endpoints will be a tricky problem to solve, though.
2. I'd love to hear more input from our dev community here. My guess is that there won't be big demand for building a UI on top of the API. I think the API's most common use-cases will be for importing data into a different app/database, so requiring multiple requests won't be as much of a concern.
The following is a list of suggestions/thoughts that came out of my first detailed look at the public endpoints.
General:
id
instead of issueId
for issues and submissionId
for submissions./submissions/<id>/files/
: The file component information should be added. This will also be necessary to know what is included in the metadata.Submissions:
/submissions/<id>
returns a metadata format in JSON. Submission metadata schemas should be available at a separate endpoint. The response format of /submissions/<id>
should be consistent between published and unpublished submissions, with keys missing or set to null
if needed./submissions/<id>
that isn't available./submissions/<id>
should match what's presented in /submissions/<id>/galleys/
.Issues:
current
key should be isCurrent
? What about making all booleans use this grammatical form?volume
, number
, and year
filtering is? Would it be better to filter by published status, maybe a published date range?_data
key. That could be broken down into an object specifically for the API.Files:
/download
? We might want a /files
endpoint to return data about a file.Galleys:
$submissionFile
may be empty after the $galley->getFile()
request. Need some error handling here./files
needs a base path added, as it will not work if OJS is located in a subdirecetory (eg - domain.com/ojs/index.php
).Thanks, @NateWr and @kaschioudi. On use cases and embedding vs. linking: an alternate front-end is probably the single most common use case. While that often arose because of limitations in the OJS 2.x UI, it won't go away entirely -- we still have users who have an existing website/toolset they want to use, pulling OJS data into it rather than trying to get OJS to match.
@NateWr, I usually wouldn't like this kind of approach, but what about a hybrid? Something like:
{
"title": "Magna vivamus sagittis eget dictum nisl elit nam non iaculis",
"authors": [
{
"name": "Carlo Corino",
"affiliation": "University of Bologna",
"orcid": "http://orcid.org/0000-0002-1825-0097"
"href": "http://localhost/api/v1/authors/1"
},
],
}
...where name
, affiliation
, and orcid
are simple, unstructured, and presumably just enough to provide a basic summary alongside the article; and href
provides an access point for richer metadata, e.g. in some cases tools to query metadata in a variety of scholar-friendly formats.
We would of course be subject to the usual demands for richer data in the simple query (thinking e.g. structured author names), which I'd suggest we fairly strictly decline.
Thank you very much @NateWr for your valuable inputs.
Regarding object linking and embedding, I see them as additional features provided to ease hyperlink between resources (linking) and reduce the number of HTTP requests required (embedding). Ideally the API should support both features. Linking when supported is usually available by default whilst Embedding can be enabled for each request.
We could define a light and a full representation. When the entity is explicitly requested (e.g /authors/{authorId}
) we send the full representation otherwise a light representation (e.g submission's author from /submissions/{submissionId}
).
Let’s consider the following representations of author
object:
Light
{
"id": 1,
"name": "John Doe"
}
Full
{
"id": 1,
"firstName": "John",
"lastName": "Doe",
"country": "Canada",
"suffix": "",
"email": "john@doe.com",
"primarycontact": "555-456-9635",
"affiliation": "University of Bologna",
"orcid": "http://orcid.org/0000-0002-1825-0097"
}
Requesting /submissions/123
could be:
{
"id": 123,
"title": "Praesent semper ultrices magna, in ultrices ipsum commodo quis",
"authors": [
{
"id": 1,
"name": "John Doe",
"href": "http://example.com/api/v1/authors/1"
},
],
}
and /submissions/123?_embed
would return
{
"id": 123,
"title": "Praesent semper ultrices magna, in ultrices ipsum commodo quis",
"authors": [
{
"id": 1,
"firstName": "John",
"lastName": "Doe",
"country": "Canada",
"suffix": "",
"email": "john@doe.com",
"primarycontact": "555-456-9635",
"affiliation": "University of Bologna",
"orcid": "http://orcid.org/0000-0002-1825-0097"
}
],
}
@asmecher : I saw your comment after replying to @NateWr . I believe we both suggested the same hybrid approach. I am also suggesting to be able to embed full entity object if requested.
an alternate front-end is probably the single most common use case... we still have users who have an existing website/toolset they want to use, pulling OJS data into it rather than trying to get OJS to match.
I think most will want to import to their system, rather than fetch from the API on-the-fly in their custom frontend. For instance, if I were building a WP integration, I'd want to pull the data and store it in the WP database. In my mind, this would still favor a linking over embedding approach.
I believe we both suggested the same hybrid approach. I am also suggesting to be able to embed full entity object if requested.
I had the same thought as both of you. I know the WP API uses an _embed
parameter like you've suggested.
I'm a little nervous about code duplication in the process of converting objects into the JSON responses. Ideally, we'd have a pattern in which we can serve light and full data, as well as the backend endpoints, without necessarily writing separate code to handle each case. The toArray
method in the SubmissionService
class (here) was an attempt at such a pattern. Could this be re-used for the public API?
There are 2 ways to authenticate requests:
Cookie based authentication is useful when an API endpoint is requested from a browser with an active/valid OJS session. For instance the new submissions list grids take advantage of that feature to fetch data from API backend endpoints. Other clients would lean on token based authentication using a valid access token. Please note that OJS users are able to obtain access token via the “API Key” tab from their profile page.
Behind the scene API Authentication’s is built on top of OJS authorization framework. After API’s handler bootstraps, there’s a middleware (ApiAuthorizationMiddleware
) which intercepts all requests and performs authorization by calling the authorize function on the loaded handler. This means that each handler is in charge of making sure that requests are properly authorized. Handlers authorize requests using OJS policies.
I would like to add that at the moment, there's a little bit of rules duplication when it comes to request authorization. It would be good if we could centralize these rules for reusability.
I'm a little nervous about code duplication in the process of converting objects into the JSON responses. Ideally, we'd have a pattern in which we can serve light and full data, as well as the backend endpoints, without necessarily writing separate code to handle each case. The toArray method in the SubmissionService class (here) was an attempt at such a pattern. Could this be re-used for the public API?
I looked at toArray()
implementation and it's a good attempt. Nonetheless my concern is that such helper function encapsulates all the logic in one place and does not provide flexibility. Since most entities in OJS are objects, we could look into decorator pattern
.
We could have an abstract class like
abstract class EntityApiPropertyList {
protected $fullList = null;
public function __construct($fullList = false) {
$this->fullList = $fullList;
}
public function propertyList() {
if ($this->fullList)
return $this->fullPropertyList();
return $this->lightPropertyList();
}
abstract public function lightPropertyList();
abstract public function fullPropertyList();
}
and have an implementation for each entity. for instance, Author
's could be like
class AuthorEntityApiPropertyList extends EntityApiPropertyList {
protected $author = null;
public function __construct(\Author $author, $fullList = false) {
parent::__construct($fullList);
$this->author = $author;
}
public function lightPropertyList() {
return [
'id' => $this->author->id,
'name' => $this->author->getFullName(),
];
}
public function fullPropertyList() {
return [
'id' => $this->author->id,
'name' => $this->author->getFullName(),
'orcid' => $this->author->getOrcid(),
'email' => $this->author->getEmail(),
...
];
}
}
This is how we would get an object representation
$props = with(new AuthorEntityApiPropertyList(
$author,
$this->getQueryParam('_embed', false)
))->propertyList();
note: with()
is helper function introduced by illuminate/database
useful for chaining
Therefore entities linked to other entities could reuse their linked entities' formatter like
class SubmissionEntityApiPropertyList implements EntityApiPropertyList {
protected $submission = null;
public function __construct(\Submission $submission, $fullList = false) {
parent::__construct($fullList);
$this->submission = $submission;
}
public function lightPropertyList() {
$props = [
'id' => $this->submission->getId(),
'title' => $this->submission->getTitle(null),
];
// first author
$authors = $this->submission->getAuthors();
$author_props = new AuthorEntityApiPropertyList($authors[0], $this->fullList);
$props = array_merge($props, $author_props);
return $props;
}
public function fullPropertyList() {
// to be implemented
}
}
This way modifying entities will cascade to other linked entities.
@asmecher @NateWr Let me know what you think.
It may just be personal preference, but I'm hesitant to introduce a new tree of classes. I thought the point of introducing the Services pattern was to consolidate the object handling in a centralised location.
I'd be inclined to perform your approach above using the Services classes like this:
class SubmissionService extends Service {
function getPropertyList($submission) {
$props = [
'id' => $submission->getId(),
'authors' => \Services\AuthorService->getSubmissionAuthorPropertyList($submission),
'reviewRounds' => \Services\ReviewRoundsService->getSubmissionRoundPropertyList($submission)
];
}
}
Is that not how you see the Services classes being used?
Also, I'd rather turn getPropertyList
into a more flexible method that can be used for light/full lists, as well as other configurations a third-party may desire. So that would look something like this:
class SubmissionService extends Service {
function getPropertyList($submission, $params = array()) {
$props = [
'id' => $submission->getId(),
];
if (!empty($params['authors'])) {
$props['authors'] = \Services\AuthorService->getSubmissionAuthorPropertyList($submission);
}
if (!empty($params['reviewRounds'])) {
$props['reviewRounds'] = \Services\ReviewRoundsService->getSubmissionRoundPropertyList($submission);
}
}
// If we wanted shorthand functons, we can add those as an additional layer
function getSummaryPropertyList($submission) {
return $this->getPropertyList($submission, array('id' => true));
}
function getFullPropertyList($submission) {
$params = array(
'id' => true,
'authors' => true,
'reviewRounds' => true,
);
return $this->getPropertyList($submission, $params);
}
}
This way, getPropertyList
is it's own little API that isn't tied directly to the use-cases we're addressing right now, but instead just provides an interface for transposing associated information as needed.
Is there maybe some common ground here?
I see the service layer as a place to define complex business logic or functions with interact with multitple entities. Nonetheless under the hood it's still a big class with a bunch of global functions. Therefore there is a risk of overuse with the consequence of ending up with very large unmanageable classes.
The other aspect I dislike with service methods is that they make it hard to extend functionnalities. For instance, once we define let's say SubmissionService::getPropertyList()
and we start using it, if for whatever reason we need to modify implementation in a very specific scenario, we will have to modify this method (with the risk of affecting existing code) and add new rules whilst things will get more complicated for plugins.
But that being said, I am comfortable with either implementations.
During the pkp2017 sprint, we discussed a way forward for generating properties for the API. Kassim is going to work on implementing that. I'm going to try to draw up a solid spec for the endpoints we want to ship in OJS 3.1 so that Kassim has a good idea of the end-goal and can start building out the endpoints.
Dulip did a lot of work drawing up OMP specs, which will come in handy when we get to that.
I've drawn up a proposal for the API specification to ship OJS 3.1. I decided to work towards the minimum endpoints needed to port published articles to a third-party platform. I then added on some things that we'll need to eventually synchronise the backend and public endpoints.
A copy of the swagger file for this specification can be found in this pull request: https://github.com/pkp/ojs/pull/1484
The description of each endpoint outlines what features to include for 3.1, and additional notes indicating what kind of data is expected for specific keys.
I've also got the following questions:
doi
key) or should we generate all pubId details automatically under a pubId
key?/submission/{id}/metadata/{format}
endpoint to extract metadata in a given format?/download
endpoint?tableOfContents
in issues? The approach I've used seems closely tied to the existing UI, where articles are sorted into sections. But I can see a future where people want alternatives to how articles are sorted. Should we leave this out for 3.1 until we think about this problem?@NateWr, I've just merged more of the subscription stuff, moving from the back end towards the front end. Our access model looks kind of like this (carried over from OJS 2.x):
We support delayed open access, i.e. an issue can become open access x number of months after publication.
Subscriptions are individual or institutional. Individual subscriptions typically require login; institutional subscriptions are typically by domain or IP address.
I would suggest exposing the access status (attached to either issue or article) via the API for requests of both article and issue metadata.
Is any of that helpful?
@asmecher Thanks yeah, so my understanding is:
Does the access status contain the information about whether a login is required? ie - is returning the access status sufficient to "understand" access?
Finally, should any of this information be "protected" from an unauthorized user if a submission is not open access?
None of that should be protected, I would say. It's nothing more/less than we're already exposing via the web interface, with the exception of when that's turned off -- maybe we'll need to consider a second switch for the API, as some users will turn off the publishing front end in order to use the API in its place.
Your summary of the access information is accurate. You can almost get away with just fetching those using getters on the article/issue objects and just passing it along -- except that you'll also have to respect the journal's override (i.e. the "everything is open access" setting) as it'll override any subscription settings on individual objects.
@kaschioudi, is this one ready to be closed, or are there still parts needing work before OJS 3.1 is released?
@asmecher this is not ready yet. Before switching back to OTS plugin I was in the process of integrating the architecture patterns that we came out with during the sprint. https://github.com/kaschioudi/ojs/tree/api-property-list https://github.com/kaschioudi/pkp-lib/tree/api-property-list
I'm getting a minor warning that could use some attention:
PHP Warning: Declaration of APIRouter::url($request, $endpoint, $params) should be compatible with PKPRouter::url($request, $newContext = NULL, $handler = NULL, $op = NULL, $path = NULL, $params = NULL, $anchor = NULL, $escape = false) in /home/travis/build/pkp/omp/lib/pkp/classes/core/APIRouter.inc.php on line 24
[27-Sep-2017 21:01:09 UTC] PHP Stack trace:
[27-Sep-2017 21:01:09 UTC] PHP 1. {main}() /home/travis/build/pkp/omp/index.php:0
[27-Sep-2017 21:01:09 UTC] PHP 2. Application->execute() /home/travis/build/pkp/omp/index.php:64
[27-Sep-2017 21:01:09 UTC] PHP 3. Dispatcher->dispatch() /home/travis/build/pkp/omp/lib/pkp/classes/core/PKPApplication.inc.php:239
[27-Sep-2017 21:01:09 UTC] PHP 4. Dispatcher->_instantiateRouter() /home/travis/build/pkp/omp/lib/pkp/classes/core/Dispatcher.inc.php:96
[27-Sep-2017 21:01:09 UTC] PHP 5. instantiate() /home/travis/build/pkp/omp/lib/pkp/classes/core/Dispatcher.inc.php:179
[27-Sep-2017 21:01:09 UTC] PHP 6. import() /home/travis/build/pkp/omp/lib/pkp/includes/functions.inc.php:203
[27-Sep-2017 21:01:09 UTC] PHP 7. require_once() /home/travis/build/pkp/omp/lib/pkp/includes/functions.inc.php:25
@kaschioudi It looks like APIRouter::url()
is not actually used. Can we remove it?
https://github.com/pkp/pkp-lib/blob/master/classes/core/APIRouter.inc.php#L121-L130
@NateWr : Sure.
I think we can now close this issue and any future API development can go into new issues. Congrats @kaschioudi this is a big one!
Introduce a REST API for OJS 3.0/OMP 1.2+ (at the earliest).