pkp / pkp-lib

The library used by PKP's applications OJS, OMP and OPS, open source software for scholarly publishing.
https://pkp.sfu.ca
GNU General Public License v3.0
297 stars 443 forks source link

Detect on the frontend whether current user can "log in as" reviewer/participant #10290

Open jardakotesovec opened 1 month ago

jardakotesovec commented 1 month ago

Describe the issue

While working on Vue.js versions of reviewers and participants grid. One thing that I can't really determine on the frontend is permission to "login as" and "gossip" (Editorial notes).

Screenshot 2024-08-08 at 15 14 03

Login as

Reason is that this logic is way too complicated and requires additional data, which we even would not want to expose to frontend.

It starts with:

                !Validation::loggedInAs() &&
                $user->getId() != $userId &&
                Validation::getAdministrationLevel($userId, $user->getId()) === Validation::ADMINISTRATION_FULL

So first two are easy, just checking whether its not the current user and whether we are not "logged in as" already.

But the Validation::getAdministrationLevel is complex function. Can be found here.

Gossip (editorial notes)

This is lot more feasible to replicate on frontend - but would be also probably better to not duplicate that logic as its not trivial.

Its implemented here. It could follow same patterns as login as.

Solution options

Therefore we still have to calculate this on the backend.

Option 1 From frontend perspective, if this information (canLoginAs) would included along with other reviewer/participant data, that would make it super easy on frontend. But given that the getAdministrationLevel makes also additional database calls - there would be performance considerations. Arguably it would not be more intense than existing implementation, which is also calling this for every reviewer/participant thats being displayed. But probably not easy to avoid cascading db calls there.

Just for context, currently getting list of reviewers from submission.reviewAssignments that I have from submissions/{submissionId} endpoint, which also includes reviewer information such id or name.

And participants I am getting separately from the submissions/{submissionId}/participants.

Option 2 Decide on canLoginAs lazily - so we would show "Login as" option for all reviewers/participants. But once the loginAs would be initiated it would be checked and resulting in error if it does not pass the validation.

Option 3 Having separate endpoint to check it.. But that would just increase complexity on the frontend, instead of adding this information via mapping. So that would be my least favourite, but also possible if that would make it backend solution for some reason significantly easier.

ewhanson commented 1 month ago

Option 1 sounds like the best fit to me, even given the performance considerations. Since this is for the reviewer/participant grid, I imagine there would never be too many lists to the point where the database calls would case too large of an impact, but I wonder if the same approach were applied to a larger list (e.g. the user management page), would it cause significant slowdown?

I would worry that option 2 would provide a bad user experience if you can never tell ahead of time who you can "log in as."

jardakotesovec commented 1 month ago

@ewhanson Oki, thank you for looking into it. With option1 the tricky part is to make sure its calculated only when needed, because our endpoints are somewhat general purpose.

So for example I don't care about loginAs when fetching list of all submissions, even though I have very much the same reviewAssignments objects there as when I just fetch one particular submission.

For performance heavy fields I think using the 'include' pattern might be good fit?

Here is discussion trying to explore some options we have with rest api, where is bit more details.. - https://github.com/pkp/pkp-lib/discussions/9945

ewhanson commented 1 month ago

Thanks, I'll have a look at that discussion. And that does make sense to me.

I've also been doing a bit of looking into patterns like this and how best to fit them within REST API architecture, but it gets pretty messy quite quickly. This seems like the type of thing where GraphQL has some real advantages, but what I don't know yet is if there are good ways to apply that pattern to the kind of API we have right now.

Edit: I also think this is a growing issue with our API endpoints in general, so I think it's worth the time spent researching how to include/exclude more optional information as a lot of the internal API consumers have very specialized needs.

jardakotesovec commented 1 month ago

@ewhanson Yeah, its not easy. If graphql would be first-class citizen in laravel, than maybe we could tap into it. But since its not the case, it would probably result in bringing significant amount of new dependencies and looking for good patterns how to blend it with our system. And graphql of course has its own challenges as well.

And our rest api works pretty well for things like managing files, participants, reviewers and other elements etc.. Where we need to do something bit more crazy to serve our frontend is for example the submission listing, which is currently served by private endpoint _submissions. Because we need to performantly include lots of information that user can directly see in submission listing - and fetching these things individually would not be great UX.

So maybe if we just overcome couple of these special cases we will be fine.. :-)

jardakotesovec commented 1 month ago

@ewhanson Also to note that there are situations in OJS, where its actually better fit to make multiple REST requests and having some components more independent from each other. These are scenarios where graphql, which ultimately is intended to collect all data requirements for the whole screen and make one request, would not be used to its potential.

For example on the workflow page we show different tables, showing things like participants, reviewers, files, discussions. And we mix and match these based on the stage and the role. And having good api that let me fetch these things individually (each components fetches what needs) works great, so these components can be easily mix&match as needed and they can be almost completely independent from each other, which is helping to keep things simple.

jardakotesovec commented 1 month ago

btw here is example of using the "include" query params for performance sensitive data on very large API - https://docs.gitlab.com/ee/api/merge_requests.html#get-single-mr

jardakotesovec commented 1 month ago

And last note - maybe we could introduce /submissions/{submissionId}/reviewAssignments endpoint, which could have this query flag to include loginAs. And same thing for existing endpoint /submissions/{submissionId}/participants

In terms of the naming not 100% sure, it can be either reviewAssignments and stageAssignemts or reviewers and participants :-)

ewhanson commented 1 month ago

@jardakotesovec, completely agree on your points above, the calling of multiple REST endpoints in our use case, etc. I don't actually think GraphQL fits well with our use case, there are just some nice features of it that got me thinking.

On the note of the include query params, thanks for sharing the Gitlab example. That's exactly the type of thing I was looking for. This has got me thinking. What are your thoughts on these two different ideas:

  1. The same as the existing Gitlab implementation, i.e. different attributes for each include item, e.g. an attribute for include_diverged_commits_count and a separate attribute for include_rebase_in_progress.
  2. A singular include attribute that accepts an array of possible values, e.g. include=['diverged_commits_count', 'rebase_in_progress'].

I think option one would be easier to follow and use as a user if there are relatively few optional includes, but if that number grows, I think option 2 would be less overwhelming for users. This may also open up a conversation about whether or not any of the things that are returned by the API by default could be made optional as well. I'm not thinking of anything specific at the moment, but if there are any performance intensive items to retrieve that aren't needed in all cases, they could be candidates for making part of an optional include array as well.

Finally, for the new API endpoint, I think something like what you suggest sounds good, e.g. /submissions/{submissionId}/reviewers. I think reviewers makes sense in the context of the API and fits with the existing participants. It also captures what the API endpoint is about more succinctly. I think adding the include loginAs for both of those endpoints seems like a solid approach.

ewhanson commented 1 month ago

Another thing I just thought of re: the two include options above. Option 1 would make it easier to pass additional information like an id, e.g. include_stage_assignment={stageAssignmentId} as an attribute.

jardakotesovec commented 1 month ago

@ewhanson Agree on using Option 1. These include query params should be pretty rare. So we should not end up in situation when there is too many of them.

In most cases I think we should just include "reasonable" amount of the data to make endpoint useful, without calling lots of other endpoints to understand the data. Which obviously sometime is difficult balance to do. But I don't think there is simple silver bullet rule that we could follow. So we will need to iteratively aim for it based on the use cases.

@Vitaliy-1 started already using good pattern, which is good balance between performance and flexibility when fetching related data for multiple entities, without making large sql queries - but basically by doing 2 simple sql query and than connect the entities in php (ideally via eloquant). It has been discussed here in past - https://github.com/pkp/pkp-lib/discussions/9394 . But @Vitaliy-1 would be best to point you to some real examples he already implemented in case you have not come across them already.

jardakotesovec commented 1 month ago

@ewhanson I discovered very similar use case, where on Reviewers Grid we have "Editorial Notes" action.

This is lot more feasible to replicate on frontend - but would be also probably better to not duplicate that logic as its not trivial.

Its implemented here. Would probably make sense to handle this one along with the loginAs. So is ok to just extend description of this issue or do you prefer to have it in separate issue?

ewhanson commented 1 month ago

@jardakotesovec, I think it's probably okay to extend the description of this issue.

jardakotesovec commented 3 weeks ago

@ewhanson Description adjusted. Any suggestion who could pick it up? Its quite high priority for submission listing.

jardakotesovec commented 2 weeks ago

@ewhanson Having second thoughts about this one.

As we were reviewing some of the user&role invitations stuff it came to my attention that we have the same Login As option for user table as well. That would be example where we have lots of items to list.

And even if we put this behind the flag - performance hit does not go away. And we want to step by step reduce number of "per item sql queries" for listings. As these can hinder the performance quite quickly. There is multiple sql queries per user and my understanding is that these all happens in serial. So for listing 20 users - we can easily hit 60 queries all happening in serial, just for the "login as" thing.

There are ways to reduce these queries, but in this particular case I suspect it can be tricky to optimise it and make sure its accurate. But feel free to convince me otherwise on this.

So my suggestion would be middleground, where on client side I check for easy thing, which I believe will cover 95% of the cases. 1) !Validation::loggedInAs() && 2) $user->getId() != $userId 3) current user has role ROLE_ID_SITE_ADMIN or ROLE_ID_MANAGER

As the other checks are going more in depth on some edge cases and I suspect these will be quite rare - so can be checked lazily (after that action is clicked).

Tagging @asmecher @Vitaliy-1 for some sanity checking on this suggestion.

asmecher commented 2 weeks ago

I'll run a database test against SciELO to see whether fundamentally a performant check is out of bounds (barring something like a flag column).

jardakotesovec commented 2 weeks ago

@asmecher Alright, but try it with ROLE_ID_MANAGER role.. not the ROLE_ID_SITE_ADMIN ;-)

And also what is likely to make difference is network latency from sql server, which will be very small on localhost.

asmecher commented 1 week ago

Here's how we could get the "does X manage all journals these users are active in" in a single query:

-- Variables
SET @MANAGER_USER_ID = 3;
SET @JOURNAL_PATH = 'publicknowledge';

-- Constants
SET @ROLE_ID_MANAGER = 16;

-- Make sure we don't cache results when testing performance
RESET QUERY CACHE;

SELECT user_id,
-- *** This part checks the manager roles! ***
CASE WHEN EXISTS (
    -- Find all roles for the user
    SELECT ug.context_id
    FROM user_user_groups uug
    JOIN user_groups ug ON (uug.user_group_id = ug.user_group_id)
    WHERE u.user_id = uug.user_id
        AND ug.context_id NOT IN (
            -- List of all contexts that the manager manages
            SELECT ug.context_id
            FROM user_groups ug
            JOIN user_user_groups uug ON (ug.user_group_id = uug.user_group_id)
            WHERE ug.role_id = @ROLE_ID_MANAGER AND uug.user_id = @MANAGER_USER_ID
       )
)
THEN 0 ELSE 1 END AS manages_all_relevant_journals
FROM users u
-- *** This part would be replaced by our other query conditions ***
WHERE u.user_id IN (
    SELECT uug.user_id
    FROM user_user_groups uug
    JOIN user_groups ug ON (uug.user_group_id = ug.user_group_id)
    JOIN journals j ON (ug.context_id = j.journal_id)
    WHERE j.path = @JOURNAL_PATH
);

Using the SciELO database, it gives me 2300 results in <200ms. If I LIMIT 50 like we'll be doing for a page or so of users, it drops to 0.024 seconds -- pretty much negligible even on a big database. So I think this will perform well enough for our needs.