Open gigamorph opened 2 months ago
@gigamorph,
Is there any interest in leveraging the authentication work you already did within LUX's middle tier and then continuing to use a service account into MarkLogic?
Are you planning to store this data in a separate database from LUX's content (JSON-LD)? There are advantages to storing them in the same database, but we'd have to be careful not to lose this new data when we reload the JSON-LD.
cc: @clarkepeterf
@brent-hartwig,
The direction we are taking, following the team meeting attended by @azaroth42 and @clarkepeterf among others, is to use MarkLogic as both the data store and the API provider for My Collections, authenticating with AWS Cognito, which is essentially OIDC/OAuth2.
Apart from the question of whether MarkLogic can support this flow (if it can't, we can fall back on the middle tier as you suggested), @clarkepeterf and I have identified another problem. We need this PRD "My Collections" database/API up and running constantly (with minimal downtime and appropriate notification to users) and reflecting user-initiated updates in real time. That doesn't jibe at all with our current blue/green deployment scheme, where we load a complete set of non-PRD data that is not affected by user actions at all and then send it into the PRD environment.
Before discussing this "currency" or "synchronization" issue in the team meeting, we did want to tap you for any insights you may have regarding it, too.
@gigamorph, there are several MarkLogic features that can keep a couple of databases in sync, including database replication, flexible replication, scheduled tasks, and backup/restore.
We'll want to keep a few things in mind while comparing them and any others that come up:
Here's a comparison of ML's two replication types: https://docs.marklogic.com/guide/database-replication/dbrep_intro#id_92346. Spoiler alert: If you want the new docs in the existing LUX content database, database replication is out.
With regard to automatically synchronizing after the target database comes back online, I didn't quickly find documentation on how database replication handles this scenario, but I expect it would. For flexible replication, your content processing framework (CPF) pipeline would have to account for it. A scheduled task could wake up every minute or so, maintain a last-sync timestamp, and catch up when needed. Backup and restore could also be employed, whereby full backups are created frequently but only restored during the switch.
With regard to edits in close proximity to the switch, it could be tricky for both replication types and the backup/restore route. The scheduled task approach may support this scenario best. Let's say Green just became PROD. After all manual switching is otherwise complete, we can tell Blue to stop sending its docs to Green and tell Green to start sending its docs to Blue. I'd recommend a script or Gradle task that ensures the scheduled task fires one last time before it is disabled in one environment and enabled in the other.
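The scheduled-task idea above can be sketched roughly as follows. This is only an illustration under assumptions: the function names, the `lastModifiedMillis` field, and the `copyDoc` callback are hypothetical and stand in for whatever the real task would use to find and send documents.

```javascript
// Hypothetical sketch; names and data shapes are assumptions, not LUX code.

// Select the documents changed since the last successful sync.
function selectDocsToSync(docs, lastSyncMillis) {
  return docs.filter((doc) => doc.lastModifiedMillis > lastSyncMillis);
}

// One run of the task: copy pending docs, then advance the timestamp.
// `copyDoc` stands in for whatever writes a document to the other environment.
// The caller should capture `nowMillis` before querying for `docs` so that
// edits made during the run are picked up by the next run.
function runSyncOnce(state, docs, copyDoc, nowMillis) {
  if (!state.enabled) {
    return { ...state, copied: 0 };
  }
  const pending = selectDocsToSync(docs, state.lastSyncMillis);
  pending.forEach((doc) => copyDoc(doc));
  return { ...state, lastSyncMillis: nowMillis, copied: pending.length };
}

// Switch direction: fire the old sender one last time, then disable it and
// enable the new sender starting from the same point in time.
function switchDirection(oldSender, newSender, oldDocs, copyDoc, nowMillis) {
  const finalRun = runSyncOnce(oldSender, oldDocs, copyDoc, nowMillis);
  return {
    oldSender: { ...finalRun, enabled: false },
    newSender: { ...newSender, enabled: true, lastSyncMillis: nowMillis },
  };
}
```

Keeping the final run and the enable/disable flip in one function mirrors the suggestion of a single script or Gradle task that owns the switch.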
Glancing at Yale's ML license, the license allows for all of the above-mentioned features.
I'm happy to run this by home base for validation and/or other options.
Opened an ML support ticket for Cognito auth flow: https://help.marklogic.com/Tickets/Ticket/View/37337
Key points about authentication:
Some key settings for OAuth in MarkLogic Admin:
External Security:
REST App Server:
Role (lux-endpoint-consumer):
Sample request with curl:
curl -i -H "Authorization: Bearer ${TOKEN}" "${URL}"
where ${TOKEN} is the access token obtained from Cognito after login, and ${URL} is the MarkLogic endpoint, e.g. http://localhost:8003/ds/lux/advancedSearchConfig.mjs
My notes from our 30 Sep meeting plus subsequent thoughts and requirement clarification:
@clarkepeterf and @gigamorph, I changed the status of this ticket from Forming to In Progress because it is labeled as a research ticket (and research is underway). What do you consider necessary to complete this ticket? I propose we close it once we deem the feature technically feasible (no known obstacles) and have a draft list of backend implementation tasks that could become implementation tickets. I'd also like to introduce a label for this feature, such as "my collections".
cc: @prowns, @jffcamp, @roamye
@brent-hartwig Submitted the "idea" for the JWKS URI feature at https://progressdataplatform.ideas.aha.io/ideas/ML-I-75
Action items from a meeting with @gigamorph:
lux_service_acc_[env]).
While waiting for the JWKS URI feature to be implemented, we may need a workaround that automatically keeps the JWKS public key configuration current in ML.
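One possible shape for the detection half of such a workaround is sketched below. It assumes Node 18+ (for the global `fetch`) and compares the key IDs Cognito publishes with the key IDs currently configured. The function names are hypothetical, and how the keys are actually written into the MarkLogic external security configuration is deliberately left out.

```javascript
// Hypothetical sketch of a JWKS drift check; not an existing LUX script.

// Cognito publishes a user pool's signing keys at this well-known URL.
function cognitoJwksUrl(region, userPoolId) {
  return `https://cognito-idp.${region}.amazonaws.com/${userPoolId}/.well-known/jwks.json`;
}

// Download the JWKS document (uses the global fetch available in Node 18+).
async function fetchJwks(url) {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`JWKS request failed: ${response.status}`);
  }
  return response.json();
}

// Compare published key IDs with the key IDs already configured in ML.
function diffJwksKeys(jwks, configuredKeyIds) {
  const published = jwks.keys.map((key) => key.kid);
  const publishedSet = new Set(published);
  const configuredSet = new Set(configuredKeyIds);
  return {
    toAdd: published.filter((kid) => !configuredSet.has(kid)),
    toRemove: configuredKeyIds.filter((kid) => !publishedSet.has(kid)),
  };
}
```

A scheduled job could call `fetchJwks(cognitoJwksUrl(region, userPoolId))` and only touch the ML configuration when `toAdd` or `toRemove` is non-empty.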
All requests from the middle tier currently share a single MarkLogicClient to access an ML port. Under the OAuth scheme, it seems we need to create a new client instance for every request. Since it is all HTTP REST calls at the lower level anyway, I don't think it should have any significant impact on performance. Hopefully that is the case.
@gigamorph, I don't know how much overhead there is in creating DatabaseClient instances either, but we may be able to call setAuthToken on an existing DatabaseClient instance when the middle tier request includes such a token. Here's the DatabaseClient's API documentation: https://docs.marklogic.com/jsdoc/DatabaseClient.html.
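Either way, the middle tier has to decide per request whether to act as the signed-in user or as the service account. A minimal sketch of that decision follows, assuming Node-style lowercased request headers. `createClientForToken` is a hypothetical factory wrapping however the MarkLogic client ends up being configured for OAuth (a new instance, or setAuthToken on an existing one); no specific client API is assumed here.

```javascript
// Hypothetical sketch; function names are assumptions, not middle-tier code.

// Extract the bearer token from an incoming request's headers, if any.
// Node lowercases incoming header names, hence "authorization".
function extractBearerToken(headers) {
  const value = headers["authorization"] || "";
  const match = /^Bearer\s+(\S+)$/i.exec(value.trim());
  return match ? match[1] : null;
}

// Pick the client for a request: the shared service-account client when no
// user token is present, otherwise a client bound to the user's token.
function clientForRequest(headers, serviceClient, createClientForToken) {
  const token = extractBearerToken(headers);
  return token ? createClientForToken(token) : serviceClient;
}
```

Hiding the choice behind one function means the rest of the middle tier does not need to change if the per-request client turns out to be too expensive and is swapped for a reused instance.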
Created an app server in DEV at port 8007, named "lux-request-group-oauth-experiment".
Config: https://lux-ml-dev.collections.yale.edu:8000/manage/v2/servers/lux-request-group-oauth-experiment/properties?format=json&group-id=Default -> I don't think there's any sensitive information in the config.
External Security was added, named "chit-cognito-experiment", and external roles are mapped to the "lux-endpoint-consumer" role, of which "lux-service" represents the service account that the middle tier uses, and "lux-users" all other users.
You can access the frontend at https://lux-front-exp.collections.yale.edu. If you're signed in, MarkLogic will see the signed-in user. If not, MarkLogic will see the username of the service account.
Overall, the current PoC is using the Cognito service that I have created in our own (CHIT) account. Thus you need to ask me to create an account if you want to experiment with logging in to the frontend.
The Library Cognito service, it turns out, does not return groups for non-CAS users, so it doesn't work with the current PoC code.
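To check what a given Cognito token actually carries, its payload can be decoded locally. The sketch below assumes a Node version that supports the "base64url" Buffer encoding and that groups arrive in the standard `cognito:groups` claim. It does not verify the signature, so it is only suitable for inspection, and the function names are hypothetical.

```javascript
// Hypothetical inspection helper; it does NOT validate the token.

// Decode the payload (second segment) of a JWT without verifying it.
function decodeJwtPayload(token) {
  const parts = token.split(".");
  if (parts.length !== 3) {
    throw new Error("Expected a three-part JWT");
  }
  const json = Buffer.from(parts[1], "base64url").toString("utf8");
  return JSON.parse(json);
}

// Return the groups listed in the token, or an empty array when the
// "cognito:groups" claim is missing.
function groupsFromToken(token) {
  const groups = decodeJwtPayload(token)["cognito:groups"];
  return Array.isArray(groups) ? groups : [];
}
```

An empty result for a non-CAS user's token would confirm that the claim is absent before the request ever reaches MarkLogic.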
Thanks for the update, @gigamorph. Are there any current blockers? I think we're okay so long as Cognito authenticates the user and provides the token that ML can then validate and allow the request in. If that's the case and we go by a service account naming convention, the backend code can then decide if the user can partake in My Collections functionality.
@brent-hartwig - no blockers come to mind currently, but then I have no idea how MarkLogic will utilize the token it receives, and what information it will require (username, groups, and/or ?).
@gigamorph, would you configure a Cognito account for me and let me know what group I'm in? This should boil down to assigning an external store's group to a MarkLogic role.
@gigamorph, it looks like all the pieces are in place. When I log into https://lux-front-exp.collections.yale.edu/ using the account you set up for me, these entries appear in 8007_AccessLog.txt:
External User(4498b4b8-00b1-7094-c28a-c6436671ce2a) is Mapped to Temp User(4498b4b8-00b1-7094-c28a-c6436671ce2a) with Role(s): lux-endpoint-consumer
10.5.157.158 - 4498b4b8-00b1-7094-c28a-c6436671ce2a [22/Nov/2024:16:27:46 +0000] "POST /ds/lux/stats.mjs HTTP/1.1" 200 251 - -
Pieces/steps:
Because the xdmp.getRequestUser* functions return values such as the following, our user naming convention idea won't work. Instead, we could create two roles that extend lux-endpoint-consumer, associating one with lux-service and one with lux-users, and then only support My Collections functionality for the latter.
2024-11-22 16:45:17.501 Info: User name: 4498b4b8-00b1-7094-c28a-c6436671ce2a
2024-11-22 16:45:17.501 Info: User id: 9053464274652740159
A way to restrict My Collections functionality by role is to grant one of the roles a new execute privilege and then require that privilege using https://docs.marklogic.com/xdmp.hasPrivilege or https://docs.marklogic.com/xdmp.securityAssert within the My Collections entry points / code base.
We may find it necessary or desirable to grant the lux-endpoint-consumer role additional privileges. For instance, using xdmp.userRoles requires the http://marklogic.com/xdmp/privileges/xdmp-user-roles privilege, which the lux-endpoint-consumer role does not presently have.
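A rough sketch of what that check could look like in server-side JavaScript is below. The privilege action URI is a made-up placeholder, and the `api` parameter exists only so the functions can be exercised outside MarkLogic with a stub; inside MarkLogic it defaults to the global `xdmp`.

```javascript
// Hypothetical sketch; the privilege URI and function names are assumptions.

// Placeholder action URI for a new execute privilege granted only to the
// role associated with lux-users.
const MY_COLLECTIONS_PRIVILEGE = "http://example.org/lux/privileges/my-collections";

// Non-throwing check, e.g. to decide whether to expose the feature at all.
function canUseMyCollections(api = xdmp) {
  return api.hasPrivilege(MY_COLLECTIONS_PRIVILEGE, "execute");
}

// Guard for entry points: securityAssert throws when the current user
// lacks the privilege, which ends the request.
function requireMyCollections(api = xdmp) {
  api.securityAssert(MY_COLLECTIONS_PRIVILEGE, "execute");
}
```

Calling `requireMyCollections()` at the top of each My Collections entry point would keep the service account out without any reliance on user names.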
Problem Description
The team has decided to implement LUX-specific "My Collections" with MarkLogic and its API, instead of having a standalone application.
Expected Behavior/Solution
Requirements
Questions
Problems/Possible Blockers
Related links
MarkLogic Documentation
MarkLogic Support Tickets