Open gigamorph opened 1 month ago
@gigamorph,
Is there any interest in leveraging the authentication work you already did within LUX's middle tier and then continuing to use a service account into MarkLogic?
Are you planning to store this data in a separate database from LUX's content (JSON-LD)? There are advantages to storing it in the same database, but we'd have to be careful not to lose this new data when we reload the JSON-LD.
cc: @clarkepeterf
@brent-hartwig,
The direction we are taking, after the team meeting attended by @azaroth42 and @clarkepeterf among others, is to use MarkLogic as both the data store and the API provider for My Collections, authenticating with AWS Cognito, which is essentially OIDC/OAuth2.
Besides the question of whether MarkLogic can support this flow (if it can't, we can fall back on the middle tier as you suggested), @clarkepeterf and I have identified another problem. We need this PRD "My Collections" database/API up and running constantly (with minimal downtime and appropriate notification to users) and with real-time currency of user-initiated updates. That doesn't jibe at all with our current blue/green deployment scheme, where we load up a complete set of non-PRD data that is not affected by user actions at all and then send it into the PRD environment.
Before discussing this "currency" or "synchronization" issue in the team meeting, we did want to tap you for any insights you may have regarding it, too.
@gigamorph, there are several MarkLogic features that can keep a couple of databases in sync, including database replication, flexible replication, scheduled tasks, and backup/restore.
We'll want to keep a few things in mind while comparing them and any others that come up:
Here's a comparison of ML's two replication types: https://docs.marklogic.com/guide/database-replication/dbrep_intro#id_92346. Spoiler alert: If you want the new docs in the existing LUX content database, database replication is out.
With regard to automatically synchronizing after the target database comes back online, I didn't quickly find documentation on how database replication handles this scenario, but I expect it would. For flexible replication, your content processing framework (CPF) pipeline would have to account for it. A scheduled task could wake up every minute or so, maintain a last-sync timestamp, and play catch-up when needed. Backup and restore could also be employed, whereby full backups are created frequently but only restored during the switch.
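The scheduled-task idea above can be sketched as follows. This is only an illustration under stated assumptions: the two databases are stood in for by in-memory `Map` objects, and `syncSince`, the `lastModified` field, and the document shape are hypothetical names, not MarkLogic APIs. The point is that the checkpoint only advances after a successful pass, so a run that fails while the target is offline is retried from the same timestamp.

```javascript
// Hypothetical stand-ins: each "database" is a Map of uri -> { lastModified, content }.
// Copies every document modified after lastSync and returns the new checkpoint.
function syncSince(source, target, lastSync) {
  let newest = lastSync;
  let copied = 0;
  for (const [uri, doc] of source) {
    if (doc.lastModified > lastSync) {
      // If the target is unavailable this would throw, and the caller
      // would keep its old checkpoint and catch up on the next run.
      target.set(uri, { ...doc });
      copied += 1;
      if (doc.lastModified > newest) {
        newest = doc.lastModified;
      }
    }
  }
  // Advance to the newest timestamp actually copied rather than the wall
  // clock, so documents written while this pass was running are not skipped.
  return { lastSync: newest, copied };
}
```

This sketch does not propagate deletions; a real task would need some way to track them.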
With regard to edits in close proximity to the switch, it could be tricky for both replication types and the backup/restore route. The scheduled task approach may support this scenario best. Let's say Green just became PROD. After all manual switching is otherwise complete, we can tell Blue to stop sending its docs to Green and tell Green to start sending its docs to Blue. I'd recommend a script or Gradle task that ensures the scheduled task fires one last time before it is disabled in one environment and enabled in the other.
From a glance at Yale's ML license, it allows for all of the above-mentioned features.
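A minimal sketch of that handoff, assuming each environment can be represented by an object with a `syncEnabled` flag and a `runSync` function that triggers the scheduled task on demand. Both names, and `switchSyncDirection` itself, are hypothetical; in practice these would map to whatever the script or Gradle task calls.

```javascript
// Hypothetical environment shape: { name, syncEnabled, runSync }.
// oldProd is the environment that was sending its docs (Blue in the example);
// newProd is the environment that just became PROD (Green).
function switchSyncDirection(oldProd, newProd) {
  if (!oldProd.syncEnabled) {
    throw new Error(`${oldProd.name} is not currently the sync source`);
  }
  // Fire the outgoing sync one last time to pick up edits made just
  // before the switch. If this throws, neither flag is changed.
  oldProd.runSync();
  // Only after the final run succeeds, reverse the direction.
  oldProd.syncEnabled = false;
  newProd.syncEnabled = true;
  return { source: newProd.name, target: oldProd.name };
}
```

Edits that reach the old environment after its final run would still need separate handling, for example by blocking writes there first.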
I'm happy to run this by home base for validation and/or other options.
Opened an ML support ticket for the Cognito auth flow: https://help.marklogic.com/Tickets/Ticket/View/37337
Key points about authentication:
Some key settings for OAuth in MarkLogic Admin:
External Security:
REST App Server:
Role (lux-endpoint-consumer):
Sample request with curl:
curl -i -H "Authorization: Bearer ${TOKEN}" "${URL}"
where ${TOKEN} is the access token obtained from Cognito after login, and ${URL} is the MarkLogic endpoint, e.g. http://localhost:8003/ds/lux/advancedSearchConfig.mjs
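For callers that are not using curl, the same request can be sketched in JavaScript. This is an illustration only: `buildBearerRequest` is a hypothetical helper, and the commented-out usage assumes Node.js 18 or later for the built-in `fetch`.

```javascript
// Builds the URL and options for a GET request carrying the Cognito
// access token as a Bearer token, mirroring the curl sample.
function buildBearerRequest(url, token) {
  if (!token) {
    throw new Error('An access token is required');
  }
  return {
    url,
    options: {
      method: 'GET',
      headers: { Authorization: `Bearer ${token}` },
    },
  };
}

// Usage (needs a reachable MarkLogic endpoint and a valid token):
// const { url, options } = buildBearerRequest(process.env.URL, process.env.TOKEN);
// const response = await fetch(url, options);
// console.log(response.status);
```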
My notes from our 30 Sep meeting plus subsequent thoughts and requirement clarification:
@clarkepeterf and @gigamorph, I changed the status of this ticket from Forming to In Progress because it is labeled as a research ticket (and research is underway). What do you consider necessary to complete this ticket? I propose we call it complete once we deem the approach technically feasible (no known obstacles) and have a draft list of backend implementation tasks, i.e. tasks that could become implementation tickets. I'd also like to introduce a label for this feature, such as "my collections".
cc: @prowns, @jffcamp, @roamye
@brent-hartwig Submitted the "idea" for the JWKS URI feature at https://progressdataplatform.ideas.aha.io/ideas/ML-I-75
Action items from a meeting with @gigamorph:
lux_service_acc_[env]).
While waiting for the JWKS URI feature to be implemented, we may need a workaround that automatically keeps the JWKS public key configuration current in ML.
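One possible shape for such a workaround is to periodically compare the key IDs Cognito publishes with the ones already configured in ML. The sketch below covers only the comparison. `diffKeyIds` and `cognitoJwksUrl` are hypothetical helper names, and how the resulting changes would be applied to ML's external security configuration is left out because it is not settled in this thread. The URL pattern is the one Cognito uses to publish a user pool's JWKS.

```javascript
// Cognito publishes each user pool's signing keys at this well-known URL.
function cognitoJwksUrl(region, userPoolId) {
  return `https://cognito-idp.${region}.amazonaws.com/${userPoolId}/.well-known/jwks.json`;
}

// Compares the key IDs already configured in ML (an array of strings) with
// a fetched JWKS document ({ keys: [{ kid, ... }, ...] }).
function diffKeyIds(configuredKids, jwks) {
  const published = new Set(jwks.keys.map((key) => key.kid));
  const configured = new Set(configuredKids);
  return {
    // Keys Cognito now publishes that ML does not know about yet.
    toAdd: [...published].filter((kid) => !configured.has(kid)),
    // Keys configured in ML that Cognito no longer publishes.
    toRemove: [...configured].filter((kid) => !published.has(kid)),
  };
}

// Usage (needs network access; region and pool ID are placeholders):
// const jwks = await (await fetch(cognitoJwksUrl(region, userPoolId))).json();
// const changes = diffKeyIds(kidsCurrentlyInMl, jwks);
```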
All requests from the middle tier currently share a single MarkLogicClient to access an ML port. Under the OAuth scheme, it seems we need to create a new client instance for every request. Since it is all HTTP REST calls at the lower level anyway, I don't think it should have a significant impact on performance. Hopefully that is the case.
@gigamorph, I don't know how much overhead there is in creating DatabaseClient instances either, but we may be able to call setAuthToken on an existing DatabaseClient instance when the middle tier request includes such a token. Here's the DatabaseClient's API documentation: https://docs.marklogic.com/jsdoc/DatabaseClient.html.
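A middle ground between one shared client and a new client per request is to reuse a client per token. The sketch below is an assumption-laden illustration, not the middle tier's actual design: `makeClientCache` is a hypothetical helper, and `createClient` is an injected factory standing in for however a token-bearing client gets built. Keeping one client per token also avoids changing the token on a shared client while other requests may be in flight, which could be a concern with the setAuthToken approach.

```javascript
// Returns a lookup function that reuses one client per access token.
// createClient(token) is a hypothetical factory supplied by the caller.
function makeClientCache(createClient, maxEntries = 100) {
  const clients = new Map(); // token -> client, kept in insertion order
  return function clientFor(token) {
    if (clients.has(token)) {
      return clients.get(token);
    }
    if (clients.size >= maxEntries) {
      // Evict the oldest entry so the cache cannot grow without bound.
      const oldestToken = clients.keys().next().value;
      clients.delete(oldestToken);
    }
    const client = createClient(token);
    clients.set(token, client);
    return client;
  };
}
```

Evicted clients are simply dropped here; a real implementation would probably also release them and drop entries when their tokens expire.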
Problem Description
The team has decided to implement LUX-specific "My Collections" with MarkLogic and its API, instead of having a standalone application.
Expected Behavior/Solution
Requirements
Questions
Problems/Possible Blockers
Related links
MarkLogic Documentation
MarkLogic Support Tickets