clarkepeterf opened this issue 2 months ago
If the current dataset passes the test (i.e., it hasn't strayed too much from the previous dataset), should the current dataset become the new baseline?
For the comparison part, perhaps the script could:
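For illustration only, a minimal comparison sketch in plain JavaScript, assuming the counts arrive as predicate-to-count maps and that a simple percentage tolerance is enough; the function and field names are hypothetical:

```javascript
// Hypothetical helper: flag predicates whose document counts drifted more than
// `tolerancePct` percent from the baseline, plus any predicate that appears in
// only one of the two datasets.
function comparePredicateCounts(baseline, current, tolerancePct = 10) {
  const issues = [];
  const predicates = new Set([...Object.keys(baseline), ...Object.keys(current)]);
  for (const predicate of predicates) {
    const before = baseline[predicate] ?? 0;
    const after = current[predicate] ?? 0;
    if (before === 0 || after === 0) {
      issues.push({ predicate, before, after, reason: 'present in only one dataset' });
      continue;
    }
    const driftPct = (Math.abs(after - before) / before) * 100;
    if (driftPct > tolerancePct) {
      issues.push({ predicate, before, after, reason: `drifted ${driftPct.toFixed(1)}%` });
    }
  }
  return issues; // an empty array means the current dataset passes
}
```

If a passing run should become the new baseline, that is just a matter of overwriting the stored baseline whenever this returns an empty array.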
Do we want to restrict the endpoint to the current user (instead of an admin specifying a specific user to run as)? QA could then just call the endpoint as each user they intend to test (they don't even need creds; they could call through the middle tier, which has the appropriate credentials for each environment).
If we want a non-admin to be able to run this, yes, it would need to be limited to the endpoint consumer. That check would likely be run as the full LUX endpoint consumer, which has access to all documents but wouldn't help us know whether the dataset was slighting a unit. I'm open to either way; we can change it if needed.
Actually, the script has a configuration section. If that were turned into endpoint parameters, it could support both. Endpoint consumers that don't have the ability to invoke code as another user had better only specify their own username :)
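As a rough sketch of what that could look like in MarkLogic server-side JavaScript, assuming the endpoint takes an optional username request field and that the caller must already hold the privilege to evaluate as another user; the parameter name and placeholder body are illustrative, not the actual checkPredicates.js interface:

```javascript
'use strict';
// Illustrative endpoint module body: run the report as the requested endpoint
// consumer, defaulting to the caller. MarkLogic rejects the userId option if
// the calling user is not allowed to evaluate as another user.
const requestedUser = xdmp.getRequestField('username') || xdmp.getCurrentUser();

const report = xdmp.invokeFunction(
  () => {
    // Placeholder for the real per-predicate counting logic; here we just
    // record which user the code actually executed as.
    return { ranAs: xdmp.getCurrentUser() };
  },
  { userId: xdmp.user(requestedUser) }
);

report;
```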
@brent-hartwig My thinking is that the API could be called as any endpoint consumer. QA could then run the script as lux-endpoint-consumer for the full dataset and as lux-ypm-endpoint-consumer for a slice, respectively. They would just have to keep baselines per slice.
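To illustrate the per-slice bookkeeping on the QA side, a small Node (18+, ESM) sketch; the endpoint URL, query parameter, and file naming are assumptions:

```javascript
// Hypothetical QA-side runner: fetch counts once per endpoint consumer and keep
// one baseline file per consumer, so the full dataset and each unit slice are
// compared against their own history.
import { writeFile } from 'node:fs/promises';

const consumers = ['lux-endpoint-consumer', 'lux-ypm-endpoint-consumer'];

for (const consumer of consumers) {
  const response = await fetch(
    `https://middle-tier.example/predicate-counts?username=${consumer}`
  );
  const counts = await response.json();
  await writeFile(`baseline-${consumer}.json`, JSON.stringify(counts, null, 2));
}
```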
@brent-hartwig Also, I was thinking ML would just spit out the numbers for each predicate. QA would handle the comparison on their side. But I'm happy to implement the comparison in ML too.
Yeah, I guess this is a good example of security over convenience. We could also provide a service account to QA that has the ability to invoke code as another user yet is not a full admin.
There are additional data-validation-related scripts available. We could consider introducing this as a generic dataset reporting endpoint, adding more checks as we go; those with enough value can be incorporated as part of this ticket or another. All that come to mind:
TF 11/20: Discussed validation; questions about running it internally vs. by QA. Added to the agenda for the 11/22 team meeting.
Problem Description: Datasets can be invalid due to missing or incorrectly applied triples, and there is currently no automated way to detect this. We want to validate datasets in an automated way based on the number of documents containing each triple predicate.
Expected Behavior/Solution: Create an API endpoint that returns the number of documents containing each triple predicate (calling into https://github.com/project-lux/lux-marklogic/blob/release1.24/scripts/checkPredicates.js); a sketch of the counting logic is included at the end of the ticket.

Requirements:
Needed for promotion: If an item on the list is not needed, it should be crossed off but not removed.
- [ ] Wireframe/Mockup - Mike
- [ ] Committee discussions - Sarah
- [ ] Feasibility/Team discussion - Sarah
- [ ] Backend requirements - TBD
- [ ] Frontend requirements - TBD
- [ ] Are new regression tests required for QA - Amy

UAT/LUX Examples:
Dependencies/Blocks:
- Blocked By: Issues that are blocking the completion of the current issue.
- Blocking: Issues being blocked by the completion of the current issue.

Related Github Issues:
- Issues that contain similar work but are not blocking or being blocked by the current issue.

Related links:
- These links can consist of resources, bugherds, etc.

Wireframe/Mockup:
Place wireframe/mockup for the proposed solution at end of ticket.
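For reference, a minimal sketch of the per-predicate counting in MarkLogic server-side JavaScript. It approximates the kind of counting checkPredicates.js performs rather than reproducing that script, and cts.estimate returns fast, unfiltered index estimates rather than exact counts:

```javascript
'use strict';
// Sketch: list every distinct predicate in the triple index, then estimate how
// many documents contain at least one triple using that predicate.
const predicateRows = sem.sparql('SELECT DISTINCT ?p WHERE { ?s ?p ?o }');

const counts = {};
for (const row of predicateRows) {
  // row.p is a sem.iri; use its string form as the report key.
  counts[String(row.p)] = cts.estimate(
    cts.tripleRangeQuery([], row.p, [])
  );
}

counts;
```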