Open clarkepeterf opened 4 months ago
@clarkepeterf - agenda. Unsure where this ticket should go.
Discussed in team meeting on 2024/08/07 - try crawling GitHub issues and grab records that were talked about in issues
May have problems with some records mentioned in issues because their URIs change over time
@clarkepeterf, I have a 593K dataset locally that Rob provided in association with data slices (#73). It only has YCBA and YUAG data, yet it is better than the one from 2022. Undoubtedly there will be shortcomings, but let me know if you'd like to give it a shot while this ticket is in the backlog.
As for creating a new dataset: if we were given the opportunity, and it was deemed better than getting one from the pipeline, we could come up with a way to extract a representative sample from a full dataset. I'm thinking some of each entity type (a combination of statically defined, well-connected records plus a specified number of random-ish ones), plus their connected records, following either specified predicates or all predicates found in the records, and continuing for n hops. If we developed such a script, we could refresh our dataset as new datasets are created or updated. The script would take a while to run, so we'd want to encapsulate it in CoRB, scheduled tasks, or possibly Flux.
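To make the idea concrete, here is a minimal Python sketch of the seed-plus-n-hops selection. It is only an illustration under assumptions: the in-memory `records` and `links` structures, the function name `sample_connected`, and its parameters are all hypothetical, and a real version would read from the database and run inside CoRB, a scheduled task, or Flux.

```python
import random
from collections import deque


def sample_connected(records, links, seeds, per_type=2, max_hops=2,
                     predicates=None, rng=None):
    """Pick a connected sample of record ids.

    records:    dict of record id -> entity type (hypothetical shape)
    links:      dict of record id -> list of (predicate, target id)
    seeds:      statically defined, well-connected record ids
    per_type:   number of random-ish extra records to add per entity type
    max_hops:   how many link hops to follow from the starting set
    predicates: set of predicates to follow, or None to follow all
    """
    rng = rng or random.Random(0)  # fixed seed keeps refreshes repeatable

    # Start from the static seeds that actually exist in the dataset.
    selected = {s for s in seeds if s in records}

    # Add a few random records of each entity type.
    by_type = {}
    for rid, rtype in records.items():
        by_type.setdefault(rtype, []).append(rid)
    for ids in by_type.values():
        pool = sorted(i for i in ids if i not in selected)
        selected.update(rng.sample(pool, min(per_type, len(pool))))

    # Breadth-first walk: pull in linked records up to max_hops away.
    queue = deque((rid, 0) for rid in sorted(selected))
    while queue:
        rid, depth = queue.popleft()
        if depth >= max_hops:
            continue
        for predicate, target in links.get(rid, []):
            if predicates is not None and predicate not in predicates:
                continue
            if target in records and target not in selected:
                selected.add(target)
                queue.append((target, depth + 1))
    return selected
```

Passing `predicates=None` corresponds to the "all in the records found" option, while a set of predicate names restricts the walk to specified predicates.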
cc: @prowns, @jffcamp, @azaroth42, @kkdavis14
From 8/7 team meeting notes: Scope of set: looking for records that have lots of permutations. Every type of document, Hit every index, Use all MT HAL links
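One way to turn that scope into a short list of candidate records is a greedy cover: repeatedly pick the record that hits the most still-uncovered features. The sketch below assumes we can build, for each record, a set of feature labels (document type, indexes touched, link types); the labels, the `pick_covering_records` name, and the input shape are hypothetical.

```python
def pick_covering_records(features_by_record, required):
    """Greedily choose records until every required feature is covered.

    features_by_record: dict of record id -> set of feature labels
    required:           iterable of feature labels that must be covered
    Returns (chosen record ids in pick order, features left uncovered).
    """
    uncovered = set(required)
    chosen = []
    while uncovered:
        # Record covering the most uncovered features; sorted() makes
        # tie-breaking deterministic.
        best = max(
            sorted(features_by_record),
            key=lambda rid: len(features_by_record[rid] & uncovered),
            default=None,
        )
        if best is None:
            break
        gain = features_by_record[best] & uncovered
        if not gain:
            break  # nothing left in the dataset covers the remainder
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered
```

Anything returned in the second value is a feature no record in the source data exercises, which would need a hand-made record or a different source dataset.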
@azaroth42, @kkdavis14 and @prowns to make a first pass at features/records. @brent-hartwig - is there a clever ML solution for this?
@prowns, I'm not aware of a feature or "database crawler" intended to export a subset of a dataset whereby the records are interconnected. As described above, I believe we could write such code and use it over and over again, as the dataset changes.
Problem Description: The current test dataset is very old and is missing many of the properties that are used in current indexes. As a result, many searches and test cases do not work unless extra documents are imported.
Expected Behavior/Solution: Create an updated test dataset with properties that match current indexes
Requirements:
Needed for promotion:
- [ ] Wireframe/Mockup - Mike
- [ ] Committee discussions - Sarah
- [ ] Feasibility/Team discussion - Sarah
- [ ] Backend requirements - TBD
- [ ] Frontend requirements - TBD
- [ ] Are new regression tests required for QA - Amy
- [ ] Questions - List of questions for discussions. Answers should be documented within the issue.

UAT/LUX Examples:
Dependencies/Blocks:
- Blocked By: Issues that are blocking the completion of the current issue.
- Blocking: Issues being blocked by the completion of the current issue.

Related Github Issues:
- Issues that contain similar work but are not blocking or being blocked by the current issue.

Related links:
- These links can consist of resources, bugherds, etc.

Wireframe/Mockup:
Place wireframe/mockup for the proposed solution at end of ticket.