webyrd / mediKanren

Proof-of-concept for reasoning over the SemMedDB knowledge base, using miniKanren + heuristics + indexing.
MIT License

Medikanren2 string search #96

Closed jeffhhk closed 2 years ago

jeffhhk commented 3 years ago

In scope for this merge:
1) single-db string search
2) rtx and semmed string-search indexes built and distributed (uploaded to Box, named "...-w-stringindex" in medikanren2-data)
3) yeast-micro-sri-reference-kg data distributed, but without a string-search index built (for CI)
4) 27 new CI tests for string-search

Out of scope for this merge:
5) cross-db string search
6) HTTP endpoint for string search
7) reconciling sri-reference-kg with cprop/eprop/edge
8) distributing "reverse lookup" data for sri-reference-kg
9) optimized quicksort for 3x denser memory representation

To read this change, I recommend starting with the commit "string-search: port copied code to mediKanren 2". Because the code to be ported was copy-pasted in beforehand, that commit isolates only the code written for this particular change. After that, I would read the final code to pick up the final naming of things.

The commit "string-search: refactor: sort 8 byte arrays instead of integer pairs" is a performance-neutral change made in preparation for the as-yet-incomplete performance improvement 9). It could be excluded from the merge, or reverted later, if we want to move forward with the old pair-of-integers representation for a suffix.
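
To illustrate the two suffix representations mentioned above, here is a minimal sketch in Python (not the Racket code in this PR). It assumes a suffix is identified by a (string index, character offset) pair and that the 8-byte form packs those as two unsigned 32-bit big-endian integers; that layout and all function names (`pack_suffix`, `build_index`, `search`) are hypothetical and chosen only for illustration.

```python
import struct

# Assumed layout (hypothetical): (string index, offset) as two unsigned
# 32-bit big-endian integers, i.e. exactly 8 bytes per suffix.
SUFFIX = struct.Struct(">II")


def pack_suffix(string_idx, offset):
    """Encode an integer pair as an 8-byte key."""
    return SUFFIX.pack(string_idx, offset)


def unpack_suffix(key):
    """Decode an 8-byte key back into (string index, offset)."""
    return SUFFIX.unpack(key)


def suffix_text(names, key):
    """Return the text of the suffix that an 8-byte key points at."""
    string_idx, offset = unpack_suffix(key)
    return names[string_idx][offset:]


def build_index(names):
    """Build one flat bytes object holding every suffix, sorted by suffix text."""
    keys = [
        pack_suffix(i, off)
        for i, name in enumerate(names)
        for off in range(len(name))
    ]
    keys.sort(key=lambda k: suffix_text(names, k))
    return b"".join(keys)


def search(names, index, query):
    """Return the sorted indices of all names that contain query as a substring."""
    n = len(index) // SUFFIX.size
    lo, hi = 0, n
    # Binary search for the first suffix whose text is >= query.
    while lo < hi:
        mid = (lo + hi) // 2
        if suffix_text(names, index[mid * 8:mid * 8 + 8]) < query:
            lo = mid + 1
        else:
            hi = mid
    # Suffixes that start with query are contiguous from that point on.
    hits = set()
    while lo < n:
        key = index[lo * 8:lo * 8 + 8]
        if not suffix_text(names, key).startswith(query):
            break
        hits.add(unpack_suffix(key)[0])
        lo += 1
    return sorted(hits)


names = ["aspirin", "insulin", "imatinib"]
index = build_index(names)
matches = search(names, index, "sul")
```

In this sketch the sorted order is the same for either representation, since it depends only on the suffix text; the difference is that the packed form keeps the whole index in one contiguous byte string with a fixed 8-byte stride instead of a list of pair objects.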

gregr commented 2 years ago

Sorry for the delay, looks fine to try this now.