webyrd / mediKanren

Proof-of-concept for reasoning over the SemMedDB knowledge base, using miniKanren + heuristics + indexing.
MIT License

LRU cache for curie lookup #61

Open jeffhhk opened 3 years ago

jeffhhk commented 3 years ago

Benchmark exercise:

    (cd medikanren && racket)
        (require xrepl)
        (require "db.rkt")
        (require "common.rkt")
        (load-databases #t)
        (time (enter! "../contrib/medikanren/use-cases/PMI-20-10-Il1R1-case-reviews.rkt"))  ; the timings reported below come from this call

Database list:

    covid19
    orange
    pr-owl
    robokop
    rtx
    semmed
    sri_semmeddb
    textminingprovider
    sri-reference-kg-0.2.0
    rtx2_2020_09_16
    co-occur
    umlsmeta

Configurations (times are in milliseconds, as printed by Racket's time form):

    1) 5GB cui cache (2m11s startup time)
        (in-memory-names? . #f)
        (in-memory-cuis?  . #t)
        (num-cached-cuis . #f)
            cpu time: 60928 real time: 61051 gc time: 3053
    2) 1MB cui cache (32s startup time)
        (in-memory-names? . #f)
        (in-memory-cuis?  . #f)
        (num-cached-cuis . 3000)
            cpu time: 75705 real time: 75830 gc time: 3516
    3) No cui cache (32s startup time)
        (in-memory-names? . #f)
        (in-memory-cuis?  . #f)
        (num-cached-cuis . #f)
            cpu time: 212676 real time: 213121 gc time: 4044
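
To make the comparison easier to read, here is a small sketch that turns the real times above into ratios. The names (real-ms, ratio, and the configuration labels) are made up for illustration and are not part of mediKanren; only the numbers come from the measurements above.

```racket
#lang racket/base
;; Real times in milliseconds, copied from configurations 1), 2) and 3) above.
;; The labels are hypothetical names chosen for this sketch.
(define real-ms
  '((full-cache . 61051)    ; 1) 5GB cui cache
    (lru-3000   . 75830)    ; 2) 1MB cui cache, num-cached-cuis = 3000
    (no-cache   . 213121))) ; 3) no cui cache

;; Ratio of two real times as a floating-point number.
(define (ratio a b)
  (/ (cdr (assq a real-ms)) (cdr (assq b real-ms)) 1.0))

;; How much slower is running with no cache than with the small LRU cache?
(define no-cache-vs-lru (ratio 'no-cache 'lru-3000))

;; How much slower is the small LRU cache than the full in-memory cache?
(define lru-vs-full (ratio 'lru-3000 'full-cache))

(printf "no cache / LRU cache:   ~a\n" no-cache-vs-lru)
(printf "LRU cache / full cache: ~a\n" lru-vs-full)
```

On these numbers the small cache is roughly 2.8 times faster than no cache and roughly 1.24 times slower than the full in-memory cache, while keeping the 32s startup time.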

This pull request adds LRU caching without changing the default behavior: config.defaults.scm still corresponds to configuration 1).
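
For readers unfamiliar with the technique, the following is a minimal sketch of an LRU (least recently used) cache in Racket. It is not the code in this PR: make-lru, lru-ref!, and slow-lookup are hypothetical names, the capacity is assumed to be a positive integer, and eviction uses a simple linear scan rather than anything tuned for speed.

```racket
#lang racket/base
;; Hypothetical LRU cache sketch; none of these names are mediKanren's API.

;; A cache has a capacity, a logical clock, and a mutable hash mapping
;; key -> (cons value last-used-tick).
(struct lru (capacity [clock #:mutable] table))

(define (make-lru capacity)
  (lru capacity 0 (make-hash)))

;; Advance the logical clock and return the new tick.
(define (tick! c)
  (set-lru-clock! c (+ 1 (lru-clock c)))
  (lru-clock c))

;; Remove the entry with the smallest last-used tick (O(n) scan).
(define (evict-oldest! c)
  (define-values (oldest-key _tick)
    (for/fold ([best-key #f] [best-tick +inf.0])
              ([(k v) (in-hash (lru-table c))])
      (if (< (cdr v) best-tick)
          (values k (cdr v))
          (values best-key best-tick))))
  (hash-remove! (lru-table c) oldest-key))

;; Look up key. On a hit, refresh its tick. On a miss, call (compute key),
;; evict the least recently used entry if the cache is full, then store.
(define (lru-ref! c key compute)
  (define table (lru-table c))
  (define hit (hash-ref table key #f))
  (cond
    [hit
     (hash-set! table key (cons (car hit) (tick! c)))
     (car hit)]
    [else
     (define value (compute key))
     (when (>= (hash-count table) (lru-capacity c))
       (evict-oldest! c))
     (hash-set! table key (cons value (tick! c)))
     value]))

;; Usage: count how often the (pretend) slow lookup actually runs.
(define misses 0)
(define (slow-lookup curie)
  (set! misses (+ misses 1))
  (string-append "concept-for-" curie))

(define cache (make-lru 2))
(lru-ref! cache "A" slow-lookup) ; miss
(lru-ref! cache "B" slow-lookup) ; miss
(lru-ref! cache "A" slow-lookup) ; hit, A becomes most recently used
(lru-ref! cache "C" slow-lookup) ; miss, evicts B
(lru-ref! cache "B" slow-lookup) ; miss again, evicts A
```

The point of the design is that memory stays bounded by the capacity (compare num-cached-cuis in configuration 2) while repeated lookups of the same keys avoid the slow path.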

After merging, I propose that another team member run a configuration similar to 2) for a few days and, if that goes well, that we make it the default.

Note that this PR includes #59 because #59 fixes the benchmark exercise above.

gregr commented 3 years ago

Looks like Thi's changes were included here by mistake.