Closed: versar closed this issue 6 years ago
We need to convert the Python loop over models to a single SQL lookup:
dependence probability: https://github.com/probcomp/bayeslite/blob/39e96af568d8f11528583b263042b4711e0cd5ff/src/metamodels/loom_metamodel.py#L500-L502
row similarity: https://github.com/probcomp/bayeslite/blob/39e96af568d8f11528583b263042b4711e0cd5ff/src/metamodels/loom_metamodel.py#L556-L570
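A minimal sketch of the idea for dependence probability, assuming a simplified, hypothetical table `column_kind(modelno, colno, kind_id)` that records which kind (view) each column belongs to in each model. The table name, schema, and function names are illustrative assumptions, not the actual bayeslite schema or API. The first function issues queries once per model from Python; the second gets the same answer from one aggregate query.

```python
import sqlite3


def make_db():
    # Hypothetical schema: one row per (model, column) with its kind assignment.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE column_kind (modelno INTEGER, colno INTEGER, kind_id INTEGER)")
    rows = [
        (0, 0, 0), (0, 1, 0),  # model 0: columns 0 and 1 share a kind
        (1, 0, 0), (1, 1, 1),  # model 1: different kinds
        (2, 0, 2), (2, 1, 2),  # model 2: same kind
        (3, 0, 1), (3, 1, 1),  # model 3: same kind
    ]
    db.executemany("INSERT INTO column_kind VALUES (?, ?, ?)", rows)
    return db


def depprob_loop(db, col0, col1, n_models):
    # Slow pattern: two SQL round trips per model, driven by a Python loop.
    sql = "SELECT kind_id FROM column_kind WHERE modelno = ? AND colno = ?"
    hits = 0
    for modelno in range(n_models):
        k0 = db.execute(sql, (modelno, col0)).fetchone()[0]
        k1 = db.execute(sql, (modelno, col1)).fetchone()[0]
        hits += int(k0 == k1)
    return hits / float(n_models)


def depprob_single_query(db, col0, col1):
    # Fast pattern: one self-join; SQLite evaluates the equality to 0 or 1,
    # so AVG gives the fraction of models where the columns share a kind.
    sql = """
        SELECT AVG(a.kind_id = b.kind_id)
        FROM column_kind AS a
        JOIN column_kind AS b ON a.modelno = b.modelno
        WHERE a.colno = ? AND b.colno = ?
    """
    return db.execute(sql, (col0, col1)).fetchone()[0]
```

Row similarity could be handled the same way, assuming an analogous per-model table of row-to-cluster assignments.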
After re-running the benchmarks, @versar reports that the candidate fix in feb1e22 does not appear to have improved the runtime. The overhead of Loom might come from reading data from disk rather than from memory, so we may consider caching results if that is indeed the case.
Confirmed the issue for both cases (and it will be the same for all Loom queries): the culprit is the invocation of _check_loom_initialized on a per-query basis. This check gives good error messages but adds unreasonable computational overhead.
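One way to keep the error messages without paying for the check on every query is to remember that it already passed. The sketch below is only an illustration under that assumption: the class `LoomBackendSketch`, the `_expensive_check` stand-in, and the `query` method are hypothetical and do not reflect the actual bayeslite code or the change made in any commit.

```python
class LoomBackendSketch(object):
    """Hypothetical backend that validates its state at most once."""

    def __init__(self):
        self.check_count = 0           # how often the costly validation ran
        self._initialized_ok = False   # cached outcome of the validation

    def _expensive_check(self):
        # Stand-in for the real validation (e.g. database lookups).
        self.check_count += 1
        return True

    def _check_loom_initialized(self):
        # Skip the costly work once the check has passed.
        if self._initialized_ok:
            return
        if not self._expensive_check():
            raise RuntimeError("Loom backend is not initialized.")
        self._initialized_ok = True

    def query(self, x):
        # Every query still calls the guard, but only the first call pays.
        self._check_loom_initialized()
        return x * 2
```

The cached flag would need to be reset whenever the backend state can change, for example after models are dropped or re-initialized.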
Resolution plan:
0d80bf2 should contain the resolution for this issue (tested on the same benchmarks as in the first post). @versar can reopen the ticket if they encounter more issues.
Note that 0d80bf2 also fixes #586 so that INITIALIZE actually initializes models using a single streaming pass through the data.
I compared the performance of queries between loom crosscat and the default baseline (lovcat) crosscat. The queries ESTIMATE SIMILARITY and ESTIMATE DEPENDENCE using loom take longer. The difference is on the order of n_models (a.k.a. number of ANALYSES), but not exactly.

For example, below are runtimes from identical workflows I performed with loom and the default crosscat. The runtime differences are similar with multiprocess on and multiprocess off.
n_models = 32:
ESTIMATE SIMILARITY with 300 variables: 3.91 seconds default crosscat, 69.9 seconds loom
ESTIMATE DEPENDENCE with 300 variables: 300 seconds default crosscat, waited a long time then interrupted loom
ESTIMATE DEPENDENCE with 30 variables: 0.22 seconds default crosscat, 9.8 seconds loom

n_models = 16:
ESTIMATE SIMILARITY with 300 variables: 0.63 seconds default crosscat, 29 seconds loom
ESTIMATE DEPENDENCE with 300 variables: 148 seconds default crosscat, waited a long time then interrupted loom
ESTIMATE SIMILARITY with 30 variables: 0.62 seconds default crosscat, 5.3 seconds loom
ESTIMATE DEPENDENCE with 30 variables: 0.17 seconds default crosscat, 5.0 seconds loom

n_models = 2:
ESTIMATE DEPENDENCE with 30 variables: 0.10 seconds default crosscat, 0.84 seconds loom
ESTIMATE SIMILARITY with 30 variables: 0.23 seconds default crosscat, 0.93 seconds loom
ESTIMATE DEPENDENCE with 300 variables: 0.10 seconds default crosscat, 0.84 seconds loom
ESTIMATE SIMILARITY with 300 variables: 0.23 seconds default crosscat, 0.93 seconds loom
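To check the "on the order of n_models" observation, the slowdown factor (loom time divided by default time) can be computed from the completed runs above. This is a small sketch; the dictionary layout and function name are my own, and the two interrupted loom runs are left out because they have no loom timing.

```python
# (n_models, query, n_variables) -> (default crosscat seconds, loom seconds)
TIMINGS = {
    (32, "SIMILARITY", 300): (3.91, 69.9),
    (32, "DEPENDENCE", 30): (0.22, 9.8),
    (16, "SIMILARITY", 300): (0.63, 29.0),
    (16, "SIMILARITY", 30): (0.62, 5.3),
    (16, "DEPENDENCE", 30): (0.17, 5.0),
    (2, "DEPENDENCE", 30): (0.10, 0.84),
    (2, "SIMILARITY", 30): (0.23, 0.93),
    (2, "DEPENDENCE", 300): (0.10, 0.84),
    (2, "SIMILARITY", 300): (0.23, 0.93),
}


def slowdowns(timings):
    # Ratio of loom runtime to default crosscat runtime for each workload.
    return {key: loom / default for key, (default, loom) in timings.items()}


if __name__ == "__main__":
    for (n_models, query, n_vars), ratio in sorted(slowdowns(TIMINGS).items()):
        print("n_models=%2d %-10s %3d variables: %.1fx slower"
              % (n_models, query, n_vars, ratio))
```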