probcomp / bayeslite

BayesDB on SQLite. A Bayesian database table for querying the probable implications of data as easily as SQL databases query the data itself.
Apache License 2.0

Strange failure in test suite when removing code for caching the engine #541

Closed fsaad closed 7 years ago

fsaad commented 7 years ago

When CGPM_Metamodel is changed to bypass the engine cache and always load the engine from disk (which is what cache.patch does in the example session below), test_cgpm.test_cgpm_smoke fails: a BQL query that specifies LIMIT 1 returns 2 rows. Strange!

$ git checkout 50af2c99446aa55df38c5a8e3c0cea2d53b816f2
$ ./ -k test_cgpm_smoke tests/
==== 1 passed, 8 deselected in 2.03 seconds ========
$ cat cache.patch 
diff --git a/src/metamodels/ b/src/metamodels/
index 6aa09b9..39ad955 100644
--- a/src/metamodels/
+++ b/src/metamodels/
@@ -696,8 +696,8 @@ class CGPM_Metamodel(IBayesDBMetamodel):
     def _engine(self, bdb, generator_id):
         # Probe the cache.
         cache = self._cache(bdb)
-        if cache is not None and generator_id in cache.engine:
-            return cache.engine[generator_id]
+        # if cache is not None and generator_id in cache.engine:
+        #     return cache.engine[generator_id]

         # Not cached.  Load the engine from the database.
         cursor = bdb.sql_execute('''
$ git apply cache.patch
$ ./ -k test_cgpm_smoke tests/
bdb = <bayeslite.bayesdb.BayesDB object at 0x7f9411378e10>, gen = None, vars = ['output', 'cat', 'input']

    def cgpm_smoke_tests(bdb, gen, vars):
        modelledby = 'MODELLED BY %s' % (gen,) if gen else ''
        for var in vars:
            ''' % (var, modelledby)).fetchall()
            bdb.execute('''
                SIMULATE %s FROM p %s LIMIT 1
            ''' % (var, modelledby)).fetchall()
            bdb.execute('''
                INFER %s FROM p %s LIMIT 1
            ''' % (var, modelledby)).fetchall()
            nvars = len(bdb.execute('''
                ESTIMATE * FROM VARIABLES OF p %(modelledby)s
                    ORDER BY PROBABILITY OF
                        (MUTUAL INFORMATION WITH %(var)s USING 1 SAMPLES > 0.1)
            ''' % {'var': var, 'modelledby': modelledby}).fetchall())
            if 0 < nvars:
                c = bdb.execute('''
                    SIMULATE p.(ESTIMATE * FROM VARIABLES OF p %(modelledby)s
                                    ORDER BY PROBABILITY OF
                                        (MUTUAL INFORMATION WITH %(var)s
                                            USING 1 SAMPLES > 0.1))
                        FROM p
                        LIMIT 1
                ''' % {'var': var, 'modelledby': modelledby}).fetchall()
>               assert len(c) == 1
E               assert 2 == 1
E                +  where 2 = len([('-1', 2.7210143958848385, 0.5285247869242387), ('1', 5.794596701327366, 4.578269369037486)])
==== 1 failed, 8 deselected in 2.03 seconds ========

@riastradh-probcomp You wrote the test in question -- any idea why it fails when removing the engine cache? Also, what does caching the engine have to do with LIMIT 1 returning 2 rows?

riastradh-probcomp commented 7 years ago

Dunno. The first thing I would do is examine the SQL trace to see which SQL queries are actually being executed.
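For readers unfamiliar with SQL tracing, here is a minimal sketch of the idea using only Python's standard sqlite3 module. It is not the bayeslite tracing interface: the helper run_with_trace and the toy table p are hypothetical, chosen only to show how every statement SQLite actually runs can be recorded and inspected.

```python
import sqlite3


def run_with_trace(statements):
    """Run SQL statements on an in-memory database and return the trace.

    Hypothetical helper for illustration; it is not part of bayeslite.
    """
    trace = []
    conn = sqlite3.connect(':memory:')
    # sqlite3 calls the callback with the text of each statement it executes.
    conn.set_trace_callback(trace.append)
    try:
        for sql in statements:
            conn.execute(sql).fetchall()
    finally:
        conn.close()
    return trace


trace = run_with_trace([
    'CREATE TABLE p (input REAL, output REAL)',
    'INSERT INTO p VALUES (1.0, 2.0), (3.0, 4.0)',
    'SELECT output FROM p LIMIT 1',
])
for sql in trace:
    print(sql)
```

The trace may contain statements the caller never wrote, such as an implicit BEGIN issued by the sqlite3 module, which is exactly why reading it is useful when a query returns an unexpected number of rows.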

fsaad commented 7 years ago

Turned out to be a massively stochastic bug which is unrelated to engine caching.
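The thread does not say what the bug was, so the following is only a generic sketch of how a stochastic failure can be told apart from a deterministic one: rerun the check under many fixed seeds and record which ones fail. The names flaky_check and failing_seeds are hypothetical, and flaky_check is a stand-in, not the bayeslite test.

```python
import random


def flaky_check(rng):
    """Hypothetical stand-in for a stochastic test.

    Fails whenever the generator's first draw is 0.1 or less.
    """
    return rng.random() > 0.1


def failing_seeds(check, seeds):
    """Return the seeds under which check fails."""
    return [seed for seed in seeds if not check(random.Random(seed))]


bad = failing_seeds(flaky_check, range(200))
print('%d of 200 seeds fail' % len(bad))
```

If some seeds fail both with and without a code change, the change is unlikely to be the cause, and any failing seed gives a reproducible case to debug.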