probcomp / bayeslite

BayesDB on SQLite. A Bayesian database table for querying the probable implications of data as easily as SQL databases query the data itself.
http://probcomp.csail.mit.edu/software/bayesdb
Apache License 2.0
922 stars 64 forks source link

TABLE bayesdb_loom_rowid_mapping makes inappropriate uniqueness assumptions #624

Closed Schaechtle closed 6 years ago

Schaechtle commented 6 years ago

The following table (find it here) makes incorrect assumptions regarding uniqueness:

CREATE TABLE bayesdb_loom_rowid_mapping (
    generator_id        INTEGER NOT NULL REFERENCES bayesdb_generator(id),
    table_rowid         INTEGER NOT NULL,
    loom_rowid          INTEGER NOT NULL,
    PRIMARY KEY (generator_id, table_rowid)
    UNIQUE(generator_id, table_rowid),
    UNIQUE(generator_id, loom_rowid),
    UNIQUE(table_rowid, loom_rowid)
);

One can have 2 populations that both have generator_id 1 and table_rowid 1, thus the first UNIQUE is incorrect.

Suggested fix: remove uniqueness constraint and update PRIMARY KEY to be:

PRIMARY KEY (generator_id, table_rowid, loom_rowid)

and adjust the FOREIGN KEY in bayesdb_loom_row_kind_partition accordingly.

fsaad commented 6 years ago

One can have 2 populations that both have generator_id 1 and table_rowid 1, thus the first UNIQUE is incorrect.

That does not sound right, since the generator_id is unique, so it cannot be shared by two populations.

I believe it is the final constraint UNIQUE(table_rowid, loom_rowid) that needs to be removed, since it disallows the following (perfectly legal) table:

| generator_id  | table_rowid   | loom_rowid  |
| ------------- |:-------------:| -----------:|
| 1             |             1 |           1 |
| 2             |             1 |           1 |
Schaechtle commented 6 years ago

correct. I misread my own edits to the code. Sorry about that. If applied to the correct uniqueness constraint the fix works.