parsonsmatt / parsonsmatt.github.io

My Github pages website
Other
77 stars 25 forks source link

Question about "Sum Types in SQL" #64

Open dnjackson opened 1 year ago

dnjackson commented 1 year ago

Hi Matt, A question about an old post: why does the foreign key in the cat table (eg) include the animal type? Aren't the ids in the animal table unique anyway? I thought maybe it was so you could insert into the cat table and then add the corresponding tuple in the animal table, but you've marked the ids in the animal table as serial, so presumably that means you intend tuples to be added their first? Is it just for safety (which would make sense)?

Thanks for a nice post!

parsonsmatt commented 1 year ago

Hello!

In the shared primary key approach, we have animal with a primary key that consists of the integer and the type. So a foreign key to the animal table must include both the integer and the type. But I think that's just punting your question: why does animal have the type in the primary key? Couldn't it be omitted?

CREATE TABLE animal (
  id SERIAL PRIMARY KEY,
  type animal_constr NOT NULL
);

CREATE TABLE cat (
  id INTEGER REFERENCES animal (id),
  -- no need for a type column
);

CREATE TABLE dog (
  id INTEGER REFERENCES animal (id),
  -- no need for a type column
);

Well, there's a potential problem. Consider this:

INSERT INTO animal (type) VALUES ('cat'), ('cat'), ('dog');

INSERT INTO cat (id) SELECT id FROM animal;
INSERT INTO dog (id) SELECT id FROM animal;

Now we've got dog records which point to an animal record which says it has type cat! And vice versa. And worse, we have a dog and a cat for each entry in animal. So this schema represents a sort of many-to-one subtyping, where a single animal could have multiple types. This isn't what we want to represent, and would be more appropriate if we instead stored, say, attributes of an animal, and not a specific single subtype of an animal.

By storing the type in the primary key, we're ensuring that the child tables must specify the type. With the DEFAULT and CHECK constraints, we can ensure that the child tables are properly referencing the parent table, such that we don't have duplicate entries.