tiangolo / sqlmodel

SQL databases in Python, designed for simplicity, compatibility, and robustness.
https://sqlmodel.tiangolo.com/
MIT License
13.56k stars 612 forks source link

How to Initialise & Populate a Postgres Database with Circular ForeignKeys? #313

Open AyrtonB opened 2 years ago

AyrtonB commented 2 years ago

First Check

Commit to Help

Example Code

# Imports
from typing import Optional, List
from sqlmodel import Session, Field, SQLModel, Relationship, create_engine
import uuid as uuid_pkg

# Defining schemas
class Person(SQLModel, table=True):
    person_id: uuid_pkg.UUID = Field(default_factory=uuid_pkg.uuid4, primary_key=True, index=True, nullable=True)
    first_names: str
    last_name: str
    mailing_property_id: uuid_pkg.UUID = Field(foreign_key='property.property_id')
    customer: Optional['Customer'] = Relationship(back_populates='lead_person')
    mailing_property: Optional['Property'] = Relationship(back_populates='person')

class Customer(SQLModel, table=True):
    customer_id: uuid_pkg.UUID = Field(default_factory=uuid_pkg.uuid4, primary_key=True, index=True, nullable=True)
    lead_person_id: uuid_pkg.UUID = Field(foreign_key='person.person_id')
    contract_type: str
    lead_person: Optional['Person'] = Relationship(back_populates='customer')
    contracted_properties: Optional[List['Property']] = Relationship(back_populates='occupant_customer')

class Property(SQLModel, table=True):
    property_id: uuid_pkg.UUID = Field(default_factory=uuid_pkg.uuid4, primary_key=True, index=True, nullable=True)
    occupant_customer_id: uuid_pkg.UUID = Field(foreign_key='customer.customer_id')
    address: str
    person: Optional['Person'] = Relationship(back_populates='mailing_property')
    occupant_customer: Optional['Customer'] = Relationship(back_populates='contracted_properties')

# Initialising the database
engine = create_engine(f'postgresql://{DB_USERNAME}:{DB_PASSWORD}@{DB_URL}:{DB_PORT}/{DB_NAME}')
SQLModel.metadata.create_all(engine)

# Defining the database entries
john = Person(
    person_id = 'eb7a0f5d-e09b-4b36-8e15-e9541ea7bd6e',
    first_names = 'John',
    last_name = 'Smith',
    mailing_property_id = '4d6aed8d-d1a2-4152-ae4b-662baddcbef4'
)

johns_lettings = Customer(
    customer_id = 'cb58199b-d7cf-4d94-a4ba-e7bb32f1cda4',
    lead_person_id = 'eb7a0f5d-e09b-4b36-8e15-e9541ea7bd6e',
    contract_type = 'Landlord Premium'
)

johns_property_1 = Property(
    property_id = '4d6aed8d-d1a2-4152-ae4b-662baddcbef4',
    occupant_customer_id = 'cb58199b-d7cf-4d94-a4ba-e7bb32f1cda4',
    address = '123 High Street'
)

johns_property_2 = Property(
    property_id = '2ac15ac9-9ab3-4a7c-80ad-961dd565ab0a',
    occupant_customer_id = 'cb58199b-d7cf-4d94-a4ba-e7bb32f1cda4',
    address = '456 High Street'
)

# Committing the database entries
with Session(engine) as session:
    session.add(john)
    session.add(johns_lettings)
    session.add(johns_property_1)
    session.add(johns_property_2)
    session.commit()

Description

Goal: To model the back-end database for a cleaning company. Specifically, trying to model a system where customers can have multiple properties that need to be cleaned and each customer has a single lead person who has a single mailing property (to contact them at). Ideally, I want to be able to use a single table for the mailing properties and cleaning properties (as in most instances they will be the same).

Constraints:

The issue is that the foreign keys have a circular dependency.

image

Running the code written above results in:

ForeignKeyViolation: insert or update on table "customer" violates foreign key constraint "customer_lead_person_id_fkey"
DETAIL:  Key (lead_person_id)=(eb7a0f5d-e09b-4b36-8e15-e9541ea7bd6e) is not present in table "person".

This issue is specific to Postgres, which unlike SQLite (used in the docs) imposes constraints on foreign keys when data is being added. I.e. replacing engine = create_engine(f'postgresql://{DB_USERNAME}:{DB_PASSWORD}@{DB_URL}:{DB_PORT}/{DB_NAME}') with engine = create_engine('sqlite:///test.db') will let the database be initialised without causing an error - however my use-case is with a Postgres DB.


Attempted Solutions:

Operating System

macOS

Operating System Details

Using an M1 Mac but have replicated the issue on ubuntu as well

SQLModel Version

0.0.6

Python Version

3.10.4

Additional Context

No response

mkarbo commented 2 years ago

You can disable all triggers by running SET session_replication_role = replica; as part of your transaction (session), this will allow you to load. Alternatively, you can disable all foreign keys and indexes at start of transaction, load data, and then re-enable. This will also increase performance for load. (Don't do this for anything other than initial loads to circumvent these issues, you lose all integrity)

I use a middleware layer for creating transactions for initial loads, which handles dialect, though I ended up using psycopg2 and postgresql's copy instead of ORM for insert due to performance, cutting loads of 5 mil rows from +1 hour to few minutes.

eseglem commented 1 year ago

The issue lies mostly with how the tables need to be defined in PG. Circular foreign keys in PG they need to be initially deferred, or else you run into the issue of them not being defined like you see.

Using Field(foreign_key="") is just a baseline sqlalchemy.ForeignKey under the hood with no extra args. I believe you need to go back to defining the Column to do that, based on what I see in the code.

The following worked for me with the original second half:

import uuid
from typing import Optional, List

from pydantic import UUID4
from sqlmodel import Session, Field, SQLModel, Relationship, create_engine
from sqlalchemy import Column, ForeignKey
from sqlalchemy.dialects.postgresql import UUID

# Defining schemas
class Person(SQLModel, table=True):
    person_id: UUID4 = Field(
        default_factory=uuid.uuid4, primary_key=True, index=True, nullable=True
    )
    first_names: str
    last_name: str
    mailing_property_id: UUID4 = Field(
        sa_column=Column(
            UUID(as_uuid=True),
            ForeignKey(
                "property.property_id",
                onupdate="CASCADE",
                ondelete="RESTRICT",
                initially="DEFERRED",
                deferrable=True,
            ),
        )
    )

    customer: Optional["Customer"] = Relationship(back_populates="lead_person")
    mailing_property: Optional["Property"] = Relationship(back_populates="person")

class Customer(SQLModel, table=True):
    customer_id: UUID4 = Field(
        default_factory=uuid.uuid4, primary_key=True, index=True, nullable=True
    )
    lead_person_id: UUID4 = Field(
        sa_column=Column(
            UUID(as_uuid=True),
            ForeignKey(
                "person.person_id",
                onupdate="CASCADE",
                ondelete="RESTRICT",
                initially="DEFERRED",
                deferrable=True,
            ),
        )
    )
    contract_type: str

    lead_person: Optional["Person"] = Relationship(back_populates="customer")
    contracted_properties: Optional[List["Property"]] = Relationship(
        back_populates="occupant_customer"
    )

class Property(SQLModel, table=True):
    property_id: UUID4 = Field(
        default_factory=uuid.uuid4, primary_key=True, index=True, nullable=True
    )
    occupant_customer_id: uuid.UUID = Field(
        sa_column=Column(
            UUID(as_uuid=True),
            ForeignKey(
                "customer.customer_id",
                onupdate="CASCADE",
                ondelete="RESTRICT",
                initially="DEFERRED",
                deferrable=True,
            ),
        )
    )
    address: str

    person: Optional["Person"] = Relationship(back_populates="mailing_property")
    occupant_customer: Optional["Customer"] = Relationship(
        back_populates="contracted_properties"
    )

The values you use for the ForeignKey may be different as far as the onupdate and ondelete go, but the initially="DEFERRED", deferrable=True are what is important here.

And in case you weren't aware, you can use the relationship instead of directly using IDs.

johns_lettings = Customer(
    customer_id="cb58199b-d7cf-4d94-a4ba-e7bb32f1cda4",
    lead_person=john,  # instead of `lead_person_id`
    contract_type="Landlord Premium",
)

They still need to be deferred foreign keys, so this doesn't fix the issue. And there is always one of them that has to use an ID since it is circular.

The way to get around that would be to use an intermediate table. Untested, but something along the lines of:

class MailingProperty(SQLModel, table=True):
    person_id: UUID4 = Field(foreign_key="person.person_id")
    property_id: UUID4 = Field(foreign_key="property.property_id")

    person: Person = Relationship(back_populates="mailing_property")
    property: Property = Relationship(back_populates="person")

Then you would just create the Person and the Property and MailingProperty(person=person, property=property) with a table inbetween it doesn't and up needing one of the IDs first. Still needs to be deferred, but can be a little cleaner.