maxtepkeev / architect

A set of tools which enhances ORMs written in Python with more features
Other
391 stars 57 forks source link

Good fit ? #54

Closed dekhaus closed 3 years ago

dekhaus commented 5 years ago

Hi

I have a simple PostgreSQL database / data model ...

Backup -> Snapshot
Snapshot -> Entry

The data that is being backed up are file system records and a record in the Entry table can correspond to either a directory or a file.

The reason why I'm considering partitioning my data is that a Backup of a directory can easily have 10 million Entry records in a given Snapshot.

I'm using Django and my Entry model looks like this (only showing relevant details) ...

class Entry(models.Model):
    snapshot = models.ForeignKey(Snapshot, verbose_name='Snapshot', on_delete=models.CASCADE)
    path = models.TextField(blank=False)
    ...

I want to partition the entries table on snapshot_id.

That makes the most sense to me - but I'm not sure what value to use for the 'constraint' in the architect.install decorator ...

@architect.install('partition', type='range', subtype='integer', constraint=???, column='snapshot_id')

To be 100% clear - I want all the entries associated with a given snapshot_id to be in the same partition.

I've organized my models into separate files within a models directory (within the arc/models directory). My models directory is a module.

I call the 'partition' command as follows ...

architect partition --module arc.models.entry

this command executes but generates the following error ...

architect partition: error: unsupported partition constraint "integer" in "Entry" model, supported partition constraints for "postgresql" database are: positive integer

what changes do I need to make to the decorator so that the partition command works ?

Also - presumably its possible to run the 'partition' command as soon as I create a Snapshot - but before I run my script to identify and insert Entry records - correct ?

Thanks Dave

maxtepkeev commented 5 years ago

Hi Dave,

Have you tried setting 1 as a constraint, i.e.

@architect.install('partition', type='range', subtype='integer', constraint='1', column='snapshot_id')

That should create a partition for every snapshot_id that you're gonna have. The only problem is that you're using foreign key and PostgreSQL is known to have problems with partitioning and foreign keys, I mean the partitioning itself should work, but the link that a foreign key gives you will most probably not work, so if possible, get rid of the foreign key.

Let me know if it worked for you and if I can be of any more help.

dekhaus commented 5 years ago

Hi

I will experiment later today.

Is it possible to create a partition directly from my Django code ? Specifically every time I create a Snapshot - I'll need to create a corresponding Entry partition - which I can then insert records into.

Thanks Dave

dekhaus commented 5 years ago

Also - regarding your answer ...

@architect.install('partition', type='range', subtype='integer', constraint='1', column='snapshot_id')

what's the significance of the constraint='1' ?

Is this documented ?

Thanks Dave

dekhaus commented 5 years ago

Hi Again

Regarding partitioning on snapshot_id and the problems it may cause - couldn't I get around that problem by just creating another field called partition_id and setting its value to snapshot_id.

I can partition on partition_id and still use the snapshot_id foreign key

How's that sound ?

Thanks Dave

maxtepkeev commented 5 years ago

Is it possible to create a partition directly from my Django code ? Specifically every time I create a Snapshot - I'll need to create a corresponding Entry partition - which I can then insert records into.

They will be created automatically if partition doesn't exist.

what's the significance of the constraint='1' ? Is this documented ?

Yes, check this section.

Regarding partitioning on snapshot_id and the problems it may cause - couldn't I get around that problem by just creating another field called partition_id and setting its value to snapshot_id.

I can partition on partition_id and still use the snapshot_id foreign key

It won't help, the problem isn't that you're trying to partition on a column that has a foreign key, but that a table has a foreign key in general. Again, the partitioning itself should work, it's a foreign key's constraint that will not work.

dekhaus commented 5 years ago

Hi

Is it possible to create a partition directly from my Django code ? Specifically every time I create a Snapshot - I'll need to create a corresponding Entry partition - which I can then insert records into.

Thanks Dave

maxtepkeev commented 5 years ago

As I previously wrote, you don't need to do this as they will be created automatically. After you've run the partition command from the console, several functions and triggers will be created in your database. It will be their responsibility to create new partitions when they don't exist. So you don't need to do that manually.

maxtepkeev commented 3 years ago

I believe this can be closed, if not, feel free to ask more questions.