ryangjchandler / orbit

A flat-file database driver for Eloquent. 🗄
MIT License

Refactor: Seed using chunked upserts #134

Closed johncarter- closed 1 year ago

johncarter- commented 2 years ago

First of all, I got a bit carried away with this PR and I fully appreciate that it might not be the direction that you want to take the package, so if you don't want to merge it, I'm happy to maintain my own fork.

Disclaimer: This is by far the biggest PR I have ever written so I apologise for any best practices / etiquette I haven't adhered to. I know it would have been better as smaller PRs but it felt like they would be too interdependent.

Key changes

Performance

The main impetus for the original PR was to fix the speed of rebuilding the database in v2 compared to v1. For reference, a full clear and rebuild of the DB for my test site (20 models and ~17,000 files) took ~140s locally in v2; after this PR it takes ~4s.

The performance degradation was due to switching from a chunked upsert statement in v1 to updateOrCreate() for models that needed seeding, which runs its queries one record at a time. I reverted the update/create method back to upsert for the performance benefits.
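To illustrate why chunking helps, here is a minimal sketch in plain PHP using PDO and an in-memory SQLite database. It is not the code from this PR: the posts table, its id and title columns, the seedInChunks() helper, and the chunk size of 300 are all assumptions made for the example. It shows the general shape of a chunked upsert, where each chunk becomes one multi-row statement rather than a lookup plus a write per record.

```php
<?php

declare(strict_types=1);

/**
 * Upsert rows in chunks: one multi-row statement per chunk.
 * Hypothetical helper for illustration, not part of Orbit.
 *
 * @param list<array{id:int,title:string,orbit_file_path:string}> $rows
 * @return int Number of SQL statements executed.
 */
function seedInChunks(PDO $pdo, array $rows, int $chunkSize = 300): int
{
    $statements = 0;

    foreach (array_chunk($rows, $chunkSize) as $chunk) {
        // One "(?, ?, ?)" group per row in this chunk.
        $placeholders = implode(', ', array_fill(0, count($chunk), '(?, ?, ?)'));

        // Insert new rows; overwrite existing rows that share the same id.
        $sql = "INSERT INTO posts (id, title, orbit_file_path) VALUES {$placeholders} "
            . 'ON CONFLICT(id) DO UPDATE SET '
            . 'title = excluded.title, orbit_file_path = excluded.orbit_file_path';

        // Flatten the chunk into a single list of bindings.
        $bindings = [];
        foreach ($chunk as $row) {
            array_push($bindings, $row['id'], $row['title'], $row['orbit_file_path']);
        }

        $pdo->prepare($sql)->execute($bindings);
        $statements++;
    }

    return $statements;
}

// Assumed schema for the example only.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, orbit_file_path TEXT)');

$rows = [];
for ($i = 1; $i <= 1200; $i++) {
    $rows[] = ['id' => $i, 'title' => "Post {$i}", 'orbit_file_path' => "content/posts/{$i}.md"];
}

// 1,200 rows in chunks of 300 means 4 statements in total.
$statements = seedInChunks($pdo, $rows);
```

The chunk size is kept at 300 so that each statement stays under the 999 bound-parameter limit of older SQLite builds (300 rows x 3 columns = 900 parameters). Running the same call again after editing a row updates it in place instead of creating a duplicate.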

Eloquent events

I understand that upsert bypasses Laravel model events, so none are dispatched when a manually edited file is seeded. Model events are still dispatched when using the standard Model::create(), update() and delete() methods.

For manually edited files I have added a new OrbitSeeded event, which is dispatched for every model that was seeded. Users can optionally listen for this event, which is the manual-editing equivalent of the saved and updated Eloquent events. Possibly fixes #136.

Removal of Meta

I am not sure how important the previous Meta was, but I have removed it in favour of dynamically adding the orbit_file_path column to the DB schema. I removed the model and table because upserting into and reading from two tables added overhead, essentially duplicating queries that could have been done in one.

I understand that adding the column to the model table could be considered to be polluting the user's data, but IMO the files are the one source of truth and so what happens in the database is up to this package.

Handling manually deleting files

Previously, unless I missed it, when you deleted a file from the source directory, the DB would not be aware of this, so the manually deleted record would still be in the database.

I have added a method, handleDeletedFiles(), which handles deleted files by diffing the full directory of files against the orbit_file_path values stored in the DB. If a path stored in the DB no longer exists on disk, the corresponding DB record is deleted.

This is an expensive operation that would happen on every model boot. I'm sure it would be fine on ~50 files, but on my site it added ~400ms to the request. As a result I added a config option, orbit.manual_mode, which disables checking for manually removed files when set to false. You will almost certainly want it to be false in production, especially on bigger sites.

Note: Model::delete() will always delete the file, regardless of orbit.manual_mode.
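The check described in this section can be sketched in plain PHP as follows. This is an illustration only, not the body of handleDeletedFiles(): the findDeletedPaths() function, its parameters, and the sample paths are hypothetical, and the $manualMode argument stands in for reading config('orbit.manual_mode').

```php
<?php

declare(strict_types=1);

/**
 * Work out which stored paths point at files that no longer exist.
 * Hypothetical helper for illustration, not part of Orbit.
 *
 * @param list<string> $pathsInDatabase Values of orbit_file_path stored in the DB.
 * @param list<string> $pathsOnDisk     Files currently in the source directory.
 * @param bool         $manualMode      Stand-in for config('orbit.manual_mode').
 * @return list<string> Paths whose DB records should be deleted.
 */
function findDeletedPaths(array $pathsInDatabase, array $pathsOnDisk, bool $manualMode): array
{
    if (! $manualMode) {
        // Checking is disabled, so skip the directory diff entirely.
        return [];
    }

    // Paths the DB knows about that are missing from disk.
    return array_values(array_diff($pathsInDatabase, $pathsOnDisk));
}

$inDatabase = ['content/posts/1.md', 'content/posts/2.md', 'content/posts/3.md'];
$onDisk     = ['content/posts/1.md', 'content/posts/3.md'];

// With manual mode on, 2.md is reported as deleted from disk.
$stale = findDeletedPaths($inDatabase, $onDisk, true);

// With manual mode off, nothing is checked and nothing is reported.
$skipped = findDeletedPaths($inDatabase, $onDisk, false);
```

The cost grows with the number of files because both lists must be built before they can be compared, which is why skipping the check is worthwhile on larger sites.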

Many to many table migration

Fixes #116.

When the Orbital migrate() is run in model boot, we now discover any many-to-many relationships on that model that use a ->withPivot() and new up the pivot model to perform any migrations / seeding.

Tests

I have added some tests for:

I was also experiencing inconsistent test runs when tearDown used Model::all()->each->delete(), so in all tests I replaced it with a manual deletion of the content directory using the File facade.

Todo