le-phare / import-bundle

Symfony bundle to import CSV files into a database
MIT License

Add an option to check uniqueness before copy #2

Open thislg opened 7 months ago

thislg commented 7 months ago

We often have to add custom code to validate loaded data before copying it. A common use case is ignoring duplicate lines while still continuing the import.

It could be set like this:

resources:
    my_resource_name:
        load:
            extra_fields:
                valid:
                    type: boolean
                    options:
                        default: true
        post_load:
            validate:
                -
                    columns:
                        - code
                    constraint_type: unique
                    label: 'Unique code'
                    on_invalid: ignore # abort|ignore
        copy:
            strategy_options:
                copy_condition: valid IS TRUE

A subscriber on ImportEvents::POST_LOAD would then execute an UPDATE on the temporary table to set the "valid" field to false on failing rows. If validation fails and on_invalid is set to "ignore", it would log a message such as "Unique code validation constraint failed. Skipping duplicate my_resource_name (code: 12345) at lines 4, 5, 6", and the import would continue without copying the invalid lines. If on_invalid is set to "abort", it would stop the import without copying the data.

Other validation constraints could be added, such as format validation (regex). A simpler alternative would be to skip the validation config and instead add an option to run an arbitrary SQL query on post_load to set the "valid" flag.
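For illustration, the simpler "arbitrary SQL query" variant could look roughly like the sketch below. This is only a sketch of the proposal: neither post_load nor the queries key exists in the bundle today, both names are hypothetical, and import.my_resource_name is an assumed name for the temporary table. The SQL is written with PostgreSQL in mind.

```yaml
resources:
    my_resource_name:
        load:
            extra_fields:
                valid:
                    type: boolean
                    options:
                        default: true
        # Hypothetical: "post_load" and "queries" are not existing bundle options.
        post_load:
            queries:
                # Flag every row whose code appears more than once.
                # "import.my_resource_name" is an assumed temporary table name.
                - |
                    UPDATE import.my_resource_name
                    SET valid = false
                    WHERE code IN (
                        SELECT code
                        FROM import.my_resource_name
                        GROUP BY code
                        HAVING COUNT(*) > 1
                    )
        copy:
            strategy_options:
                copy_condition: valid IS TRUE
```

Note that this query flags all rows sharing a duplicated code, including the first occurrence. Keeping one row per code would need some row identifier on the temporary table, which this sketch does not assume.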

pierreboissinot commented 7 months ago

@thislg

Suggestions:

Saami783 commented 1 week ago

Hello, is this piece of code from the documentation? I ask because nothing indicates that we can define a custom configuration, especially with "post_load".

thislg commented 1 week ago

> Hello, is this piece of code from the documentation? I ask because nothing indicates that we can define a custom configuration, especially with "post_load".

The load and copy options are documented (see https://github.com/le-phare/import-bundle/blob/master/docs/configure/load.md and https://github.com/le-phare/import-bundle/blob/master/docs/configure/copy.md). You can't add arbitrary options, so the post_load option does not exist yet. In this issue I suggest adding it so that we can define validation constraints.