Improve CSV importers

mauromsl commented 5 years ago

The current CSV importers could use the following improvements:

Schema validation: rather than unpacking the CSV lines directly, we could define a schema to check the file's structure against, and return meaningful errors when validation fails.
Allow users to download a template with the correct headers in .xls or .csv format.
Rewrite the HTML form as a Django forms.Form for the file upload, and add a field to specify the CSV delimiter as well as the header separator.
Abstract the validation and import so they work for all kinds of importers. Each importer then only has to provide a schema and the logic to process the CSV.
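
To make the schema idea concrete, here is a minimal sketch using only the Python standard library. It is not the existing importer code: the names ARTICLE_SCHEMA, validate_csv and csv_template, as well as the column names, are hypothetical and only illustrate how a per-importer schema could drive header checks, row-level error messages and template generation.

```python
import csv
import io


def not_empty(value):
    """Hypothetical validator: reject blank values."""
    if not value.strip():
        raise ValueError("value must not be empty")


def is_integer(value):
    """Hypothetical validator: int() raises ValueError on bad input."""
    int(value)


# Hypothetical schema: column name -> (required, validator callable).
# Each importer would supply its own mapping.
ARTICLE_SCHEMA = {
    "title": (True, not_empty),
    "author_email": (True, not_empty),
    "volume": (False, is_integer),
}


def validate_csv(text, schema, delimiter=","):
    """Return a list of human-readable errors (empty when the file is valid)."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    headers = reader.fieldnames or []
    errors = []

    missing = [name for name in schema if name not in headers]
    unexpected = [name for name in headers if name not in schema]
    if missing:
        errors.append("Missing columns: " + ", ".join(missing))
    if unexpected:
        errors.append("Unexpected columns: " + ", ".join(unexpected))
    if missing:
        # Row checks are meaningless without the expected columns.
        return errors

    # Line 1 is the header; this assumes no field spans several lines.
    for line_no, row in enumerate(reader, start=2):
        for name, (required, validator) in schema.items():
            value = row.get(name) or ""
            if not value and not required:
                continue
            try:
                validator(value)
            except ValueError as exc:
                errors.append(f"Line {line_no}, column '{name}': {exc}")
    return errors


def csv_template(schema, delimiter=","):
    """Build a header-only CSV that users could download as a template."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=delimiter).writerow(schema.keys())
    return buffer.getvalue()
```

The delimiter argument is where a value from the upload form's delimiter field could be passed through.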

mauromsl commented 4 months ago

We have discussed a v1 implementation for this issue that would involve:

Moving the validation logic used in Janeway core into callables/functions that importers can invoke. This allows the same logic to be reused and avoids traps where the logic diverges across import routines.
Updating the CSV importer to invoke those functions programmatically on a per-field basis.
Wrapping article processing in an atomic transaction so that problems ingesting one row don't leak into the rest of the import procedure.
When errors occur, returning a generated CSV containing just the faulty rows that were rolled back.
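
The flow described above could look roughly like the following sketch. It deliberately uses the standard library's sqlite3 savepoints so that it runs on its own; in the importer itself the equivalent would presumably be Django's transaction.atomic() around each row. The table layout, the validator functions and the names FIELD_VALIDATORS, make_demo_db and import_rows are all hypothetical, not Janeway's actual code.

```python
import csv
import io
import sqlite3


def validate_title(value):
    """Hypothetical shared validator, standing in for core logic."""
    if not value.strip():
        raise ValueError("title must not be empty")


def validate_email(value):
    """Hypothetical shared validator, standing in for core logic."""
    if "@" not in value:
        raise ValueError("invalid email address")


# Per-field validators that any importer could reuse.
FIELD_VALIDATORS = {"title": validate_title, "author_email": validate_email}


def make_demo_db():
    """Create an in-memory database with a made-up two-table layout."""
    # isolation_level=None lets us manage savepoints explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute("CREATE TABLE author (article_id INTEGER, email TEXT)")
    return conn


def import_rows(conn, text):
    """Import each row atomically; return (failed_count, error_csv_text)."""
    reader = csv.DictReader(io.StringIO(text))
    report = io.StringIO()
    fieldnames = list(reader.fieldnames or []) + ["error"]
    writer = csv.DictWriter(report, fieldnames=fieldnames)
    writer.writeheader()
    failed = 0

    for row in reader:
        conn.execute("SAVEPOINT import_row")
        try:
            FIELD_VALIDATORS["title"](row["title"])
            cursor = conn.execute(
                "INSERT INTO article (title) VALUES (?)", (row["title"],)
            )
            # A failure here must also undo the article insert above.
            FIELD_VALIDATORS["author_email"](row["author_email"])
            conn.execute(
                "INSERT INTO author (article_id, email) VALUES (?, ?)",
                (cursor.lastrowid, row["author_email"]),
            )
        except (ValueError, sqlite3.Error) as exc:
            # Undo everything written for this row, then record it.
            conn.execute("ROLLBACK TO import_row")
            writer.writerow({**row, "error": str(exc)})
            failed += 1
        finally:
            conn.execute("RELEASE import_row")

    return failed, report.getvalue()
```

Because each row has its own savepoint, a bad row leaves no partial article behind, and the returned report contains only the rows the user needs to fix and re-upload.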

mauromsl commented 4 months ago

Note: Mauro to raise an issue in core and make this depend on it

openlibhums / imports