The current Invenio-Deposit module has several design issues:
All records are stored and indexed twice:
Database: The primary record and deposit record are both stored in the
records_metadata table.
Elasticsearch: Two indexes exist, one for records and one for deposits.
Almost all records are indexed in both.
Records and deposits are mixed in the same database table.
Two buckets are used: one for the record and one for the draft. This is due to
permissions, and to ensure that the preserved files are clearly separate from
the uploaded files.
Unpublished deposits do not expire and stay in the system.
Two persistent identifiers exist, recid and depid, each pointing to its
own record.
Double JSONSchemas/Mappings: Because of slight differences between records and
deposits, we need two JSONSchemas, two ES mappings, and two marshmallow
schemas.
The programmatic API is easily polluted and becomes hard to maintain and
extend with custom use cases.
New design principles:
Clear "physical" separation between records and deposits. Records and
deposits should not be mixed in the same database table. Recovery of database
tables is significantly easier if records and deposits are not mixed.
Work with a single JSONSchema and single ES mapping.
A single persistent identifier.
Deposits are drafts and disappear from the system after being published.
Support file uploads via third-party storage systems such as S3.
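As a rough illustration of the separation, single-identifier, and drafts-disappear principles above, the following is a minimal sketch and not the proposed implementation. It uses SQLite from the Python standard library in place of the real database layer. The `drafts_metadata` table name, the `create_draft` and `publish` helpers, and the example identifier are all hypothetical; only `records_metadata` appears in the current design.

```python
import json
import sqlite3


def create_tables(conn):
    """Create physically separate tables for drafts and records."""
    with conn:
        # Hypothetical table holding unpublished drafts only.
        conn.execute(
            "CREATE TABLE drafts_metadata (pid TEXT PRIMARY KEY, json TEXT NOT NULL)"
        )
        # Published records live in their own table.
        conn.execute(
            "CREATE TABLE records_metadata (pid TEXT PRIMARY KEY, json TEXT NOT NULL)"
        )


def create_draft(conn, pid, metadata):
    """Store a new draft under a single persistent identifier."""
    with conn:
        conn.execute(
            "INSERT INTO drafts_metadata (pid, json) VALUES (?, ?)",
            (pid, json.dumps(metadata)),
        )


def publish(conn, pid):
    """Copy the draft into the records table, then delete the draft.

    Both statements run in one transaction, and the record keeps the
    same identifier the draft had.
    """
    with conn:
        row = conn.execute(
            "SELECT json FROM drafts_metadata WHERE pid = ?", (pid,)
        ).fetchone()
        if row is None:
            raise KeyError(f"no draft with pid {pid!r}")
        conn.execute(
            "INSERT INTO records_metadata (pid, json) VALUES (?, ?)",
            (pid, row[0]),
        )
        conn.execute("DELETE FROM drafts_metadata WHERE pid = ?", (pid,))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_tables(conn)
    create_draft(conn, "abcd-1234", {"title": "Example"})
    publish(conn, "abcd-1234")
```

Because drafts and records never share a table in this sketch, either table could be backed up or restored without touching the other, which is the recovery benefit the principles describe.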