Closed asmecher closed 1 year ago
Challenge: In order to "perfectly" document the database, we need to consider both installations and upgrades. These follow different execution paths, potentially leading to duplication (and the potential for the two to fall out of sync). It is also difficult to add comments to an existing schema for SQL reasons.
It's necessary to either duplicate the comments on the upgrade migrations that introduce columns, or leave upgraded databases undocumented. If we choose to duplicate documentation into both installation and upgrade migrations, it's 100% likely that they will diverge over time. Columns that are not adjusted by migrations will remain undocumented unless we intentionally introduce restatements of the column types etc., which is too risky for the benefit IMO.
commit 631caef1569cbea439668fe55cf1089f060f5b37
Author: Alec Smecher <alec@smecher.bc.ca>
Date: Tue Nov 15 14:11:14 2022 -0800
Add database comments for submissions table
diff --git a/classes/migration/install/SubmissionsMigration.php b/classes/migration/install/SubmissionsMigration.php index 62105e0503..f1a43ae42b 100644 --- a/classes/migration/install/SubmissionsMigration.php +++ b/classes/migration/install/SubmissionsMigration.php @@ -28,27 +28,43 @@ class SubmissionsMigration extends \PKP\migration\Migration { // Submissions Schema::create('submissions', function (Blueprint $table) {
$table->comment('Records data about submissions (articles in OJS, monographs in OMP, or preprints in OPS). Each submission may have multiple publications representing revisions of the submission. See current_publication_id for the "current" publication.');
$table->bigInteger('submission_id')->autoIncrement();
$table->bigInteger('context_id');
$table->bigInteger('context_id')
->comment('The journal_id (see journals) in OJS, press_id (see presses) in OMP, or server_id (see servers) in OPS');
$contextDao = \APP\core\Application::getContextDAO();
$table->foreign('context_id', 'submissions_context_id')->references($contextDao->primaryKeyColumn)->on($contextDao->tableName)->onDelete('cascade');
$table->index(['context_id'], 'submissions_context_id');
// NOTE: The foreign key relationship on publications is declared where that table is created.
$table->bigInteger('current_publication_id')->nullable();
$table->bigInteger('current_publication_id')->nullable()
->comment('The current "published" publication ID; the default publication that will be displayed to readers in the front end. Newer publications may exist but they will not yet be published.');
;
$table->datetime('date_last_activity')->nullable()
->comment('The time a substantive (e.g. editorial) activity was performed on the submission.');
$table->datetime('date_submitted')->nullable()
->comment('The date the submission was submitted to the journal. If null, the submission has not yet been submitted (e.g. it is incomplete).');
$table->datetime('date_last_activity')->nullable();
$table->datetime('date_submitted')->nullable(); $table->datetime('last_modified')->nullable();
$table->bigInteger('stage_id')->default(WORKFLOW_STAGE_ID_SUBMISSION);
$table->bigInteger('stage_id')->default(WORKFLOW_STAGE_ID_SUBMISSION)
->comment('The current workflow stage ID of the submission; must be a value corresponding to a WORKFLOW_STAGEID... constant, defined in PHP. See https://github.com/pkp/pkp-lib/blob/main/classes/core/PKPApplication.php for values.');
$table->string('locale', 14)->nullable();
$table->smallInteger('status')->default(PKPSubmission::STATUS_QUEUED);
$table->smallInteger('status')->default(PKPSubmission::STATUS_QUEUED)
->comment('The current workflow status of the submission; must be a value corresponding to a STATUS_... constant, defined in PHP. See https://github.com/pkp/pkp-lib/blob/main/classes/submission/PKPSubmission.php for values.');
$table->smallInteger('submission_progress')->default(1)
->comment('The current numeric step of the author\'s submission process, for incomplete submissions; 0 for complete submissions.');
$table->smallInteger('submission_progress')->default(1);
// Used in OMP only; should not be null there
$table->smallInteger('work_type')->default(0)->nullable();
$table->smallInteger('work_type')->default(0)->nullable()
->comment('Used in OMP only; should not be null there'); }); Schema::table('stage_assignments', function (Blueprint $table) { $table->foreign('submission_id')->references('submission_id')->on('submissions')->onDelete('cascade');
# Approach 2: Apply self-documentation in its own migration after installation/upgrade (BLOCKED)
## Advantages:
Documentation is specified once, and will be applied to both upgrades and migrations
Documentation can be re-applied to an existing schema by re-running the migration, as it doesn't make other changes (this will probably be handy for maintenance)
Schema documentation is not kept where the schema is created; it may be possible to change the schema and forget to change the documentation (though upgrade tests will catch anything egregiously mismatched, like a deleted column)
It diverges from our current practice of not using the same migration for both upgrade and installation
Technically not possible :P due to e.g. lack of Laravel support and underlying lack of MySQL support. Currently the column type etc. must be reiterated when adding a comment, making this unfeasible.
diff --git a/dbscripts/xml/install.xml b/dbscripts/xml/install.xml
index a3bccd62fa..71b1d8120f 100644
--- a/dbscripts/xml/install.xml
+++ b/dbscripts/xml/install.xml
@@ -47,6 +47,8 @@
<migration class="PKP\migration\install\CategoriesMigration" />
<migration class="APP\migration\install\MetricsMigration" />
diff --git a/dbscripts/xml/upgrade.xml b/dbscripts/xml/upgrade.xml
index e4a0fd39f3..d0a426d532 100644
--- a/dbscripts/xml/upgrade.xml
+++ b/dbscripts/xml/upgrade.xml
@@ -179,6 +179,7 @@
<migration class="APP\migration\upgrade\v3_4_0\I7796_UpdateCrossrefSchema"/>
<migration class="PKP\migration\upgrade\v3_4_0\I7287_RemoveEmailTemplatesDefault"/>
<data file="dbscripts/xml/upgrade/3.4.0_preupdate_email_templates.xml" />
<code function="installEmailTemplate" key="EDITOR_DECISION_NOTIFY_OTHER_AUTHORS" locales="en_US" />
<code function="installEmailTemplate" key="EDITOR_DECISION_NEW_ROUND" locales="en_US" />
diff --git a/lib/pkp/classes/migration/DocumentationMigration.php b/lib/pkp/classes/migration/DocumentationMigration.php new file mode 100755 index 0000000000..e8d7c40d2d --- /dev/null +++ b/classes/migration/DocumentationMigration.php @@ -0,0 +1,42 @@ +<?php
+/**
*
*
*/
+namespace PKP\migration;
+use Illuminate\Database\Schema\Blueprint; +use Illuminate\Support\Facades\DB; +use Illuminate\Support\Facades\Schema;
+class DocumentationMigration extends \PKP\migration\Migration +{
/**
*/
public function up(): void
{
Schema::table('announcement_types', function (Blueprint $table) {
$table->comment('Records data about submissions (articles in OJS, monographs in OMP, or preprints in OPS). Each submission may have multiple publications representing revisions of the submission. See current_publication_id for the "current" publication.');
// BLOCKED: the ->column(...) function does not exist! Currently the column type etc. must be respecified fully.
$table->column('context_id')->comment('The journal_id (see journals) in OJS, press_id (see presses) in OMP, or server_id (see servers) in OPS FLARM');
});
}
/**
*/
public function down(): void
{
// NOOP
} +}
I think there's an argument to be made that the most valuable part of this will be the public documentation we can publish with SchemaSpy. For that, we don't need comments on migrated databases because we can generate the database documentation from a fresh install.
I prefer to have the comment near its definition, so I'd vote for the "Approach 1" :)
About the synchronization issue/migrations, as we didn't have such documentation before, I think it's ok to follow what @NateWr said (I mean, few people will even notice the comments are missing).
It would be nice to have comments even on migrations (then developers would be able to use their own tools to view/generate docs)... But there are some comments out there stating the table might be recreated on older MySQL versions (or if the column definition isn't exactly the same) 👀
I was just passing by to congratulate and welcome the initiative. ;-) Very happy that you have found the way and the energy to implement this. Thanks!
I agree with Nate's comment. The goal is to document the DB so that the schema is clear to developers (or not so developers. ie: those who want to exploit their magazine's data in another way), so there would be no need to worry about "migrated databases".
That being so, I understand that the idea is that this documentation is automatically generated during the packaging of the version, right?
Looks like we're all agreed: self-documentation will be added on schema creation, not upgrade. The downside is that users wanting to use e.g. schemaspy or metabase on their local installations won't be able to benefit from the self-documentation. The only work-around I can think of for this is to use mysqldump with --no-data
and --no-create-info
to generate dumps of a) a clean new schema and b) an existing dataset without schema, then combine the two.
@marcbria, your question:
this documentation is automatically generated during the packaging of the version, right?
Yes, and hosted on the PKP website, but not included in the .tar.gz package. Including it in the package would add hundreds (!) of MB of bloat.
Make more sense for weight, but also because centralized/web documentation could be updated.
I also though in same workarround. ;-)
The following PRs add comments to about half of the tables. @asmecher can you review this for me? I've generated some SchemaSpy documentation for this but it's easy to regenerate if you find any errors.
PRs: https://github.com/pkp/pkp-lib/pull/8756 https://github.com/pkp/ojs/pull/3819 https://github.com/pkp/omp/pull/1359 https://github.com/pkp/ops/pull/489
Thanks, @NateWr! Just one general comment about _settings
tables; once that's adjusted, these are ready for merge.
Merged the PRs above to main
. There are test failures in OMP, but they seem to predate these pull requests.
@asmecher would you like to close this issue or leave it open until all database tables and columns are described?
Please leave this open; I'll pop some additional descriptions in.
Basic descriptions added for all tables. Table columns will still need to be documented, but I'd suggest doing that outside of this entry.
Describe the bug Database table and column descriptions would massively improve our schema documentation (e.g. using SchemaSpy). These should be added.