pkp / pkp-lib

The library used by PKP's applications OJS, OMP and OPS, open source software for scholarly publishing.
https://pkp.sfu.ca
GNU General Public License v3.0
306 stars 445 forks source link

Add descriptive comments to database tables/columns #8437

Closed asmecher closed 1 year ago

asmecher commented 1 year ago

Describe the bug Database table and column descriptions would massively improve our schema documentation (e.g. using SchemaSpy). These should be added.

asmecher commented 1 year ago

Challenge: In order to "perfectly" document the database, we need to consider both installations and upgrades. These follow different execution paths, potentially leading to duplication (and the potential for the two to fall out of sync). It is also difficult to add comments to an existing schema for SQL reasons.

Approach 1: Add the comments directly to the schema creation migrations.

Advantages:

diff --git a/classes/migration/install/SubmissionsMigration.php b/classes/migration/install/SubmissionsMigration.php index 62105e0503..f1a43ae42b 100644 --- a/classes/migration/install/SubmissionsMigration.php +++ b/classes/migration/install/SubmissionsMigration.php @@ -28,27 +28,43 @@ class SubmissionsMigration extends \PKP\migration\Migration { // Submissions Schema::create('submissions', function (Blueprint $table) {

NateWr commented 1 year ago

I think there's an argument to be made that the most valuable part of this will be the public documentation we can publish with SchemaSpy. For that, we don't need comments on migrated databases because we can generate the database documentation from a fresh install.

jonasraoni commented 1 year ago

I prefer to have the comment near its definition, so I'd vote for the "Approach 1" :)

About the synchronization issue/migrations, as we didn't have such documentation before, I think it's ok to follow what @NateWr said (I mean, few people will even notice the comments are missing).

It would be nice to have comments even on migrations (then developers would be able to use their own tools to view/generate docs)... But there are some comments out there stating the table might be recreated on older MySQL versions (or if the column definition isn't exactly the same) 👀

marcbria commented 1 year ago

I was just passing by to congratulate and welcome the initiative. ;-) Very happy that you have found the way and the energy to implement this. Thanks!

I agree with Nate's comment. The goal is to document the DB so that the schema is clear to developers (or not so developers. ie: those who want to exploit their magazine's data in another way), so there would be no need to worry about "migrated databases".

That being so, I understand that the idea is that this documentation is automatically generated during the packaging of the version, right?

asmecher commented 1 year ago

Looks like we're all agreed: self-documentation will be added on schema creation, not upgrade. The downside is that users wanting to use e.g. schemaspy or metabase on their local installations won't be able to benefit from the self-documentation. The only work-around I can think of for this is to use mysqldump with --no-data and --no-create-info to generate dumps of a) a clean new schema and b) an existing dataset without schema, then combine the two.

@marcbria, your question:

this documentation is automatically generated during the packaging of the version, right?

Yes, and hosted on the PKP website, but not included in the .tar.gz package. Including it in the package would add hundreds (!) of MB of bloat.

marcbria commented 1 year ago

Make more sense for weight, but also because centralized/web documentation could be updated.

I also though in same workarround. ;-)

NateWr commented 1 year ago

The following PRs add comments to about half of the tables. @asmecher can you review this for me? I've generated some SchemaSpy documentation for this but it's easy to regenerate if you find any errors.

PRs: https://github.com/pkp/pkp-lib/pull/8756 https://github.com/pkp/ojs/pull/3819 https://github.com/pkp/omp/pull/1359 https://github.com/pkp/ops/pull/489

asmecher commented 1 year ago

Thanks, @NateWr! Just one general comment about _settings tables; once that's adjusted, these are ready for merge.

NateWr commented 1 year ago

Merged the PRs above to main. There are test failures in OMP, but they seem to predate these pull requests.

@asmecher would you like to close this issue or leave it open until all database tables and columns are described?

asmecher commented 1 year ago

Please leave this open; I'll pop some additional descriptions in.

asmecher commented 1 year ago

Basic descriptions added for all tables. Table columns will still need to be documented, but I'd suggest doing that outside of this entry.