standardebooks / web

The source code for the Standard Ebooks website.
https://standardebooks.org
Creative Commons Zero v1.0 Universal
234 stars 61 forks source link

Get ebooks by author from the DB #378

Closed colagrosso closed 2 months ago

colagrosso commented 2 months ago

This PR is longer because I changed my approach: In the Ebook class, I converted AuthorsUrl from an on-demand property to a field written to the database with an index.

Most of the PR is that change:

My original plan was to use the UrlName index on Contributors and join that with the Ebooks table. That works fine for single authors, but it was hard to make it work for ebooks with multiple authors like:

https://standardebooks.org/ebooks/william-wordsworth_samuel-taylor-coleridge/lyrical-ballads https://standardebooks.org/ebooks/karl-marx_friedrich-engels/the-communist-manifesto/samuel-moore

because the Contributors table has entries for them individually, not together:

MariaDB [se]> select ContributorId, EbookId, Name, UrlName from Contributors where UrlName in ('william-wordsworth', 'samuel-taylor-coleridge', 'karl-marx', 'friedrich-enge
ls');
+---------------+---------+-------------------------+-------------------------+
| ContributorId | EbookId | Name                    | UrlName                 |
+---------------+---------+-------------------------+-------------------------+
|             2 |       2 | William Wordsworth      | william-wordsworth      |
|             3 |       2 | Samuel Taylor Coleridge | samuel-taylor-coleridge |
|             4 |       3 | Karl Marx               | karl-marx               |
|             5 |       3 | Friedrich Engels        | friedrich-engels        |
+---------------+---------+-------------------------+-------------------------+

If you'd like to see that alternative, I created draft PR #377. I don't recommend that approach. The implementation in Library::GetEbooksByAuthor() would be harder to maintain. I'm showing that alternative because it was my initial intent, and I thought you might be interested.

acabal commented 2 months ago

The issue is that in this approach, we've denormalized the database - now we have the same author information stored in two places. This makes it easier to write joins, but data integrity is eventually going to become a problem - at some point someone, maybe not even you or me but someone years from now, is going to edit the DB directly to update an author and forget to change one or the other table, or mistype into one or the other table, and now we have a big problem that will be difficult to diagnose.

Also, we lose the ability to search by just one of the authors, or to search for the authors in an order that's different from our slug.

The approach in the other PR is just fine, and should be just as performant if we index both columns, especially since the case of multiple authors is pretty uncommon.

colagrosso commented 2 months ago

Fair enough, thanks for the feedback. I'll take your advice in #377 and polish that PR up.