standardebooks / web

The source code for the Standard Ebooks website.
https://standardebooks.org
Creative Commons Zero v1.0 Universal
235 stars 64 forks source link

DB read path: Convert a few more straightforward cases to the DB #374

Closed colagrosso closed 3 months ago

colagrosso commented 4 months ago
colagrosso commented 4 months ago

Thanks for the review!

I addressed the feedback about Collections, so I'll reply to the comment thread above about that.

I didn't update the Tags table yet.

acabal commented 4 months ago

OK, so thinking about this a little more. Maybe we need a CollectionMembership class, which would be something like:

class CollectionMembership{
    public Collection $Collection;
    public ?int $SequenceNumber;
}

This would basically map to the CollectionEbooks table, and would be unusual in the sense that it's an object that is synthesized from the DB but doesn't have its own table per se.

Then Ebooks would have CollectionMemberships, not Collections:

class Ebook{
    ....
    public array<CollectionMembership> $CollectionMemberships
}

print($ebook->CollectionMemberships[0]->SequenceNumber);

Would that make things easier? It wouldn't strictly align with the database representation but it would avoid some of the gymnastics we'd have to go through otherwise.

If you don't think so, then I would suggest using a different term than membership in this PR. We don't use that term anywhere else, so we want to be consistent. Collections have Ebooks so we don't "add a membership", we "add an ebook to a collection". In other words,


class Ebook{
    function Create(){
        ....
        $this->InsertCollectionMemberships(); // Inconsistent with current code style, this inverts the relationship; the ebook is adding itself to a collection using a term that doesn't exist elsewhere

        // More consistent: a collection contains ebooks, so we add ebooks TO the collection instead of having an ebook ADD ITSELF to the collection. 
        $collection = Collection::GetByUrlNameOrCreate($collectionUrlName);
        $collection->AddEbook($this); // Or $this->EbookId
       ....
    }
}
colagrosso commented 4 months ago

Thanks for thinking about this more and the two options. I like what you wrote about not adding a new term and being consistent with the code, so I went with the second suggestion. I removed the term membership and added Collection::AddEbook and Collection::RemoveEbook methods.

761e01c5fa8f98e48180478dc7201d55e9192e95 is the commit with the code to review.

We haven't talked too much about this passage from Ebook::FromFilesystem, but I could use your guidance there for naming and readability:

        // Get SE collections
        $collections = [];
        $collectionPositions = [];
        foreach($xml->xpath('/package/metadata/meta[@property="belongs-to-collection"]') ?: [] as $collection){
            $c = Collection::FromName($collection);
            $id = $collection->attributes()->id ?? '';

            foreach($xml->xpath('/package/metadata/meta[@refines="#' . $id . '"][@property="group-position"]') ?: [] as $s){
                $collectionPositions[$c->UrlName] = (int)$s;
            }
            foreach($xml->xpath('/package/metadata/meta[@refines="#' . $id . '"][@property="collection-type"]') ?: [] as $s){
                $c->Type = (string)$s;
            }
            $collections[] = $c;
        }
        $ebookFromFilesystem->Collections = $collections;
        $ebookFromFilesystem->CollectionPositions = $collectionPositions;

On master, we previously stashed the sequence number on the Collection object, but that's not an option anymore:

https://github.com/standardebooks/web/blob/486167dd431a3ce96b2e83a64e8f05777362616e/lib/Ebook.php#L279

Storing the collection positions in this array seems reasonable to me:

    /** @var array<string, int> $_CollectionPositions */
    protected $_CollectionPositions = null;

but I could make a new object out of it. It would be similar to the CollectionMembership object you proposed above, potentially with a different name. Having the array keys be the UrlName is convenient, but I could work around it.

Also, I want to be sensitive to your review time. If this PR has too many commits, I could:

  1. Stash the other commits about downloads, artworks, tags, etc., and use this commit to perfect Collections
  2. Drop the Collections work from this PR, commit the more straightforward commits, then try again fresh
acabal commented 3 months ago

Right, that's why I proposed CollectionMembership, because the sequence number has to go somewhere, but we don't have a logical place for it the way it currently stands. I suppose you could store it as in an array as a property of Ebook like you suggest, but then we've decoupled collections with their positions and we have to match the two by searching the array every time we want the sequence number. That's not great encapsulation and will invite errors later down the line, say if in a few years we have to revisit this code to in order to put sequence numbers somewhere else.

In any case, go ahead with how you think is best, we can always just see how things pan out in real life and if it becomes annoying we can always change it.

colagrosso commented 3 months ago

Thanks for the extra feedback about encapsulation and future maintenance. I also appreciate the attitude that we can always change it later.

I force pushed two commits:

ec54e3ecd3e24cf548b8e199061cfe0947232ce6: Adds the CollectionMembership class 9e6b090b863c61865296227b14c2fb355dc6b923: Addresses the Tag comment above about adding a Type enum

Onward to some more interesting DB read path cases!