kemayo / leech

Turn a story on certain websites into an ebook for convenient reading
MIT License

Stable seed generation for Sections #60

Closed ClaasJG closed 3 years ago

ClaasJG commented 3 years ago

Good evening,

Currently, when generating an epub, a Section uuid is generated randomly (see: sites/init.py#L18). The uuid is then used as the folder name for storing content (see: ebook/init.py#L93).

When I read an epub on a Tolino Shine 3 and then re-download and update the epub on the device, the Tolino 'forgets' the page I was on. The reason is that the HTML file I was viewing no longer exists, because the uuid changed. (Scenario: erraticerrata is currently writing A Practical Guide to Evil Do Wrong Right 7, and I update my epub version every time new chapters are released.)

This pull request uses the Section/Chapter title to seed the uuid generation. Therefore, the same uuid is generated on every download, and the Tolino is able to open the last viewed page after the epub is updated.

I don't know if the title is the best seed for the uuid, but I think it's probably the most stable identifier for a book which is still being updated.

(https://github.com/JimmXinu/FanFicFare generates chapters directly in the OEBPS folder. I don't know the epub specs, so I don't know if there is even a reason to store the files in a randomly named folder?)

Have a nice day -ClaasJG

kemayo commented 3 years ago

For stable identifiers, the URL is probably less likely to change. The site-definitions generally try to work out a "canonical" URL to use, rather than whatever random chapter you pass in, so it's fairly consistent. Difficulty is that currently the individual chapters don't store a URL, but I don't mind changing that -- bigger change than you've currently got here, though, since all the site definitions would need to be touched.

The subdirectories come from a conflux of factors:

  1. I want to support footnotes, which require a link to the file containing the footnotes... so being able to link to ./footnotes.html is convenient
  2. I let you cram multiple "works" into a book, mostly because of AO3 series.
  3. So, each "work" having its own directory means there's no footnote mixups.
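A rough sketch of why the per-work directory keeps footnote links unambiguous. The directory and file names here (`work-a`, `chapter1.html`) and both helper functions are hypothetical, chosen only for illustration; they are not leech's actual layout or code.

```python
from pathlib import PurePosixPath


def work_paths(work_id, chapter_count):
    """Return (chapter paths, footnotes path) for one work inside the archive."""
    base = PurePosixPath(work_id)  # one directory per work (hypothetical naming)
    chapters = [base / f"chapter{i}.html" for i in range(1, chapter_count + 1)]
    footnotes = base / "footnotes.html"
    return chapters, footnotes


def resolve(chapter_path, href):
    """Resolve a relative link such as ./footnotes.html against a chapter file."""
    return chapter_path.parent / href


chapters_a, footnotes_a = work_paths("work-a", 2)
chapters_b, footnotes_b = work_paths("work-b", 1)

# The same relative href lands on a different file depending on the work.
print(resolve(chapters_a[0], "./footnotes.html"))
print(resolve(chapters_b[0], "./footnotes.html"))
```

Because every chapter uses the same relative link, the directory name is the only thing that distinguishes one work's footnotes from another's, which is also why that name needs to be stable across downloads for a reader's bookmark to survive.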

ClaasJG commented 3 years ago

I will try to update this PR in a way that each Chapter and Section knows its URL and uses the URL to generate a stable uuid. Thanks for the explanation of why subdirectories are used.

kemayo commented 3 years ago

If you don't want to make wide-reaching changes, I could add the URL to Chapters and you could rebase onto that?

ClaasJG commented 3 years ago

I just went through the code of ao3.py, and from what I saw each 'Section', which is a substory, already has a unique URL. But I never saw a use for the Chapter id, neither in ao3.py nor in other places (e.g. epub creation). To solve my initial problem, it's enough for Sections to have a stable uuid; the Chapter id is irrelevant. I removed the id from Chapters and tested that at least epubs using ao3.py and arbitrary as handlers still work. Therefore, I've updated this PR. It now removes the Chapter id and uses the URL of Sections to generate Section ids. If an id is still needed in Chapters, it could either be unstable or stable based on the content, or I would take you up on your offer to add URLs to Chapters and rebase. But currently I think this PR can focus on Sections to solve the initial problem.
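For illustration, a minimal sketch of the idea, assuming a name-based `uuid.uuid5` over the Section URL. The `Section` and `Chapter` classes below are simplified stand-ins, not the actual classes in this repository, and the PR itself may derive the id differently.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Chapter:
    # Hypothetical simplified chapter: no id of its own.
    title: str
    contents: str


@dataclass
class Section:
    # Hypothetical simplified section: the id is derived from the URL only.
    title: str
    url: str
    contents: list = field(default_factory=list)

    @property
    def id(self):
        # uuid5 is deterministic: the same namespace and name always
        # produce the same UUID, on every run and every machine.
        return str(uuid.uuid5(uuid.NAMESPACE_URL, self.url))


first = Section("Book 7", "https://example.com/story")
retitled = Section("Book 7 (complete)", "https://example.com/story")

print(first.id == retitled.id)  # same URL, so the id is unchanged
```

Keying on the URL instead of the title means a retitled story still maps to the same folder name, which is the property the e-reader needs in order to find the last viewed file.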

kemayo commented 3 years ago

Yeah, I'm happy to merge this as-is.

There's various places that manually work out a chapterid which I should probably switch to being Chapter.id, but making that work would involve refactoring the flow of chapter-creation in a few places. So I'll let that sit a bit.