Closed DanCardin closed 8 months ago
I just added a commit which added a basic collection test for the arbitrary site kind. Not full coverage of changed behavior there, but it exemplifies what I could do, given basic truncated examples of other sites.
I'll definitely also put together some basic happy-path tests to verify the epub.py changes, because that's completely different. and/or at least 1 test at the level consumed by leech.py
, as a sort-of e2e test.
Should be straightforward to jumpstart a basic github actions CI for said tests. Although again, this could happen in a follow up, if you were interested.
I'm arriving here for exactly the same reason as the OP in https://github.com/kemayo/leech/issues/63. But i'm also interested in this helping to eventually solve https://github.com/kemayo/leech/issues/55.
As you'll see, the changes are very invasive and perhaps somewhat opinionated. Most of the changes were in service of completing the feature, but at the same time I knew nothing about the epub format or any of this code going in, so certainly some of the changes were in service of improving my own understanding.
I'm posting it as a draft essentially to gauge your interest in merging before I continue putting effort into it. I've already sunk a decent amount of time into this, and if you (rightfully!) decided you weren't interested in general, I'd probably trim out all the features I dont plan to use and start maintaining my own fork 😬 (for myself). On the other hand, if you were, but the size of changes were the problem, there might be a way to prep some stuff ahead of time in a series of smaller PRs.
I dont think the state of the code is mergeable as-is. This is basically immediately after getting to the point of being confident it can serve the purpose of performing partial updates. In particular, i think all non-"arbitrary" sites are probably broken.
If you decided you could imagine merging this, I would very much like to set up some basic happy-path testing with pytest. If you happened to have example full-html content for each site-kind that'd make it much easier to start getting more confidence that this doesnt break all the dedicated sites.
Most importantly:
Site
,Story
,Chapter
,Cover
) are dataclasses withfrom_json
classmethods on them.Arbitrary
into the class stateepub.py
/Epub
usexmltodict
, which let me declaratively produce the xml files.Some things i have not yet done:
A not-necessarily ordered collection of comments about some details of the PR.
attrs
todataclasses
. Not strictly necessary, per se, but dataclasses is already available in 3.7, and you seemed to largely not be using much attrs-specific behaviors._leech
package. This could have beenleech
, but i didn't necessarily want to move yourleech.py
, given how the tool is invoked today.sites
, mostly because i only touched most of sites incidentally. I'd certainly be happy to move this tooblack
on save, so i blackened any code i touched. I'd rather perhaps blacken the whole repo in a PR ahead of this to reduce the diff a bit, than undoing it, if you didn't happen to mind 😬.Section
becameStory
, mostly because it seems like many of the variables pointing toSection
instances were named storyepub.py
. This helped encapsulate a bunch of more dynamic runtime stuff into static decisions. like cover-specific handling in some of the output files.