kemayo / leech

Turn a story on certain websites into an ebook for convenient reading
MIT License
154 stars 24 forks source link

Reorganize with a focus on ability to serialize working state and update existing output file. #90

Closed DanCardin closed 6 months ago

DanCardin commented 1 year ago

I'm arriving here for exactly the same reason as the OP in https://github.com/kemayo/leech/issues/63. But i'm also interested in this helping to eventually solve https://github.com/kemayo/leech/issues/55.

As you'll see, the changes are very invasive and perhaps somewhat opinionated. Most of the changes were in service of completing the feature, but at the same time I knew nothing about the epub format or any of this code going in, so certainly some of the changes were in service of improving my own understanding.

I'm posting it as a draft essentially to gauge your interest in merging before I continue putting effort into it. I've already sunk a decent amount of time into this, and if you (rightfully!) decided you weren't interested in general, I'd probably trim out all the features I dont plan to use and start maintaining my own fork 😬 (for myself). On the other hand, if you were, but the size of changes were the problem, there might be a way to prep some stuff ahead of time in a series of smaller PRs.

I dont think the state of the code is mergeable as-is. This is basically immediately after getting to the point of being confident it can serve the purpose of performing partial updates. In particular, i think all non-"arbitrary" sites are probably broken.

If you decided you could imagine merging this, I would very much like to set up some basic happy-path testing with pytest. If you happened to have example full-html content for each site-kind that'd make it much easier to start getting more confidence that this doesnt break all the dedicated sites.

Most importantly:

Some things i have not yet done:

A not-necessarily ordered collection of comments about some details of the PR.

DanCardin commented 1 year ago

I just added a commit which added a basic collection test for the arbitrary site kind. Not full coverage of changed behavior there, but it exemplifies what I could do, given basic truncated examples of other sites.

I'll definitely also put together some basic happy-path tests to verify the epub.py changes, because that's completely different. and/or at least 1 test at the level consumed by leech.py, as a sort-of e2e test.

Should be straightforward to jumpstart a basic github actions CI for said tests. Although again, this could happen in a follow up, if you were interested.