Reorganize with a focus on ability to serialize working state and update existing output file.

I'm arriving here for exactly the same reason as the OP in https://github.com/kemayo/leech/issues/63, but I'm also interested in this eventually helping to solve https://github.com/kemayo/leech/issues/55.

As you'll see, the changes are very invasive and perhaps somewhat opinionated. Most of the changes were in service of completing the feature, but at the same time I knew nothing about the epub format or any of this code going in, so certainly some of the changes were in service of improving my own understanding.

I'm posting it as a draft essentially to gauge your interest in merging before I continue putting effort into it. I've already sunk a decent amount of time into this, and if you (rightfully!) decided you weren't interested in general, I'd probably trim out all the features I don't plan to use and start maintaining my own fork 😬 (for myself). On the other hand, if you were interested but the size of the changes was the problem, there might be a way to prep some stuff ahead of time in a series of smaller PRs.

I don't think the code is mergeable as-is. This is basically the state immediately after I became confident that it can serve the purpose of performing partial updates. In particular, I think all non-"arbitrary" sites are probably broken.

If you decided you could imagine merging this, I would very much like to set up some basic happy-path testing with pytest. If you happened to have example full-HTML content for each kind of site, that would make it much easier to gain confidence that this doesn't break all the dedicated sites.

Most importantly:

- All the classes important to collecting data (Site, Story, Chapter, Cover) are dataclasses with from_json classmethods on them.
  - Additionally, I moved some of the runtime information used in Arbitrary into the class state.
  - In combination, this is what allowed me to serialize all of the runtime state to disk, so that collection can be resumed later from the same point.
- epub.py/Epub uses xmltodict, which lets me declaratively produce the XML files.
  - I think this makes it much easier to understand what the end result of the files looks like.
  - Also, I figured this would help in the future to perhaps parse content from an existing epub and merge it with the epub being produced.
  - Even if you don't accept this changeset, I think it would be an improvement and might be useful for something like https://github.com/kemayo/leech/issues/57 and https://github.com/kemayo/leech/issues/55.
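
To illustrate the serialization idea, here is a minimal sketch of the dataclass-plus-from_json pattern. The class names Story and Chapter and the from_json classmethod come from this PR; every field name, the next_url attribute, and the to_json helper are assumptions made up for the example and are not the actual code in the changeset.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Chapter:
    # Field names here are hypothetical, chosen only for illustration.
    title: str
    contents: str

    @classmethod
    def from_json(cls, data: dict) -> "Chapter":
        return cls(title=data["title"], contents=data["contents"])


@dataclass
class Story:
    title: str
    author: str
    # Hypothetical stand-in for the runtime state needed to resume collection.
    next_url: str = ""
    chapters: List[Chapter] = field(default_factory=list)

    @classmethod
    def from_json(cls, data: dict) -> "Story":
        return cls(
            title=data["title"],
            author=data["author"],
            next_url=data.get("next_url", ""),
            chapters=[Chapter.from_json(c) for c in data.get("chapters", [])],
        )

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses, so chapters are included.
        return json.dumps(asdict(self), indent=2)


# Build some state, serialize it, and restore it as if resuming later.
story = Story(title="Example", author="Someone", next_url="https://example.com/chapter-2")
story.chapters.append(Chapter(title="Chapter 1", contents="<p>Hello</p>"))

saved = story.to_json()
restored = Story.from_json(json.loads(saved))
```

Because dataclasses compare by field value, the restored object is equal to the original, which makes this kind of round trip easy to cover in a happy-path test.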

Some things i have not yet done:

- Like I said above, I'd like to see whether I can get it to merge more content from an existing source file, for example to avoid overwriting calibre metadata on a previously generated epub. If this is merged, I'd probably do that as a separate PR.

A not-necessarily-ordered collection of comments about some details of the PR:

- I converted all your instances of attrs to dataclasses. Not strictly necessary, per se, but dataclasses is already available in Python 3.7, and you seemed to largely not be using much attrs-specific behavior.
- I moved most of the code I touched into a _leech package. This could have been leech, but I didn't necessarily want to move your leech.py, given how the tool is invoked today.
  - I left sites where it was, mostly because I only touched most of sites incidentally. I'd certainly be happy to move this too.
  - Somewhat unfortunately, my editor automatically runs black on save, so I blackened any code I touched. Rather than undoing it, I'd prefer to blacken the whole repo in a PR ahead of this one to reduce the diff a bit, if you don't mind 😬.
- Section became Story, mostly because it seems like many of the variables pointing to Section instances were named story.
- All the references to templates, filenames, and whatnot that seemed likely to be specific to epubs moved into epub.py. This helped turn a bunch of the more dynamic runtime stuff into static decisions, like the cover-specific handling in some of the output files.

Reorganize with a focus on ability to serialize working state and update existing output file. #90