kemayo / leech

Turn a story on certain websites into an ebook for convenient reading
MIT License

Option to only get metadata #81

Open KeinNiemand opened 2 years ago

KeinNiemand commented 2 years ago

An option to only get metadata, like the number of chapters or the last updated date, would be really useful for writing scripts that only redownload stories that have updated.

kemayo commented 2 years ago

It's possible, but only for some sites. Or, rather, only possible without a full download for some sites -- the ones that e.g. walk through a full story via next-chapter links would be problematic.

KeinNiemand commented 2 years ago

Even if it takes a full or nearly full download (without generating an epub or doing a lot of formatting) and therefore takes almost as long, getting just the metadata would still be useful. It would allow me to see which stories have updated from their file modified dates, without having to look inside the files or at the alerts I'm getting from the sites the stories are downloaded from.

KeinNiemand commented 2 years ago

Also, even for sites where it has to go through every link to get, say, the chapter count, it would still be faster than a full download, as it doesn't have to do any formatting, processing, or converting to epub.
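A rough sketch of what such a metadata-only walk could look like, purely for illustration. The names `count_chapters`, `fetch`, and `find_next` are hypothetical and not part of leech; `fetch` stands in for whatever downloads a page, and `find_next` for whatever a site uses to locate the next-chapter link.

```python
from typing import Callable, Optional


def count_chapters(
    first_url: str,
    fetch: Callable[[str], str],
    find_next: Callable[[str], Optional[str]],
    limit: int = 10000,
) -> int:
    """Count chapters by following next-chapter links, with no formatting or epub step.

    All names here are assumptions for illustration, not leech's API.
    """
    seen = set()
    url: Optional[str] = first_url
    # Stop at the end of the story, on a link loop, or at the safety limit.
    while url is not None and url not in seen and len(seen) < limit:
        seen.add(url)
        url = find_next(fetch(url))
    return len(seen)
```

Every page is still requested once, so this saves processing time rather than network time.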

ClaasJG commented 2 years ago

I once considered writing a PR which would extend the Site class to allow the creation of a site-specific 'data bundle', which could be used by the same Site implementation again to check if there is a potential update.

Such a 'data bundle' could be the last update date, if available. For a custom adapter which uses a next selector, it could contain the link to the last page as well as the hash of the last page. This would allow a quick check for a possible update (i.e. extract the last chapter again and check if the hash changed, or check if a 'next' link became available).
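To make the idea concrete, here is a minimal sketch of the next-selector case. Everything in it is hypothetical: `DataBundle`, `page_hash`, `may_have_update`, and the `fetch` / `find_next` callables are names invented for this example and do not exist in leech.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DataBundle:
    """Hypothetical per-story state saved after a download (not part of leech)."""
    last_page_url: str
    last_page_hash: str


def page_hash(content: str) -> str:
    """Hash the extracted content of a page so it can be compared later."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def may_have_update(
    bundle: DataBundle,
    fetch: Callable[[str], str],
    find_next: Callable[[str], Optional[str]],
) -> bool:
    """Re-fetch only the last known page and look for signs of an update."""
    content = fetch(bundle.last_page_url)
    # The last chapter itself changed.
    if page_hash(content) != bundle.last_page_hash:
        return True
    # A 'next' link became available on what used to be the last page.
    return find_next(content) is not None
```

A result of `True` only means there could be an update; a full download would still be needed to confirm it.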

It is an open question whether this 'data bundle' should be included in the epub, written to an extra file, or written to stdout. While I think it would be nicer to have the 'data bundle' embedded in the epub, it would be easier to generate an extra file. Leech may not know the name of an epub until it has been completely downloaded, and therefore it could be hard to relate links / custom JSON files to the generated epubs.

While you asked for metadata like the number of chapters, I think the ability to generate such a 'data bundle', paired with the ability to ask leech whether there may be an update, would also solve your problem. If there is interest in such a solution, I would try to implement it.