plateaukao / einkbro

A small, fast web browser based on Android WebView. It's tailored for E-Ink devices but also works great on normal android devices.
https://einkbro.github.io/overview.html

Save as ePub: kudos / feature request #107

Closed dredmorbius closed 2 years ago

dredmorbius commented 2 years ago

Daniel: Save as ePub is seriously brilliant. I'd stumbled across it about a month ago, and it's changing how I use my tablet, browse, and treat the Web generally. Brief bit on that here:

https://toot.cat/@dredmorbius/107958709435468728

Use cases I've already found:

The fact that I can open the ePub either in Einkbro or another reader is also fantastic.

And (not but, but and ;-) ... if I could suggest some improvements:

Streamline adding multiple documents / pages at the same time. Presently that's a longish multi-select process, and it's necessary to dismiss the "open ebook" dialogue repeatedly. Instead, providing a multi-select interface and allowing people to select the article(s) they'd like to add, then providing an edit with suggested title(s) for each chapter, would be great.

Default including the present date in the saved article. I'm a fan of ISO-8601 (YYYY-MM-DD). Defaulting to the first item title + publication + current date would probably be a good basis (that can be edited). "einkbro.epub" isn't especially distinctive.

If possible, include images / graphics. I'm aware these can mess with formatting within epubs, and often there's not much value added, but occasionally there is.

Include section / internal document structure, if possible. Example, this NY Times piece (https://www.nytimes.com/interactive/2019/04/03/magazine/rupert-murdoch-fox-news-trump.html) includes sections demarked as <h4> tags. Neither the section headers nor the titles are included within the ePub text.

I'd like to see some improvements to the saved epub organisation as well, though I'm not sure specifically what recommendations I'd make here. Probably along the lines of:

But really, the ability to save content in a format that won't change, doesn't page-break through lines of text (as Android's default print-to-PDF engine seems to do all the time --- on multiple browsers, FWIW), to organise Web content for my own needs, and to have it independent of any one browser is ... liberating.

Thanks!

plateaukao commented 2 years ago

Hi, @dredmorbius

Glad to hear that the save-as-epub feature is helpful for you. Here's some feedback on your suggestions:

  1. Streamline adding multiple documents or pages: I haven't found an easy way to do this with the current implementation. To do so, EinkBro would have to load the different web URLs one by one, to make sure each page's content is loaded before saving the next article into the same epub file. I'll consider adding the creation date to the file name by default. However, adding the title to the file name is a bit too much, since most web titles are long and not suitable as ordinary file names.

  2. Include images / graphics: yes, EinkBro's save-as-epub feature does save images (for most websites). If it does not work for some websites, that may be caused by how the website displays its images, or because the site blocks download requests other than those made to show the images in the browser.

  3. All the original style formatting of the web content is removed and replaced with the style provided by Firefox's Reader mode. This keeps the epub article format under better control. I don't know how each website includes its CSS files (there may be 1 file, or 10 files, or more, in the same HTML, or in other links, or loaded in other ways), and it would be difficult to keep track of the different CSS files for each article. This is beyond my capability.

  4. Some improvements to the saved epub organization: I would suggest using a traditional file explorer app or a full-featured eReader app for this, instead of trying to do all the tasks in the EinkBro app. My concept is to quickly export web sites into epub files, and leave all other management, and even epub reading, to other apps that are made for those purposes. :)

dredmorbius commented 2 years ago

@plateaukao: Thanks for the responses.

  1. On multi-select adding --- this may ultimately require offloading to some third-party generator (and I'm planning to look into tools for web -> epub conversion as well), but if you'd put a sticky / tickler on this and see if something appears that looks like it might fit, I'd appreciate it. I'll do likewise.

  2. Images/graphics: Interesting. I'll play with this some more, and compare against other article-archival tools (I use Pocket, though I'm very much NotAFan: https://old.reddit.com/r/dredmorbius/comments/5x2sfx/pocket_it_gets_worse_the_more_you_use_it/)

  3. Style/structure preservation. I'll look at upstream. Checking Pocket for the NYTimes series, I'm finding that it includes both images and headings, FWIW. That article would be a good test case for development / CI.

  4. Document management: I've yet to find a tool that's well-suited to this task. On my Onyx BOOX, I rely on the Onyx-supplied Storage navigator, though that's pretty unsatisfactory. For deeper searches, I rely heavily on Termux and Linux/Unix tools to search by filename and/or contents (various PDF tools and related). Even on desktop space there's not much that seems suited, though Calibre and Zotero seem to be the best of a sorry set of offerings. I've talked to the Onyx people about this as well, they've got some interesting integrations such as having readings tracked in the Calendar app (though not easily traversable), and a few recent-readings features, though I'd prefer those go back further in time. I'm granting this is somewhat out-of-scope, but it's also a persistent pain point and would seem to me to be a huge boost in utility.

One thing I like tremendously about Einkbro is that it really puts the reader first, ahead of web authors and publishers, and doesn't hesitate to adjust experiences and presentation to facilitate reading.

Of the four issues raised, I'd read the images and headings as the most immediately tractable and would suggest looking at the NY Times article I'd linked above to see if there's a fault in either the Readability library or Einkbro's invocation of it. The other two items are really "I'd love to see this in future", but are ambitious.

Again, this is the first browser I've been excited about in over two decades.

plateaukao commented 2 years ago

@dredmorbius

1. In fact, epub saving is only one of EinkBro's many features. It makes it easier to read content in another reader app when I want to highlight something and, as you did, to gather articles from different sources into the same epub file. So far, I don't intend to expand this feature too far. Listing all possible links in a web article, hand-picking some of them, and downloading them into the same epub file or saving them in any other format is more like a web crawler. I believe you can find this kind of tool as an extension for Chrome or Firefox on PC. It would be a better choice to do it on PC.

However, if you do find some useful open source tools (that work on mobile, not PC), please let me know. I will check the feasibility of integrating others' work if it's not too big or too complicated.

2. and 3. I don't use Pocket now. As far as I know, Pocket is a company; they tailor their crawling tools to fit major websites, to make sure their users can save those sites successfully. EinkBro is only a leisure project that I work on occasionally, so I can't work on specific scenarios for specific features (unless they happen to be what I want too). Personally, I don't use NYTimes, and for the link you provided, the UI without login is awful. I can't even see the content at all. Maybe you can provide other websites that are friendlier to users who are not logged in.

4. I understand that we all need a good document management app on E-Ink readers. I would love to see this kind of app written by others too. As for EinkBro, I will keep improving the features that are more related to the browsing experience.

Thank you for writing lengthy comments and suggestions for EinkBro. I really appreciate it. If you could provide other websites that have the same problem as NYTimes (can't save images in epub file), I will look into them when I have time. :)

dredmorbius commented 2 years ago

@plateaukao:

You're right about einkbro having numerous e-ink friendly features. I've been enjoying those, and just found your blog post about the design philosophy --- page-based navigation, minimising repaints, etc. I'd arrived at a similar set of e-ink design principles myself. Einkbro was already hugely compelling just from those. I didn't discover or try the save-to-epub until I'd already been using the browser for many months.

Again: the save-as-ePub feature just happens to be really good. As I said, transformational, and a clear differentiator from Firefox, Chrome, or Safari on either handheld or desktop systems.

I'm looking into alternative tools for ePub generation. Calibre and Pandoc are among the ones that seem to turn up most often, I'm familiar with both. (Neither are suited to Android, unfortunately --- Calibre doesn't have a full Android app. Pandoc leans heavily on LaTeX for document conversions and that's not been ported to Termux, it's a huge package.)

I'll see if I can find another example of a site that doesn't capture images when saved to ePub via Einkbro --- and is more non-subscriber friendly. (I'm not an NYT subscriber, disabling JS and using Einkbro in Incognito mode minimises most of the annoyances.)

Pocket was acquired by Mozilla a few years ago, and is heavily promoted within Firefox. It's ... ok ... as an article reader, though exceedingly weak on management / search / organisation (typical within the field, it seems). The Readability library as best I can tell is maintained/developed by Mozilla now, and is used by Firefox for providing simplified webpage views.

I'll see what options for file management I can find. That's another ... annoyingly persistent deficiency on Android best I can tell.

Again thanks for a great browser and your time.

plateaukao commented 2 years ago

@dredmorbius Thanks for the hint. After turning off JavaScript, I can read the NYT article now. I found that the images are lazy loaded: JavaScript loads them after the page has finished loading. That's why the epub saving feature doesn't pick them up, because I extract the image src URLs from the content and download them one by one. If the website does not put the image URL in the expected place, or uses some other way to show it in the browser, I won't be able to parse it correctly. :)
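To make the lazy-loading problem concrete, here is a rough Python sketch of src-based image extraction with a fallback for lazy-loaded images. It is not EinkBro's implementation: the names `ImageCollector` and `extract_image_urls` are hypothetical, and the fallback attributes (`data-src`, `data-original`, `data-lazy-src`) are only common conventions, with no claim that NYT uses any of them.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Attribute names often used by lazy-loading scripts (an assumption, not a standard).
LAZY_ATTRS = ("data-src", "data-original", "data-lazy-src")


class ImageCollector(HTMLParser):
    """Collect absolute image URLs from <img> tags in an HTML string."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        src = attr.get("src") or ""
        # A missing src or an inline data: placeholder suggests lazy loading,
        # so look for the real URL in the fallback attributes.
        if not src or src.startswith("data:"):
            src = next((attr[a] for a in LAZY_ATTRS if attr.get(a)), "")
        if src:
            # Resolve relative paths against the page URL.
            self.urls.append(urljoin(self.base_url, src))


def extract_image_urls(html: str, base_url: str) -> list:
    collector = ImageCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.urls
```

A fallback like this only helps when the real URL is present somewhere in the markup. If a script injects the image with no URL in the saved HTML at all, parsing the content cannot recover it, which matches the limitation described above.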

dredmorbius commented 2 years ago

@plateaukao Thanks for looking into that.

I was afraid this might be the case. And yes, NYT do break standards in annoying ways, as does much of the Web these days 😞

plateaukao commented 2 years ago

Closing, since this is a suggestion and the NYT case can't be easily fixed from the browser side.