haroldtreen / epub-press-clients

📦 Clients for building books with EpubPress.
https://epub.press
GNU General Public License v3.0

Image extraction for sites requiring authentication #29

Open asmyers opened 6 years ago

asmyers commented 6 years ago

I would like to enable image extraction on sites that require authentication. Currently the server fails to download the images, so I get an EPUB containing only the content in the HTML. Looking through the code, it looks like my options would be either:

  1. Port the book generation portion of the back end into the browser plugin and build the book in browser.
  2. Have the browser plugin send all images in the HTML to the back end.

I think I could probably come up to speed and work on this, but I would appreciate some direction and any ideas you might have.

haroldtreen commented 6 years ago

Hey @asmyers !

It's a known issue that would be good to figure out a solution to. Some thoughts:

Porting Book Generation To The Browser

I've generally been wary of this. Browser plugins are good for manipulating the browser/DOM, but creating and writing files seems a bit beyond scope. Maybe though? I haven't looked into it much.

You might also need to find a JavaScript library for creating .mobi files. Currently the .mobi files are just .epubs converted by a third-party binary. You can't execute arbitrary binaries from a Chrome extension, so that would break.

Email delivery would also break. That's currently enabled by integration with a third-party service. You might be able to create similar functionality, but it would require each user to supply their own email/password. So not ideal.

Have the browser send all images

This could be an option? Currently all communication with the server is a single JSON object. I suppose you could serialize an image into JSON (for example as a base64 string)? But that might make the request body really big. The backend would then need to save those images somewhere and point to them when it came time to create the ebook.
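To get a feel for the size concern, here is a minimal sketch of embedding an image in a JSON body as base64. The `EmbeddedImage` shape, the `images` field, and the function names are all hypothetical and not part of EpubPress's actual request format; the encoder is written out by hand only so the sketch runs unchanged in a browser or in Node.

```typescript
// Hypothetical shape for one embedded image; not EpubPress's actual request format.
interface EmbeddedImage {
  src: string;      // URL as it appears in the page HTML
  mimeType: string; // e.g. "image/png"
  data: string;     // base64-encoded image bytes
}

const BASE64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Plain base64 encoder: every group of up to 3 bytes becomes 4 characters.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const hasB1 = i + 1 < bytes.length;
    const hasB2 = i + 2 < bytes.length;
    const b0 = bytes[i];
    const b1 = hasB1 ? bytes[i + 1] : 0;
    const b2 = hasB2 ? bytes[i + 2] : 0;
    out += BASE64_ALPHABET[b0 >> 2];
    out += BASE64_ALPHABET[((b0 & 0x03) << 4) | (b1 >> 4)];
    // Missing bytes in the final group are marked with "=" padding.
    out += hasB1 ? BASE64_ALPHABET[((b1 & 0x0f) << 2) | (b2 >> 6)] : "=";
    out += hasB2 ? BASE64_ALPHABET[b2 & 0x3f] : "=";
  }
  return out;
}

// Length of the base64 text produced for `byteLength` raw bytes.
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

// Wrap raw image bytes in the hypothetical JSON-friendly record.
function embedImage(src: string, mimeType: string, bytes: Uint8Array): EmbeddedImage {
  return { src: src, mimeType: mimeType, data: toBase64(bytes) };
}

// Usage: three placeholder bytes standing in for real image data.
const image = embedImage(
  "https://example.com/a.png",
  "image/png",
  new Uint8Array([77, 97, 110])
);
const body = JSON.stringify({ images: [image] });
```

Base64 inflates the data by about a third: `base64Length(1000000)` is 1333336, so a page with a few large images could easily push the request body into the tens of megabytes.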

Have the extension send cookies

This could vary from site to site... but a more lightweight option could be to send the session cookies. Then whenever EpubPress requests images it just uses the same cookies that are set in your browser.

There are security implications with this, so it's probably not a viable idea for the hosted service, but maybe something people would be comfortable doing when running the service locally.
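As a rough sketch of the cookie idea, the snippet below builds a `Cookie` header for an image request using only the cookies whose domain matches the image's host, which limits how much is handed to the backend. The `BrowserCookie` type and all function names are hypothetical. In a Chrome extension the cookie list could presumably come from `chrome.cookies.getAll`, which needs the `cookies` permission. The matching here is deliberately simplified: it ignores cookie paths, the `Secure` flag, expiry, and URLs containing user info.

```typescript
// Hypothetical cookie record; field names loosely follow what browsers expose.
interface BrowserCookie {
  name: string;
  value: string;
  domain: string; // e.g. ".example.com" or "cdn.example.com"
}

// Extract the lowercased hostname from an http(s) URL, or null if it does not look like one.
function hostnameOf(url: string): string | null {
  const match = /^https?:\/\/([^\/:?#]+)/i.exec(url);
  return match ? match[1].toLowerCase() : null;
}

// True when the host equals the cookie domain or is a subdomain of it.
function domainMatches(host: string, cookieDomain: string): boolean {
  const domain = cookieDomain.replace(/^\./, "").toLowerCase();
  if (host === domain) {
    return true;
  }
  const suffix = "." + domain;
  return host.length > suffix.length && host.slice(-suffix.length) === suffix;
}

// Build the Cookie header value to attach when fetching `imageUrl`.
function cookieHeaderFor(imageUrl: string, cookies: BrowserCookie[]): string {
  const host = hostnameOf(imageUrl);
  if (host === null) {
    return "";
  }
  return cookies
    .filter((c) => domainMatches(host, c.domain))
    .map((c) => c.name + "=" + c.value)
    .join("; ");
}

// Usage with made-up cookies.
const exampleCookies: BrowserCookie[] = [
  { name: "session", value: "abc", domain: ".example.com" },
  { name: "theme", value: "dark", domain: "cdn.example.com" },
  { name: "tracker", value: "x", domain: "other.org" },
];
const header = cookieHeaderFor("https://cdn.example.com/img/1.png", exampleCookies);
```

Filtering per host means a cookie for `other.org` is never sent along with a request for an `example.com` image, but the backend still ends up holding live session credentials, which is why this seems safer for a locally run server.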

Those are the things that come to mind when I try and think of solving this problem. Let me know if other context would be useful!