nikhil1raghav / kindle-send

Send webpages, documents and bookmarks to kindle
GNU Affero General Public License v3.0

Kindle-send breaks on svg images #28

Closed volkerwestphal closed 1 year ago

volkerwestphal commented 1 year ago

Describe the bug Kindle-send breaks the download if the webpage includes a .svg image.

To Reproduce Steps to reproduce the behavior: kindle-send download https://michael.stapelberg.ch/posts/2022-07-02-rsync-how-does-it-work

results in

SKIPPING https://michael.stapelberg.ch/posts/2022-07-02-rsync-how-does-it-work : Error retrieving "https://michael.stapelberg.ch/posts/2022-05-29-rsync-logical-view.svg" from source: ....
cannot get file, bad return code ...  missing data prefix

Expected behavior In an ideal world, kindle-send would convert the svg into a supported image format and deliver the epub. In the real world, it's sufficient to skip the image and continue the job.

Versions

Nikhil and Mattias, thank you for creating kindle-send.

przemekd commented 1 year ago

@volkerwestphal I can see that the root cause of this error is a redirect (from https://michael.stapelberg.ch/posts/2022-07-02-rsync-how-does-it-work to https://michael.stapelberg.ch/posts/2022-07-02-rsync-how-does-it-work/, notice the trailing slash after the redirect). If you use the command kindle-send download https://michael.stapelberg.ch/posts/2022-07-02-rsync-how-does-it-work/ instead, downloading that image works.
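To illustrate why the trailing slash matters, here is a minimal sketch using Go's standard net/url package. It assumes the page references the image with a relative path (the page markup is not shown in this thread), and resolveImageURL is a hypothetical helper, not a function from kindle-send or go-readability.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveImageURL resolves an image reference found in a page against the
// URL the page was served from. Hypothetical helper for illustration only.
func resolveImageURL(pageURL, src string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", fmt.Errorf("parse page URL: %w", err)
	}
	ref, err := url.Parse(src)
	if err != nil {
		return "", fmt.Errorf("parse image reference: %w", err)
	}
	// ResolveReference applies the RFC 3986 rules: without a trailing slash,
	// the last path segment of the base is replaced by the reference.
	return base.ResolveReference(ref).String(), nil
}

func main() {
	// Assumption: the page uses a relative reference like this one.
	const src = "2022-05-29-rsync-logical-view.svg"

	pages := []string{
		"https://michael.stapelberg.ch/posts/2022-07-02-rsync-how-does-it-work",  // URL as requested
		"https://michael.stapelberg.ch/posts/2022-07-02-rsync-how-does-it-work/", // URL after the redirect
	}
	for _, page := range pages {
		resolved, err := resolveImageURL(page, src)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(resolved)
	}
}
```

The first line printed is the /posts/2022-05-29-rsync-logical-view.svg address from the error message, the second one points below the article directory. With net/http, the URL after redirects is available as resp.Request.URL, so resolving against that value instead of the requested URL would avoid the mismatch.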

A separate problem is that many e-book readers support only a limited set of media formats. A related issue is #27.

@nikhil1raghav do you think kindle-send should have a capability to convert svg, webp, etc. files to more common formats? Or maybe that's something that https://github.com/bmaupin/go-epub should support?

volkerwestphal commented 1 year ago

Well, redirects are daily business on the web. These are fairly easy to handle and should not disturb any robust program.

Regarding the limited media type support, I think it's unreasonable to expect a small tool like kindle-send to support a myriad of formats. The basic formats are fine and cover the majority of content. However, kindle-send should not break when it comes across unsupported media. Simply skip over it and proceed.

What's the point? While surfing (mostly HN) I often stumble upon long and interesting articles on the web. I put these links in a file and carry on. Once in a while I start kindle-send -linkfile ... to create a compilation epub of these articles. I don't care if a single article is missing some fancy artwork. But it bugs me if kindle-send stops working through the list because of a single link.

przemekd commented 1 year ago

Well, redirects are daily business on the web. These are fairly easy to handle and should not disturb any robust program.

Yep, I fully agree here. But they do make some images unavailable in the final documents. I've checked the code and it seems that go-readability is to blame here. Let me create an issue on their repo.

I don't care if a single article is missing a fancy artwork.

Sure, but would you accept a PR to add an optional conversion capability that allows Kindle users to see more images in the articles prepared and sent by kindle-send? ;)

volkerwestphal commented 1 year ago

would you accept a PR to add an optional conversion capability ...

This is something to ask @nikhil1raghav.

Adding more image formats doesn't solve the actual problem. It means adding code that is only rarely used. You never get to support all image formats out there.

However, updating kindle-send to handle unknown media gracefully keeps it safe for years to come. You can always add important formats later if the need arises.
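A minimal sketch of what such graceful handling could look like in Go: sniff the type of each downloaded image and skip anything outside an allow-list instead of aborting. The allow-list, the image type, and the function names are assumptions for illustration; this is not kindle-send's actual code.

```go
package main

import (
	"fmt"
	"net/http"
)

// supportedImageTypes is an assumed allow-list for illustration only;
// the formats a given reader really supports may differ.
var supportedImageTypes = map[string]bool{
	"image/jpeg": true,
	"image/png":  true,
	"image/gif":  true,
}

// image is a hypothetical container for a downloaded image.
type image struct {
	URL  string
	Data []byte
}

// isSupportedImage sniffs the content type from the leading bytes.
func isSupportedImage(data []byte) bool {
	return supportedImageTypes[http.DetectContentType(data)]
}

// embeddable splits images into those that can be embedded and the URLs
// that were skipped. It never fails, so one odd image cannot stop the job.
func embeddable(images []image) (kept []image, skipped []string) {
	for _, img := range images {
		if !isSupportedImage(img.Data) {
			skipped = append(skipped, img.URL)
			continue
		}
		kept = append(kept, img)
	}
	return kept, skipped
}

func main() {
	images := []image{
		{URL: "photo.png", Data: []byte("\x89PNG\r\n\x1a\n")},
		{URL: "diagram.svg", Data: []byte(`<svg xmlns="http://www.w3.org/2000/svg"></svg>`)},
	}
	kept, skipped := embeddable(images)
	fmt.Println("embedding", len(kept), "image(s)")
	for _, u := range skipped {
		fmt.Println("skipping unsupported image:", u)
	}
}
```

Supporting another format later only means adding an entry to the allow-list, or adding a conversion step in front of the check.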

przemekd commented 1 year ago

Sure, I meant to tag @nikhil1raghav

BTW the last release does exactly that:

In the real world, it's sufficient to skip the image and continue the job.

In my case, despite the error, the output file is produced.

kindle-send download https://michael.stapelberg.ch/posts/2022-07-02-rsync-how-does-it-work
Loaded configuration
Fetched https://michael.stapelberg.ch/posts/2022-07-02-rsync-how-does-it-work --> rsync, article 3: How does rsync work?
No title supplied, inheriting title of first readable article : rsync, article 3: How does rsync work? 
Embedding images in  rsync, article 3: How does rsync work?
Downloading Images
Couldn't add image https://michael.stapelberg.ch/posts/2022-05-29-rsync-logical-view.svg : Error retrieving "https://michael.stapelberg.ch/posts/2022-05-29-rsync-logical-view.svg" from source: Error retrieving "https://michael.stapelberg.ch/posts/2022-05-29-rsync-logical-view.svg" from source: 
 stat https://michael.stapelberg.ch/posts/2022-05-29-rsync-logical-view.svg: no such file or directory
 cannot get file, bad return code
 missing data prefix
Downloading Images
Downloaded image https://michael.stapelberg.ch/posts/2022-07-02-rsync-how-does-it-work/2022-05-29-rsync-exo1-backup_hua2f50278895cfbee4dc18c7ea60b6d4a_2093260_600x0_resize_q75_box.jpg
Setting img src from https://michael.stapelberg.ch/posts/2022-07-02-rsync-how-does-it-work/2022-05-29-rsync-exo1-backup_hua2f50278895cfbee4dc18c7ea60b6d4a_2093260_600x0_resize_q75_box.jpg to ../images/img8984686547664710865.png 
Added 1 articles
Downloaded 1 files :
1. rsync, article 3: How does rsync work?.epub

volkerwestphal commented 1 year ago

Also did a retest, still using 2.0.0-rc1, still on Windows. No output file is produced:

C:\Users\....\Kindlesend>kindle-send-2.0.0-rc1.exe download https://michael.stapelberg.ch/posts/2022-07-02-rsync-how-does-it-work
Loaded configuration
Fetched https://michael.stapelberg.ch/posts/2022-07-02-rsync-how-does-it-work --> rsync, article 3: How does rsync work?
No title supplied, inheriting title of first readable article : rsync, article 3: How does rsync work?
Embedding images in  rsync, article 3: How does rsync work?
Downloading Images
Downloaded image https://michael.stapelberg.ch/posts/2022-05-29-rsync-logical-view.svg
Downloading Images
Downloaded image https://michael.stapelberg.ch/posts/2022-07-02-rsync-how-does-it-work/2022-05-29-rsync-exo1-backup_hua2f50278895cfbee4dc18c7ea60b6d4a_2093260_600x0_resize_q75_box.jpg
Setting img src from https://michael.stapelberg.ch/posts/2022-05-29-rsync-logical-view.svg to ../images/img2669478670900678324.png
Setting img src from https://michael.stapelberg.ch/posts/2022-07-02-rsync-how-does-it-work/2022-05-29-rsync-exo1-backup_hua2f50278895cfbee4dc18c7ea60b6d4a_2093260_600x0_resize_q75_box.jpg to ../images/img8984686547664710865.png
Added 1 articles
SKIPPING https://michael.stapelberg.ch/posts/2022-07-02-rsync-how-does-it-work : Error retrieving "https://michael.stapelberg.ch/posts/2022-05-29-rsync-logical-view.svg" from source:
 open https://michael.stapelberg.ch/posts/2022-05-29-rsync-logical-view.svg: Die Syntax für den Dateinamen, Verzeichnisnamen oder die Datenträgerbezeichnung ist falsch.
 cannot get file, bad return code missing data prefix
Downloaded 1 files :

(The German text means "The filename, directory name, or volume label syntax is incorrect." and basically gives the same message as the line below it.)

Together with your screenshot (most probably taken on Linux), this points the problem in another direction: under Windows, you can't have a file name with a question mark in it:

C:\Users\...\Kindlesend>echo > "How does rsync work?.epub"
Die Syntax für den Dateinamen, Verzeichnisnamen oder die Datenträgerbezeichnung ist falsch.

No file was created. There are at least nine characters that are invalid in file names under Windows, along with a couple of invalid strings.

Sources:

It looks like kindle-send uses the title of the website as a filename without sanitizing. In this case, the embedded svg image itself is not the root cause of this problem.
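A minimal sketch of such sanitizing in Go, assuming the title is used directly as the base of the file name. sanitizeFileName is a hypothetical helper and not the fix that later went into the fork. It replaces the nine characters Windows rejects (< > : " / \ | ? *) and control characters, but does not handle reserved device names such as CON or NUL.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeFileName replaces characters that Windows rejects in file names
// with an underscore. Hypothetical helper for illustration only.
func sanitizeFileName(title string) string {
	const invalid = `<>:"/\|?*`
	cleaned := strings.Map(func(r rune) rune {
		if r < 32 || strings.ContainsRune(invalid, r) {
			return '_'
		}
		return r
	}, title)
	// Windows file names should not end with a space or a period.
	cleaned = strings.TrimRight(cleaned, " .")
	if cleaned == "" {
		// Assumed fallback name when nothing usable is left.
		return "untitled"
	}
	return cleaned
}

func main() {
	title := "rsync, article 3: How does rsync work?"
	fmt.Println(sanitizeFileName(title) + ".epub")
	// prints "rsync, article 3_ How does rsync work_.epub"
}
```

Applying the same replacement on every platform keeps file names consistent between Linux and Windows, at the cost of slightly less readable names on systems that would accept the original title.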

przemekd commented 1 year ago

@volkerwestphal I've created a new issue, #29, that describes what happens here. I am not really sure if @nikhil1raghav is still around maintaining this repo. I've created a fork to fix some bugs on my own. I also created a release that should fix this file naming problem. You can test it out.

nikhil1raghav commented 1 year ago

@przemekd I will be glad to merge your fix. Right now I'm not getting much time to fix bugs. PRs are always welcome. Thanks for fixing this.

przemekd commented 1 year ago

@nikhil1raghav Great! Let me prepare PRs to fix some of these issues I've already spotted. I'll get back to you soon. And thanks a lot for this little tool, it's very handy!

volkerwestphal commented 1 year ago

I close this issue with the following insights: