Images from internet are not downloaded

GoogleCodeExporter commented 9 years ago

How to reproduce:

1. Download an HTML page to your computer with wget
2. Run html2fb.py -i my_downloaded.html
3. Get an error message, because an image URL from the internet cannot be opened as a local filename
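The failure in step 3 can be shown in isolation. This is only a sketch: it assumes the unpatched tool reads images with the built-in open(), and both read_local and the example URL are hypothetical names used for illustration (written for Python 3).

```python
def read_local(image_filename):
    """Read an image as a local path only (assumed unpatched behaviour)."""
    with open(image_filename, "rb") as f:
        return f.read()


try:
    # open() treats the URL as a filesystem path, which does not exist.
    read_local("http://example.invalid/picture.png")
except OSError as exc:  # IOError is an alias of OSError on Python 3
    print("cannot open as a file:", exc)
```

No network request is made here; the URL never leaves open(), which is why remote images fail.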

I've written a patch to fix this problem. Please include it upstream if 
possible.

Original issue reported on code.google.com by trousev....@gmail.com on 30 Jan 2014 at 7:00

Attachments:

download_images_from_web.patch

GoogleCodeExporter commented 9 years ago

Thanks for taking the time to create and submit a patch. It is nice to know 
other people are still using this!

I had a quick look at the patch and the else block appears to be empty 
(incorrect indentation). Does this run for both local and remote images?

One alternative to using wget with no params would be to use "wget -p -k". This 
both pulls down dependencies (like images) and rewrites the URLs to local 
ones, so the unpatched tool should work.
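For readers without wget, the link-rewriting half of that idea can be sketched in Python. This is a rough illustration under stated assumptions, not what wget does internally: rewrite_image_urls and local_name are hypothetical helpers, the regular expression only handles double-quoted src attributes on img tags, and nothing is downloaded (the returned pairs would still have to be fetched separately).

```python
import posixpath
import re
from urllib.parse import urlparse

# Matches <img ... src="http(s)://..."> with a double-quoted src attribute.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")(https?://[^"]+)(")', re.IGNORECASE)


def local_name(url):
    """Derive a local file name from the last path segment of the URL."""
    return posixpath.basename(urlparse(url).path) or "image"


def rewrite_image_urls(html):
    """Return (rewritten_html, [(url, local_name), ...]) for remote images."""
    found = []

    def replace(match):
        url = match.group(2)
        name = local_name(url)
        found.append((url, name))
        return match.group(1) + name + match.group(3)

    return IMG_SRC.sub(replace, html), found
```

Name collisions (two different URLs ending in the same file name) are not handled in this sketch.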

Adding urllib/file-open calls is reasonable but I wasn't clear this patch 
worked. It would also be a good idea to do this for all file IO for consistency 
rather than just for images.
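One way to get that consistency would be a single helper that every read goes through. The following is a minimal sketch, not code from html2fb: is_remote and read_bytes are hypothetical names, and it is written for Python 3 (urllib.request); on Python 2 the equivalent call would be urllib2.urlopen.

```python
from urllib.parse import urlparse
from urllib.request import urlopen


def is_remote(source):
    """Return True when source looks like an http(s) URL rather than a path."""
    return urlparse(source).scheme in ("http", "https")


def read_bytes(source):
    """Read the full contents of a URL or a local file as bytes."""
    if is_remote(source):
        # The response object is a context manager, so it is closed for us.
        with urlopen(source) as response:
            return response.read()
    with open(source, "rb") as handle:
        return handle.read()
```

Checking the parsed scheme instead of searching for "http://" anywhere in the string avoids misclassifying a local file whose name happens to contain a URL.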

Original comment by clac...@gmail.com on 1 Feb 2014 at 6:51

GoogleCodeExporter commented 9 years ago

Hello!

> Thanks for taking the time to create and submit a patch. It is nice to know 
> other people are still using this!

I'm writing a converter that turns my RSS feed into an FB2 book, and I'm using 
your project as a submodule. So you can expect more patches upstream :)

> I had a quick look at the patch and the else block appears to be empty 
> (incorrect indentation). Does this run for both local and remote images?

Yes, it does. The patch was created with git diff, so it may well have 
indentation problems. Here is the relevant part of the code as it appears in my editor:

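            # Needs "import urllib2" at the top of the module (Python 2).
            # Remote images are fetched over HTTP(S); anything else is a local file.
            if image_filename.startswith(("http://", "https://")):
                f = urllib2.urlopen(image_filename)
            else:
                f = open(image_filename, 'rb')
            try:
                data = f.read()
            finally:
                # Close the handle in both cases, including the URL response.
                f.close()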

You can browse my fork at github: https://github.com/trousev/html2fb2

==============================================

> One alternative to using wget with no params would be to use "wget -p -k". 
> This both pulls down dependencies (like images) and rewrites the URLs to 
> local ones, so the unpatched tool should work.

Well, I don't think this is a perfect solution (especially for Windows users, 
who don't have wget), because it adds a dependency to the project.

> Adding urllib/file-open calls is reasonable but I wasn't clear this patch 
> worked. It would also be a good idea to do this for all file IO for consistency 
> rather than just for images.

Yes, I think it should be done for all file IO, of course, but I don't know 
the code well enough to do it properly. Could you provide some docs or guidance?

Original comment by trousev....@gmail.com on 1 Feb 2014 at 8:50

webmedic / html2fb

Images from internet are not downloaded #2