gottfrois / link_thumbnailer

Ruby gem that fetches images and metadata from a given URL, much like the link previews on popular social websites.
MIT License

link_thumbnailer takes too much time to load #57

Closed mihir-kumar-thakur closed 9 years ago

mihir-kumar-thakur commented 9 years ago

It takes too long to load and sometimes gives a timeout error.

gottfrois commented 9 years ago

Please give me a URL as an example. Otherwise I won't be able to help you.

mihir-kumar-thakur commented 9 years ago

It is on localhost; I am developing a Rails app.

mihir-kumar-thakur commented 9 years ago

Where is the code base for your demo app?

mihir-kumar-thakur commented 9 years ago

Here is my code in the view file:

    <% @posts.each do |post| %>
      <div class="col-md-4 column_for_the_thumbnail">
        <div class="thumbnail">
          <% link1 = post.post_link %>
          <% if link1 %>
            <% object = LinkThumbnailer.generate(post.post_link, verify_ssl: false, image_limit: 1, attributes: [:images], description_min_length: 0, image_stats: false).images.first.src.to_s %>
          <% else %>
            <% object = '' %>
          <% end %>
          <%= image_tag object, class: "img-thumbnail", id: "image_thumb", style: "width:300px;height:200px", alt: "Image Not Found!" %>
          <div class="caption">
            <h5>
              <%= post.title[0..35] %>...
            </h5>
            <span>
              <%= link_to "Show", post_path(post), class: "btn btn-primary" %>
            </span>
          </div>
        </div>
      </div>
    <% end %>

I am taking @posts from the model and generating the thumbnail from the URL in each post.

gottfrois commented 9 years ago

Sounds great, but give me an example of a URL to scrape :)

Here is the demo, https://github.com/gottfrois/link_thumbnailer_demo, which uses the API at https://github.com/gottfrois/link_thumbnailer_api

aruprakshit commented 9 years ago

@gottfrois Yes, it is a little bit slow. I tried this. It took 15.31 seconds to load.

aruprakshit commented 9 years ago

@MihirKumarThakur Source code is added to the Readme file.

gottfrois commented 9 years ago

It seems to be pretty slow due to image stats. You can disable them if you don't care about image size and type but only care about the URLs:

LinkThumbnailer.generate('https://pragprog.com/book/mskanban/real-world-kanban', image_stats: false)

Also, the verify_ssl option is true by default; you might want to disable it as well.
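
A minimal sketch of how the options mentioned in this thread (image_stats, verify_ssl, image_limit, attributes) could be wrapped so that a page without images or a timeout does not break the view. The helper name first_image_src, the FAST_OPTIONS constant, and the injectable generator argument are hypothetical names introduced for illustration. The fake classes stand in for the gem's result so the sketch runs without network access.

```ruby
# Options taken from this thread; they trade image ranking for speed.
FAST_OPTIONS = {
  image_stats: false,   # skip the per-image HTTP requests for size and type
  verify_ssl: false,
  image_limit: 1,
  attributes: [:images]
}.freeze

# Hypothetical helper: returns the first image URL, or "" when the page
# has no images or the scrape fails (for example with a timeout).
# `generator` is anything that responds to call(url, options).
def first_image_src(url, generator:, options: FAST_OPTIONS)
  image = generator.call(url, options).images.first
  image ? image.src.to_s : ""
rescue StandardError
  ""
end

# Stand-ins for the gem's result objects, used only to exercise the helper.
FakeImage  = Struct.new(:src)
FakeResult = Struct.new(:images)

with_image = ->(_url, _opts) { FakeResult.new([FakeImage.new("http://example.com/a.png")]) }
no_images  = ->(_url, _opts) { FakeResult.new([]) }

puts first_image_src("http://example.com", generator: with_image)
puts first_image_src("http://example.com", generator: no_images).inspect
```

In a Rails app, and assuming the gem is loaded, the real generator could presumably be passed as `generator: LinkThumbnailer.method(:generate)`. Moving the call out of the view and into a helper or the model also keeps the template readable.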

aruprakshit commented 9 years ago

@gottfrois Yes, I tried verify_ssl: false and nothing improved. I also tried image_stats: false, which made the scraping very fast, but the image is different from the one I got when image_stats was set to true.

gottfrois commented 9 years ago

Hmm, yes, that's because the gem is no longer able to sort the images by size. The images are therefore returned in their order of appearance on the scraped page.
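
To illustrate why the first image changes, here is a small sketch comparing the two orderings. The PageImage struct, the sample data, and the choice of pixel area as the sort key are assumptions made for illustration; the gem's actual ranking may differ.

```ruby
# Hypothetical image record; the real gem's objects differ.
PageImage = Struct.new(:src, :width, :height)

# Images in the order they appear on an imaginary page.
IMAGES_IN_PAGE_ORDER = [
  PageImage.new("logo.png", 64, 64),
  PageImage.new("cover.jpg", 1200, 630),
  PageImage.new("avatar.png", 128, 128)
].freeze

# With size information available, the largest image can be ranked first.
def largest_first(images)
  images.sort_by { |img| -(img.width * img.height) }
end

# Without size information there is nothing to sort on,
# so the page order is all that is left.
def page_order(images)
  images
end

puts largest_first(IMAGES_IN_PAGE_ORDER).first.src
puts page_order(IMAGES_IN_PAGE_ORDER).first.src
```

With sizes, the large cover image wins; without them, whatever appears first in the markup (often a logo or icon) is returned.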

aruprakshit commented 9 years ago

@gottfrois I'll try to look into the code. Let's see if I can add any value there or not. :)

gottfrois commented 9 years ago

Feel free to do so, but unfortunately I don't see a way around this. Image sizes are gathered over HTTP requests, which is what makes it slow. Bypassing this drastically reduces the scraping time, but then you can't compare images by size anymore :(

A solution, though not an easy one to implement, would be to fetch image sizes in parallel, using the typhoeus gem for example, so that the HTTP requests run concurrently instead of one by one.
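
The idea can be sketched with plain Ruby threads. This is not Typhoeus and not how the gem does it. The fetch_size method is a hypothetical stand-in that sleeps instead of making an HTTP request, and the URLs and the returned "size" are made up.

```ruby
require "benchmark"

# Stand-in for one HTTP request that reads an image's size.
# It sleeps to simulate network latency and returns a fake size.
def fetch_size(url, delay: 0.1)
  sleep(delay)
  [url, url.length]
end

# One request after another: total time grows with the number of images.
def fetch_sequentially(urls)
  urls.map { |url| fetch_size(url) }
end

# One thread per request: the waits overlap, so total time is roughly
# that of the slowest single request. Thread#value joins and returns
# the block's result, preserving the input order.
def fetch_concurrently(urls)
  urls.map { |url| Thread.new { fetch_size(url) } }.map(&:value)
end

urls = %w[a.png bb.png ccc.png dddd.png]

sequential_time = Benchmark.realtime { fetch_sequentially(urls) }
concurrent_time = Benchmark.realtime { fetch_concurrently(urls) }

puts format("sequential: %.2fs, concurrent: %.2fs", sequential_time, concurrent_time)
```

Threads help here because the work is I/O-bound: Ruby releases its global lock while a thread waits on the network (or, in this sketch, on sleep). A real implementation would also need to cap the number of simultaneous requests and handle per-request failures.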

aruprakshit commented 9 years ago

@gottfrois +1

gottfrois commented 9 years ago

Just to let you know, guys, I am working on a solution for this. Stay tuned.

gottfrois commented 9 years ago

I replaced the FastImage gem with my own version, called ImageInfo, which allows images to be fetched concurrently. In my own benchmark, a page that used to take about 4.5 seconds to load now takes less than a second.

The fix is available now in v3.0.2.

Can you guys try it out and let me know if it improves your use cases? Thanks.

aruprakshit commented 9 years ago

@gottfrois I have updated the gem. For some links it hangs and does not return any result.

Started GET "/posts/link_thumbnailer?url=http%3A%2F%2Fwww.nolo.com%2Flegal-encyclopedia%2Fdivorce-do-you-need-lawyer-29502.html" for 127.0.0.1 at 2015-08-12 12:27:20 +0530
Processing by PostsController#link_thumbnailer as */*
  Parameters: {"url"=>"http://www.nolo.com/legal-encyclopedia/divorce-do-you-need-lawyer-29502.html"}
  Account Load (0.5ms)  SELECT  "accounts".* FROM "accounts" WHERE "accounts"."deleted_at" IS NULL AND "accounts"."id" = $1  ORDER BY "accounts"."id" ASC LIMIT 1  [["id", 2]]
ETHON: Libcurl initialized
ETHON: started MULTI

gottfrois commented 9 years ago

Thanks, I will take a look. Can you please create a new issue for this one?

aruprakshit commented 9 years ago

@gottfrois Sure, I will.