mozilla / api.webmaker.org

Services for Webmaker
https://api.webmaker.org

Thumbnail service is down #194

Closed xmatthewx closed 8 years ago

xmatthewx commented 8 years ago

The thumbnail service went down last week and stayed down through the weekend. There is no obvious cause.

What's our plan? What might help unblock this? We will hold our most recent release in Beta until this is resolved.

cc @gvn @alanmoo

cadecairos commented 8 years ago

The service isn't down, but for some reason Blitline is not uploading images to our S3 bucket like it should. I need to get access to Blitline, but I don't know who has those credentials.

cc/ @simonwex

cadecairos commented 8 years ago

Looks like Blitline jobs are failing silently. I'll investigate and/or file a support request with Blitline.

jbuck commented 8 years ago
Image results -> [
    {
        "error":"Image processing failed. Screenshot attempt for 'https://beta.webmaker.org/#/thumbnail?user=576818&project=56876&page=169383&t=1453835316719' failed,
         please check url to make sure it's available. This incident has been logged and will be looked into...",
        "failed_image_identifiers":[
            "screenshot:mobile-center-cropped/small/webmaker-desktop/aHR0cHM6Ly9iZXRhLndlYm1ha2VyLm9yZy8jL3RodW1ibmFpbD91c2VyPTU3NjgxOCZwcm9qZWN0PTU2ODc2JnBhZ2U9MTY5MzgzJnQ9MTQ1MzgzNTMxNjcxOQ=="
        ]
    }
]
jbuck commented 8 years ago

There was a deploy of webmaker-desktop 11 days ago, which could have caused this to stop working: https://github.com/mozilla/webmaker-browser/compare/2459b9f...76737ba

jbuck commented 8 years ago

26-Jan-2016 19:20 GMT Job ID: 2xbFhmhpj9wqmKmHklKU8oA

Duration -> 9.4207 seconds

Job Info -> {"v":1.2,"application_id":"...","src":"https://beta.webmaker.org/#/thumbnail?user=576822&project=56879&page=169389&t=1453836033047","src_type":"screen_shot_url","src_data":{"viewport":"320x440","delay":10000},"functions":[{"name":"crop","params":{"x":0,"y":121,"width":320,"height":198},"functions":[{"name":"resize_to_fit","params":{"width":320},"save":{"quality":90,"image_identifier":"screenshot:mobile-center-cropped/small/webmaker-desktop/aHR0cHM6Ly9iZXRhLndlYm1ha2VyLm9yZy8jL3RodW1ibmFpbD91c2VyPTU3NjgyMiZwcm9qZWN0PTU2ODc5JnBhZ2U9MTY5Mzg5JnQ9MTQ1MzgzNjAzMzA0Nw==","s3_destination":{"bucket":"webmaker-screenshot-production-s3bucket-4rdpjgx625kg","key":"mobile-center-cropped/small/webmaker-desktop/aHR0cHM6Ly9iZXRhLndlYm1ha2VyLm9yZy8jL3RodW1ibmFpbD91c2VyPTU3NjgyMiZwcm9qZWN0PTU2ODc5JnBhZ2U9MTY5Mzg5JnQ9MTQ1MzgzNjAzMzA0Nw=="},"save_profiles":true,"blitline_id":"1gMegxOEpkT9X9wdXLnOSMQ"}}]}],"user_id":"6CLtvsTUuqHR3KnezcdkzmA","type":"temp_urls","version":2,"q_ts":"1453836033.2616954"}

Original Photo Metadata -> null

Image results -> [
    {
        "error":"Image processing failed. Screenshot attempt for 'https://beta.webmaker.org/#/thumbnail?user=576822&project=56879&page=169389&t=1453836033047' failed, please check url to make sure it's available. This incident has been logged and will be looked into...",
        "failed_image_identifiers":[
            "screenshot:mobile-center-cropped/small/webmaker-desktop/aHR0cHM6Ly9iZXRhLndlYm1ha2VyLm9yZy8jL3RodW1ibmFpbD91c2VyPTU3NjgyMiZwcm9qZWN0PTU2ODc5JnBhZ2U9MTY5Mzg5JnQ9MTQ1MzgzNjAzMzA0Nw=="
        ]
    }
]
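
For anyone reading the job output: the last path segment of the `image_identifier` (and of the S3 `key`) appears to be the Base64 encoding of the `src` URL. A minimal sketch for decoding it when debugging a failed job follows. The helper name `decode_image_identifier` is made up for illustration, and the sketch assumes the standard Base64 alphabet with no `/` inside the encoded segment, which holds for the identifiers in this thread but is not confirmed for the service in general.

```python
import base64


def decode_image_identifier(identifier: str) -> str:
    """Return the URL encoded in the last path segment of an identifier.

    Hypothetical helper: assumes the segment after the final "/" is
    standard Base64 of the screenshot source URL.
    """
    encoded = identifier.rsplit("/", 1)[-1]
    return base64.b64decode(encoded).decode("utf-8")


# Identifier copied from the failed job above.
identifier = (
    "screenshot:mobile-center-cropped/small/webmaker-desktop/"
    "aHR0cHM6Ly9iZXRhLndlYm1ha2VyLm9yZy8jL3RodW1ibmFpbD91c2VyPTU3NjgyMiZw"
    "cm9qZWN0PTU2ODc5JnBhZ2U9MTY5Mzg5JnQ9MTQ1MzgzNjAzMzA0Nw=="
)

print(decode_image_identifier(identifier))
```

The decoded value can then be compared against the `src` field of the job to confirm which thumbnail page a failed identifier refers to.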

jbuck commented 8 years ago

I sent another email to Blitline support

cadecairos commented 8 years ago

Why might the screenshot attempt fail? They're using phantomjs, right?

xmatthewx commented 8 years ago

Thanks @jbuck for digging into this. The page seems to render fine. Looking at performance, there's a bit of a delay after page load before content is loaded, but it doesn't seem significant.

Do we pay for this service?

jbuck commented 8 years ago

Yeah, we pay for it. They're looking into it now

xmatthewx commented 8 years ago

They should credit us 2 weeks and explain the delayed response. Let me know if you want me to write a firm note – it's a favorite hobby of mine. :smiling_imp:

cadecairos commented 8 years ago

Considering the volume of failed requests in the last couple of weeks, I'm surprised they hadn't noticed this themselves.

jbuck commented 8 years ago

Fix should be pushed in a few hours:

Jon,

Good news... we can make it start working (regardless of the issues) right away. (We should be able to deploy a fix within a few hours).

I think this is an issue from our side:

We moved from using phantomjs 2.0 back to 1.9 due to severe segfault/stability issues caused by a Linux security update (I think it had to do with libpng). In turn, it seems that probably started causing your failures because 1.9 seems to be complaining about “Network not available” when we try to connect to your URL. (With PhantomJS, this is not necessarily indicative of a network issue; it may very well just be some internal bug that is causing it to fail.)

Regardless, we still have phantom 2.0 available on our workers, it’s just not used by default. So, we are going to add it as a fallback to 1.9 if 1.9 errors out. This DOES fix the problem, at least for the job in question (2xbFhmhpj9wqmKmHklKU8oA).

So, apologies for the problems, and we will have a fix pushed within 4 hours or so.

jbuck commented 8 years ago

I suspect it's because phantomjs 1.9 doesn't support TLS 1.0: https://github.com/ariya/phantomjs/issues/12165

cadecairos commented 8 years ago

Thanks for digging into this!

jbuck commented 8 years ago

Working again! @xmatthewx Wanna check with a new project?

xmatthewx commented 8 years ago

Science Fox says "yes! confirmed."

[Image: screenshot 2016-01-26 17 38 02]

xmatthewx commented 8 years ago

@cadecairos - you have a script for generating thumbs, right? Can we run it for the past two weeks?

xmatthewx commented 8 years ago

@jbuck - Thanks again. How can we monitor this service for future outages?
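
One possible starting point for monitoring, sketched under assumptions: since the jobs were failing silently, a check could scan each job's "Image results" list for entries carrying an `error` key and alert when any are found. The function name `failed_identifiers` is hypothetical, the result shape is taken only from the output pasted earlier in this thread, and how the results are fetched from Blitline (polling or a callback) is deliberately left out.

```python
from typing import Any


def failed_identifiers(image_results: list[dict[str, Any]]) -> list[str]:
    """Collect identifiers from result entries that report an error.

    Assumes each failed entry looks like the pasted output above:
    an "error" string plus a "failed_image_identifiers" list.
    """
    failed: list[str] = []
    for result in image_results:
        if "error" in result:
            failed.extend(result.get("failed_image_identifiers", []))
    return failed


# Sample built from the first failed job pasted in this thread
# (error text shortened).
sample_results = [
    {
        "error": "Image processing failed.",
        "failed_image_identifiers": [
            "screenshot:mobile-center-cropped/small/webmaker-desktop/"
            "aHR0cHM6Ly9iZXRhLndlYm1ha2VyLm9yZy8jL3RodW1ibmFpbD91c2VyPTU3NjgxOCZw"
            "cm9qZWN0PTU2ODc2JnBhZ2U9MTY5MzgzJnQ9MTQ1MzgzNTMxNjcxOQ=="
        ],
    }
]

failures = failed_identifiers(sample_results)
if failures:
    print(f"{len(failures)} thumbnail job(s) failed")
```

A non-empty return value would be the signal to raise an alert. Checking the S3 bucket for recently written keys would be a complementary approach.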

xmatthewx commented 8 years ago

Filing bugs to regenerate thumbs and monitor the service. Closing this!

https://github.com/MozillaFoundation/mofo-devops/issues/251