504 on upload - Githubissues

rnewman commented 9 years ago

Just saw this on device, so I don't have a req/resp snapshot for you.

03-05 11:41:07.701 W/FxReadingList(18267): fennec_rnewman :: ReadingListClient :: Upload got failure response 504
03-05 11:41:07.703 W/FxReadingList(18267): fennec_rnewman :: ReadingListSynchronizer :: Upload failed.

504, no useful body. Should be a bunch in your logs. The device is still on, so will keep reproing for a couple of hours!

almet commented 9 years ago

Hi, this is probably due to nginx not allowing enough time for the client to upload its records.

I believe this was due to the content being too large. Do you know exactly which request is causing this?

rnewman commented 9 years ago

503 for a single record, and also for downloading it.

03-05 15:58:38.176 I/FxReadingList(24758): fennec_rnewman :: ReadingListClient :: Uploading new record: {"is_article":false,"added_on":1425584258166,"resolved_title":"A study of twins shows that autism is largely genetic","added_by":"Fennec rnewman on Nexus 7","read_position":0,"url":"http:\/\/loonylabs.org\/2015\/03\/04\/autism-spectrum-disorder-twins-genetics\/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed:+ResearchBloggingNeuroscienceEnglish+(Research+Blogging+-+English+-+Neuroscience)","title":"A study of twins shows that autism is largely genetic","word_count":0,"excerpt":"In the fight against misinformation about autism it seems science is starting to come out on top, finally. A new study hopes to add to the recent advancements made in the understanding of autism, which finds that a substantial genetic and moderate environmental influences were associated with risk of autism spectrum disorder (ASD) and broader autism\u2026","favorite":false,"resolved_url":"http:\/\/loonylabs.org\/2015\/03\/04\/autism-spectrum-disorder-twins-genetics\/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed:+ResearchBloggingNeuroscienceEnglish+(Research+Blogging+-+English+-+Neuroscience)","stored_on":null,"_id":5,"archived":false,"unread":true}
03-05 15:58:38.177 D/FxReadingList(24758): fennec_rnewman :: BaseResource :: HTTP POST https://readinglist.dev.mozaws.net/v1/articles
03-05 15:58:38.177 D/FxReadingList(24758): fennec_rnewman :: BaseResource :: Added auth header.
03-05 15:58:38.182 D/FxReadingList(24758): fennec_rnewman :: BaseResource :: I/O exception returned from execute.
03-05 15:58:38.182 D/FxReadingList(24758): fennec_rnewman :: BaseResource :: Retrying request...
03-05 15:58:38.674 D/FxReadingList(24758): fennec_rnewman :: BaseResource :: Response: HTTP/1.1 503 Service Unavailable: Back-end server is at capacity
03-05 15:58:38.674 W/FxReadingList(24758): fennec_rnewman :: ReadingListClient :: Upload got failure response HTTP/1.1 503 Service Unavailable: Back-end server is at capacity
03-05 15:58:38.677 D/FxReadingList(24758): fennec_rnewman :: ReadingListClient :: No response body.
03-05 15:58:38.677 D/FxReadingList(24758): fennec_rnewman :: ReadingListSynchronizer :: New items uploaded. Flushing resultant changes.
03-05 15:58:38.679 D/FxReadingList(24758): fennec_rnewman :: ReadingListSyncAdapter :: Step: onNewItemUploadComplete
03-05 15:58:38.680 I/FxReadingList(24758): fennec_rnewman :: ReadingListClient :: Getting all records from https://readinglist.dev.mozaws.net/v1/articles?_since=1425582942146
03-05 15:58:38.680 D/FxReadingList(24758): fennec_rnewman :: BaseResource :: HTTP GET https://readinglist.dev.mozaws.net/v1/articles?_since=1425582942146
03-05 15:58:38.680 D/FxReadingList(24758): fennec_rnewman :: BaseResource :: Added auth header.
03-05 15:58:38.728 D/FxReadingList(24758): fennec_rnewman :: BaseResource :: Response: HTTP/1.1 503 Service Unavailable: Back-end server is at capacity
03-05 15:58:38.728 D/FxReadingList(24758): fennec_rnewman :: ReadingListClient :: Got non-success record response 503
03-05 15:58:38.729 D/FxReadingList(24758): fennec_rnewman :: ReadingListClient :: No response body.
03-05 15:58:38.729 D/FxReadingList(24758): fennec_rnewman :: ReadingListClient :: No response body.
03-05 15:58:38.731 E/FxReadingList(24758): fennec_rnewman :: ReadingListSynchronizer :: Download failed. since = 1425582942146. Response: 503

almet commented 9 years ago

504 or 503? Are these two different issues?

rnewman commented 9 years ago

Two. I was getting a 504 (with a timeout) for an hour or so, then by the time I'd added more logging after going to the gym, it had turned into an instant 503.

ckarlof commented 9 years ago

It seems to be alive again.

@ametaireau, can you add @jrgm's public key to that server so we can provide "extended support hours" for that server?

https://github.com/mozilla/identity-pubkeys/blob/master/jrgm.pub

almet commented 9 years ago

Yeah, I've reset nginx on there. I'm not sure what was actually happening; I'll look at the logs tomorrow.

almet commented 9 years ago

@ckarlof thanks, I've added @jrgm's public key there.

You know where to connect?

ckarlof commented 9 years ago

@ametaireau is it ec2-54-149-21-166.us-west-2.compute.amazonaws.com?

ckarlof commented 9 years ago

@ametaireau what user name does he need to log in with?

almet commented 9 years ago

In my .ssh/config:

Host loop-dev
    HostName ec2-54-68-145-165.us-west-2.compute.amazonaws.com
    User ec2-user

ckarlof commented 9 years ago

Thanks!

almet commented 9 years ago

Github chat ftw.

almet commented 9 years ago

I'm leaving this issue open until we find out what's going on here. I believe the 504 caused the 503 because it somehow killed our app, but I'm not sure. Will check tomorrow.

almet commented 9 years ago

Actually, that ssh config isn't the right one: it's for loop-dev, not readinglist-dev.

The right one:

Host readinglist-dev
    User ubuntu
    HostName 54.149.21.166

rnewman commented 9 years ago

Now I'm back to a 504.

03-05 16:23:53.251 I/FxReadingList(26251): fennec_rnewman :: ReadingListSyncAdapter :: Reading list sync done.
03-05 16:23:53.399 D/FxReadingList(26251): fennec_rnewman :: BaseResource :: Response: HTTP/1.1 504 Gateway Timeout
03-05 16:23:53.399 W/FxReadingList(26251): fennec_rnewman :: ReadingListClient :: Upload got failure response HTTP/1.1 504 Gateway Timeout
03-05 16:23:53.455 D/FxReadingList(26251): fennec_rnewman :: ReadingListClient :: No response body.

rnewman commented 9 years ago

Looks like a 60-second timeout.

Natim commented 9 years ago

What happened is that we only had one circus worker at the time, so if your request is too expensive it ties up the worker until it finishes. If the heartbeat hits the server in the meantime, the ELB marks the server as non-responding and takes it out of rotation. It means we really need to start scaling the dev server.
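To make that failure mode concrete, here is a minimal sketch in Python. It is a toy model, not the actual readinglist, circus, or ELB code. The names `Request` and `failed_health_checks` are hypothetical, and the 10-second check interval and 5-second check timeout are assumptions. The 60-second request duration just echoes the timeout reported earlier in this thread.

```python
from dataclasses import dataclass


@dataclass
class Request:
    arrival: float            # seconds since start
    duration: float           # seconds of worker time the request needs
    is_health_check: bool = False


def failed_health_checks(requests, workers=1, check_timeout=5.0):
    """Return arrival times of health checks not answered within check_timeout.

    Each worker handles one request at a time; a request waits until
    the earliest-free worker is idle. (Toy model, assumed parameters.)
    """
    free_at = [0.0] * workers  # time at which each worker is next idle
    failed = []
    for req in sorted(requests, key=lambda r: r.arrival):
        # Hand the request to whichever worker frees up first.
        i = min(range(workers), key=lambda k: free_at[k])
        start = max(req.arrival, free_at[i])
        finish = start + req.duration
        free_at[i] = finish
        if req.is_health_check and finish - req.arrival > check_timeout:
            failed.append(req.arrival)
    return failed


# One slow request plus three health checks arriving while it is running.
reqs = [
    Request(0.0, 60.0),
    Request(10.0, 0.01, is_health_check=True),
    Request(20.0, 0.01, is_health_check=True),
    Request(30.0, 0.01, is_health_check=True),
]
print(failed_health_checks(reqs, workers=1))  # [10.0, 20.0, 30.0]
print(failed_health_checks(reqs, workers=2))  # []
```

With a single worker every health check queues behind the slow request and misses its deadline, which is the point at which a load balancer would stop routing to the box. With a second worker the checks are answered immediately.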

Natim commented 9 years ago

@rnewman could you tell us how many batch requests you are sending?

rnewman commented 9 years ago

No batch requests. One HTTP request at a time, uploading a single record via POST. See my second comment.

rnewman commented 9 years ago

Note that the Android client doesn't use batching at all. Stefan's does.

Natim commented 9 years ago

OK, I have tracked down the problem. The readinglist dev box is really, really small (1 processor, with the server, nginx and the database all running on it). I will take some time to deploy a bigger instance.

Natim commented 9 years ago

We have redeployed everything and have been working to increase the performance of each node. You shouldn't have any more problems on the production server.

Closing for now. It will get even better with the 1.4.x release.

mozilla-services / readinglist

504 on upload #155