CRCulver opened this issue 7 years ago

I am running syncserver on my server and I am experiencing an error in 1.6.0 (which persists after cloning the latest git version) in which POST operations during syncing fail with a 400 "Bad Request" error, though GET requests succeed. My logs look like this:
Hrm, this:
POST /storage/1.5/1/storage/bookmarks?batch=true
Looks like it's trying to use the new "batch upload" API, but that's not supposed to be enabled in the self-hosted version yet. It should work; I'm just surprised to see it on. Can you try adding a [storage] section to your .ini file, and including batch_upload_enabled = False in it?
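That is, the .ini file would contain:

```ini
[storage]
batch_upload_enabled = False
```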
You may be able to get useful debugging info from your Firefox sync logs by going to about:sync-log. They often contain more details about e.g. the contents of a specific failing request.
How can better debugging output be obtained from syncserver?
Unfortunately this can be very dependent on how you're running the server. The application has some calls to logging.info, logging.error, and so on, but getting them to output to stderr, and then having stderr go somewhere useful, depends a lot on the environment. How are you running the server?
Looks like it's trying to use the new "batch upload" API ... Can you try adding a [storage] section to your .ini file, and including batch_upload_enabled = False in it?
I added that section to my .ini file, but there is no change in behaviour: the POST request is still made by the client with a batch=true argument and it fails with a 400 Bad Request:
WSGI through Apache.
I don't have a lot of experience with this setup, but IIUC the way to enable more detailed logging is as follows.
First, I assume you're using the provided syncserver.wsgi file, which contains a hook to configure Python's logging module with sections read from the syncserver.ini file:
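The hook is roughly of this shape (a simplified sketch using the standard logging.config.fileConfig API; the actual contents of syncserver.wsgi may differ):

```python
# Point Python's logging machinery at the [loggers]/[handlers]/
# [formatters] sections of the .ini file, if they exist.
import os
from logging.config import fileConfig

ini_file = os.environ.get("SYNCSERVER_INI_FILE", "/path/to/syncserver.ini")
try:
    fileConfig(ini_file)
except Exception:
    # No logging sections in the .ini file; keep the default config.
    pass
```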
So you can add options to the .ini file to enable more verbose logging, in the format documented in [1].
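For illustration, a minimal sketch of such a config (the logger, handler, and formatter names below are placeholders in the documented fileConfig format, not taken from a real deployment):

```ini
[loggers]
keys = root

[handlers]
keys = console

[formatters]
keys = generic

[logger_root]
level = DEBUG
handlers = console

[handler_console]
class = StreamHandler
args = (sys.stderr,)
level = DEBUG
formatter = generic

[formatter_generic]
format = %(asctime)s %(levelname)s [%(name)s] %(message)s
```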
If you can configure it to log at DEBUG level and to write to stdout or stderr, then the log output should show up in whatever file you've configured for logging in your mod_wsgi config. Hopefully there will be some clues in there.
[1] https://docs.python.org/2/library/logging.config.html#logging-config-fileformat
Thank you for the detailed response. I enabled logging at the DEBUG level as you indicated. Unfortunately, it doesn't provide me with any information besides some SQL queries, and it doesn't give any clues as to why the POST request should be answered with a 400 Bad Request:
[Mon Sep 11 04:51:19.912654 2017] [wsgi:error] [pid 30669] [remote
Unfortunately I can't think of a nice way to get more information out of the system here. I would probably fall back to the old "add a bunch of print statements" approach at this point.
If you're interested in going that route, the likely sources for this failure are the extraction and validation of batch-related parameters in this file:
https://github.com/mozilla-services/server-syncstorage/blob/master/syncstorage/views/validators.py
And any errors thrown in the corresponding route handler.
I'm sorry to have to suggest that, but I'm out of other ideas :-(
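For what it's worth, the kind of instrumentation meant here might look like this (a sketch only: parse_multiple_bsos is named per validators.py, but its body is simplified, and plain json.loads stands in for the module's own json_loads helper):

```python
import json
import logging

logger = logging.getLogger(__name__)

def parse_multiple_bsos(request):
    try:
        bsos = json.loads(request.body)
    except ValueError:
        # Temporary debugging: log how much of the body actually arrived
        # before re-raising the error that leads to the 400 response.
        logger.error("invalid JSON body: declared=%s actual=%d",
                     request.content_length, len(request.body))
        raise
    return bsos
```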
I'm having the same problem with history syncing - the log shows the batch=true param added to the request. Just out of interest: is the server somehow supposed to advertise that it supports batching? Because otherwise, how would Firefox know that it is not supposed to use it? Unfortunately the browser console does not show response content for requests, so checking the actual data would take some doing.
is the server somehow supposed to advertise that it supports batching?
Yes, there's a /info/configuration endpoint that clients can fetch to determine various details of the server setup, including whether batch uploads are enabled.
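For example, one could check it along these lines (a sketch; the base URL is a placeholder, and since real Sync requests are Hawk-authenticated, an unauthenticated request may get a 401 depending on the setup):

```python
import requests

# Hypothetical user storage URL for a self-hosted server.
base = "https://sync.example.com/storage/1.5/1"

resp = requests.get(base + "/info/configuration")
print(resp.status_code, resp.json())
```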
Ok, the /info/configuration response was: {"max_request_bytes": 1048576, "max_record_payload_bytes": 262144}. Could it be that Firefox switches to batch sending if a request would be bigger than those limits? Is there any way to change them in the configuration?
From the Firefox Sync storage API documentation: batch: indicates that uploads should be batched together into a single conceptual update. To begin a new batch pass the string ‘true’. To add more items to an existing batch pass a previously-obtained batch identifier. This parameter is ignored by servers that do not support batching.
But in syncserver's validators.py:

```python
# we want to:
#   * silently ignore attempts to start a new batch, which
#     will cause clients to fall back to non-batch mode.
#   * error out on attempts to continue an existing batch,
#     since we can't possibly do what the client expects.
```
It seems to me that there is a conflict between those two things, unless I'm completely misunderstanding how it is supposed to work.
On reflection, I think that the appearance of batch=true in the requests here is OK. It's Firefox speculatively sending "batch=true" when it uploads; it will then determine whether or not batching is available based on the server's response. If the server response indicates that it ignored batch=true, then the client will just do a non-batched upload.
So I think the batch=true in the requests is probably a red herring here after all.
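Schematically, the two possible outcomes look like this (an illustrative exchange based on the API docs quoted below, not a captured log):

```
POST /storage/1.5/1/storage/bookmarks?batch=true

# batching unsupported or disabled: the parameter is ignored
200 OK        {"success": [...], "failed": {}}    <- no "batch" id returned

# batching supported: a batch is started
202 Accepted  {"batch": "...", "success": [...], "failed": {}}
```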
Having the same issue, and it now blocks history and bookmarks sync for me. :(
I've found this hint: "If the server does not support batching, it will ignore the batch parameter and return a “200 OK” response without a batch identifier." Source: https://moz-services-docs.readthedocs.io/en/latest/storage/apis-1.5.html
So, even with batch=true, the server should not fail with a 400 Bad Request.
Found it. Sorta.
Directly below the 400 error, there's an "Uploading records failed: 6". That's from syncstorage/tweens.py, pick_weave_error_code(). Originally, it's an "Invalid JSON in request body" error from syncstorage/views/validators.py, parse_multiple_bsos(), because json_loads(request.body) throws a ValueError. I have no idea what the contents of the request body are at that time (because I have no idea what I'm doing here anyway).
However, that led me to the idea that somehow larger POSTs get crippled. I then reduced the maximum POST size (which is 2 MB by default) and found that the magic limit for me is somewhere between 6000 and 10000 bytes. The actual length is in the log file, as "POST Length".
With these settings added to syncserver.ini, syncing was working again:
```ini
[storage]
max_request_bytes = 6000
max_post_bytes = 6000
```
The only thing I am sure of is that it's not the Apache config option LimitRequestBody; that one causes a 413 error.
Those limits fixed it for me too.
However, that led me to the idea that somehow larger POSTs get crippled. I then reduced the maximum POST size (which is 2 MB by default) and found that the magic limit for me is somewhere between 6000 and 10000 bytes. The actual length is in the log file, as "POST Length".
Aha, interesting, thanks for digging into this @urigg! I wonder if something in the stack is truncating incoming request bodies when they're too long, which produces invalid JSON, which triggers the 400 error.
If I had to guess, I'd guess that the magic limit here is 8kB. What do you think about us configuring default values of max_request_bytes and max_post_bytes in this repo, so that they fall just below that amount?
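Something like this, say (a sketch; 8190 is just an arbitrary value under 8192 bytes, not a tested default):

```ini
[storage]
max_request_bytes = 8190
max_post_bytes = 8190
```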
Different software stack, but very similar-sounding issue reported here:
https://stackoverflow.com/questions/48265984/request-body-truncated-at-8k
Also, are you using mod_default or similar in your apache config? I see a couple of old reports that this can produce truncated request bodies in various combinations; we might be facing something similar here.
I'm not sure how many installations actually have this issue, so no idea if it's worth changing the defaults. Plus, I don't know whether it's max_request_bytes or max_post_bytes that is helping here. As for mod_default, I don't even have a clue what that is, and couldn't find anything about it. I'm using Apache on Debian, with mod_wsgi.
Is there a way to include scenarios like this in automated tests? Right now I probably cannot even reproduce it any more, now that the blocking chunks were finally uploaded.
As for the truncated body: I tried to dump out what's in the body, but always got a 500 error, so maybe there wasn't even anything in there. One time I got an exception from deep inside the request that the body was X bytes too short, with X being the same as the size in the log files.
As for mod_default, I don't even have a clue what that is, and couldn't find anything about it
Sorry, that was a weird typo on my part - I meant mod_deflate
I have mod_deflate enabled on that server.
No deflate for me.
About the request body: I'm not sure, but one call stack left me with the impression that this is some kind of lazy placeholder...? Could it be that the code may run before a huge request is completely received? Maybe there's a bug triggered because the request body is still incomplete. (Just speculating.)
I think I'm having the same issue, except that setting those limits seems to have no effect. The log says POST requests are still rather large. error-sync-1533569154562.txt I have the same setup as @CRCulver, a self-hosted server behind Apache with WSGI. Interestingly, on the same server, Firefox Mobile syncs just fine (it only has 3 default bookmarks as of now). Any suggestions?
Today I tried again and it worked, but I don't know which one of these did the trick:
In the end, the workaround works, but the root cause remains unknown.
Similar problem here. I was able to mitigate it partially using

```ini
max_request_bytes = 8192
max_post_bytes = 8192
```

as proposed before.
But this doesn't resolve all errors during syncing. Now I'm running into the following client messages:
```
1549472918635 Sync.Engine.Prefs WARN Failed to enqueue record "e2VjODAzMGY3LWMyMGEtNDY0Zi05YjBlLTEzYTNhOWU5NzM4NH0=" (aborting): Error: Single record too large to submit to server (resource://services-sync/record.js:993:40) JS Stack trace: enqueue@record.js:993:40 _uploadOutgoing@engines.js:1759:43
1549472918636 Sync.Status DEBUG Status for engine prefs: error.engine.reason.unknown_fail
1549472918636 Sync.Status DEBUG Status.service: success.status_ok => error.sync.failed_partial
1549472918636 Sync.ErrorHandler DEBUG prefs failed: Error: Single record too large to submit to server (resource://services-sync/record.js:993:40) JS Stack trace: enqueue@record.js:993:40 _uploadOutgoing@engines.js:1759:43
```
Any suggestions on an improved workaround for the original issue?
1549472918636 Sync.Status DEBUG Status for engine prefs: error.engine.reason.unknown_fail
I'm surprised to see a "record too large" error for the prefs engine, but perhaps you've got an addon or something that's storing a large amount of data in synced prefs.
Unfortunately I think the bug here is upstream of syncserver, in whatever software you're using to proxy incoming requests. You could try connecting directly to the python server to confirm whether that fixes the issue, but I don't have any suggestions for how to tweak the rest of your server stack to avoid it :-(