webcoyote closed this 9 years ago.
Looks like a bug! I will look into it.
Are ACLs enabled? Could you share the configuration?
ACLs are not enabled. Here's the script I use to run Consul:
```bash
GOMAXPROCS=2 "$SCRIPT_DIR/../../bin/consul" agent \
  -bind=127.0.0.1 \
  -bootstrap \
  -server \
  -config-dir "$SCRIPT_DIR/consul.d" \
  -data-dir "$SCRIPT_DIR/data" \
  -ui-dir "$SCRIPT_DIR/ui" \
  #
```
The configuration directory contains one file:
```json
{
  "service": {
    "name": "MatchSrv",
    "tags": ["game"],
    "port": 9001,
    "check": {
      "name": "MatchSrv status",
      "ttl": "15s"
    }
  }
}
```
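With a `ttl` check like this one, Consul marks the check critical unless something reports in within 15 seconds. A minimal heartbeat sketch, assuming the agent's HTTP API on the default `127.0.0.1:8500` and the check ID `service:MatchSrv` (Consul's auto-generated ID for a single check defined inside a service definition). Recent Consul versions require `PUT` on this endpoint; the 5-second interval is an arbitrary choice below the TTL, and the function names are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: keep a service's TTL check passing by reporting to the local agent.
# Assumptions: agent HTTP API at CONSUL_ADDR (default 127.0.0.1:8500) and the
# auto-generated check ID "service:<service name>".
CONSUL_ADDR="${CONSUL_ADDR:-http://127.0.0.1:8500}"

# Build the URL that marks a service's TTL check as passing.
ttl_pass_url() {
  local service="$1"
  echo "$CONSUL_ADDR/v1/agent/check/pass/service:$service"
}

# Report "passing" every few seconds; the interval must stay below the TTL.
heartbeat() {
  local service="$1" interval="${2:-5}"
  while true; do
    curl -s -X PUT "$(ttl_pass_url "$service")" > /dev/null
    sleep "$interval"
  done
}
```

Running `heartbeat MatchSrv &` next to the service keeps the check green. In practice the service itself should send the update, so that a hung process stops reporting and the check goes critical.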
@webcoyote I just pushed 8a1969cc8cf4585438630e051e6fcc24fc7e908f, which should fix this. Do you think you could give it a try with a build from master?
@webcoyote Have you had a chance to try again on the master build?
Thanks for your help building Consul. I'm running master with 8a1969c and haven't encountered any problems so far. Unfortunately, I've also been unable to reproduce the bug with the old version (0.4.1); it was a one-time occurrence.
Incidentally, when I ran the new version it was necessary to delete the data folder to prevent "==> Error starting agent: Failed to start Consul server: Failed to start Raft: MDB_INVALID: File is not an MDB file". Is that expected behavior?
Hmm, that is definitely not expected behavior. You saw this when upgrading from 0.4.1 to the master build? That is super strange. Was the 0.4.1 an official build?
Yes, I was using the 0.4.1 official build for Windows. Note that the MDB_INVALID error is bidirectional:
rm data-directory; run old-consul; run new-consul => error
rm data-directory; run new-consul; run old-consul => error
Awesome, thanks for the heads up. I'll need to investigate a possible regression.
@webcoyote Can you confirm the old and new versions were never running at the same time? After working with the LMDB developer, I think the issue is upstream. Apparently, concurrent processes can break this.
Looks like you used two different LMDB versions and the offset of the version or magic number changed in the meta page.
It has nothing to do with the maxreaders setting.
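One way the magic number's offset can shift is the build's word size. In LMDB 0.9.x the meta data in the first page follows a page header whose first field is a `size_t` page number, so the header is 12 bytes on 32-bit builds and 16 bytes on 64-bit builds, and the magic `0xBEEFC0DE` moves accordingly. A minimal sketch to guess which layout a data file has, assuming a little-endian (x86) file; this is not an official LMDB tool and the function names are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: guess whether an LMDB 0.9.x data file was written by a 32-bit or a
# 64-bit build. Assumes a little-endian file (x86).

# Print the 4 bytes at byte offset $2 of file $1 as hex, in file order.
bytes_at() {
  od -An -v -t x1 -j "$2" -N 4 "$1" | tr -d ' \n'
}

# The meta page's magic 0xBEEFC0DE follows a page header that is 12 bytes
# on 32-bit builds and 16 bytes on 64-bit builds.
lmdb_word_size() {
  local file="$1"
  local magic_le="dec0efbe"  # 0xBEEFC0DE stored little-endian
  if [[ "$(bytes_at "$file" 16)" == "$magic_le" ]]; then
    echo 64
  elif [[ "$(bytes_at "$file" 12)" == "$magic_le" ]]; then
    echo 32
  else
    echo unknown
    return 1
  fi
}
```

For example, `lmdb_word_size path/to/data.mdb` prints `32`, `64`, or `unknown`.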
@hyc We've been pinned to 0.9.11 for quite some time (since May 30th). Not sure why this would start happening just because we touched that setting.
Does this mean upgrading to LMDB 1.0 will also break all existing installs?
> Can you confirm the new versions were never running at once?
I was only running one at a time.
@armon It should not have happened, but a change in the compile environment might have affected it.
LMDB 1.0 will break 0.9 compatibility, yes. The on-disk data layout will change. It has not (intentionally) changed so far.
@hyc Is there a compatibility promise with the 1.0 release? How do you guys handle the upgrade process for OpenLDAP?
All minor versions within a major version will be compatible. I doubt the disk format will change often. We're changing it in 1.0 to support incremental backup. There aren't any other new features envisioned that will require further changes.
Meanwhile, I'd be curious to see why your Windows builds are having this problem, if all your builds are on 0.9.11.
@webcoyote Do you think you could provide @hyc with copies of the raft/ data directory after running it with each version? That may be of use in determining what changed.
Re: handling changes in OpenLDAP: the docs already say to use slapcat/slapadd to migrate between versions. We added mdb_dump/mdb_load in 0.9.14 to handle migrations in LMDB.
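A dump-and-reload migration along those lines could look like the sketch below. It rests on a few assumptions: `mdb_dump -a` writes every named sub-database to a plain-text file (`-f`), and `mdb_load -f` recreates them in a new environment directory. Because the on-disk layout also depends on the build (including its word size), the dump tool should be one that can read the old file and the load tool one built for the target layout. The function name and its arguments are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: migrate an LMDB environment through mdb_dump's portable text format.
# Hypothetical arguments: the dump/load binaries to use, the source and
# destination environment directories, and a scratch dump file.
migrate_lmdb_env() {
  local dump_tool="$1" load_tool="$2" src_env="$3" dst_env="$4" dump_file="$5"
  # Dump all named sub-databases (-a) from the old environment to a file (-f).
  "$dump_tool" -a -f "$dump_file" "$src_env" || return 1
  # The destination environment directory must exist before loading.
  mkdir -p "$dst_env"
  # Recreate the databases in the new environment from the text dump.
  "$load_tool" -f "$dump_file" "$dst_env"
}
```

For example, `migrate_lmdb_env /opt/lmdb32/mdb_dump /opt/lmdb64/mdb_load old-env new-env raft.dump`, where both tool paths are placeholders.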
@hyc I don't see any docs for those two here: http://symas.com/mdb/doc/group__mdb.html. Also, it seems the docs are for 0.9.14, not 1.0. Where is the best place to look?
Never mind, I see it's a CLI tool and not an API.
> Do you think you could provide @hyc with copies of the raft/ data directory after running it with each version? That may be of use in determining what changed.
Yes. Since he doesn't have a public email address, I'll mail it to you, @armon.
@webcoyote It looks like you must have built on a 64-bit platform. Our official Windows builds are 32-bit, so the mismatch between the two appears to have caused the issue (thanks to @hyc for diagnosing).
Looks like we can close this one down!
I noticed a bug that causes high CPU utilization in my application: it continuously polls Consul, which returns a zero value for X-Consul-Index.
I'm running the latest consul (64-bit on Windows)
BUG: I query a folder and get X-Consul-Index=0
I query the individual keys and get valid results
I deleted a key in a totally unrelated folder and that fixed the problem:
Is this expected behavior, or is my application "doing it wrong"? My goal is to long-poll a list of keys (I don't need the values) to discover which services are disabled, so I believed it would be reasonable to perform a `?keys` query.
Note that I've simplified the above results by removing the Date, Content-Type and Content-Length headers, but they were all legitimate values.
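For reference, a blocking (long-poll) query passes the last seen `X-Consul-Index` back as `?index=`, and an index of 0 means "don't block", so a response carrying `X-Consul-Index: 0` turns the polling loop into a busy loop. A minimal client-side guard is to treat a missing or zero index as 1 before the next request. The sketch assumes the default agent address and a hypothetical `services/disabled/` prefix; the function names are also hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: long-poll a key prefix with Consul blocking queries, guarding
# against X-Consul-Index: 0. CONSUL_ADDR and PREFIX are assumptions.
CONSUL_ADDR="${CONSUL_ADDR:-http://127.0.0.1:8500}"
PREFIX="${PREFIX:-services/disabled/}"

# Read raw HTTP response headers on stdin and print the X-Consul-Index value.
parse_index() {
  tr -d '\r' | awk -F': *' 'tolower($1) == "x-consul-index" { print $2; exit }'
}

# Choose the index for the next request: a missing, non-numeric or zero
# index would make the next query return immediately, so use 1 instead.
next_index() {
  local idx="$1"
  if [[ "$idx" =~ ^[0-9]+$ ]] && (( 10#$idx >= 1 )); then
    echo "$idx"
  else
    echo 1
  fi
}

# Loop forever, printing the key list each time the blocking query returns.
poll_keys() {
  local index=0 headers
  headers=$(mktemp)
  while true; do
    : > "$headers"  # start each request with an empty header file
    if curl -s -D "$headers" "$CONSUL_ADDR/v1/kv/$PREFIX?keys&index=$index&wait=5m"; then
      echo
      index=$(next_index "$(parse_index < "$headers")")
    else
      sleep 1  # back off on connection errors instead of spinning
    fi
  done
}
```

Calling `poll_keys` starts the loop. With the clamp, a zero index from the server should make the next request block until the wait expires or the data changes rather than return instantly; this works around the symptom on the client, not the underlying server-side bug.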