Closed electrofelix closed 5 years ago
Anyone able to comment on whether this is considered acceptable? Any other alternatives that would work as a locking mechanism across threads in vagrant better than using a lock declared as a class variable?
@electrofelix, could you post a Vagrantfile with an example config?
Apologies seems I posted the example config on the bug rather than on the PR. Copying here.
Looks like something has changed that now prevents multiple instances of the server from being started without this patch, I'm using vagrant 1.8.7.
However I am seeing entries failing to be added to ~/.vagrant.d/data/landrush/hosts.json and missing entries from vagrant landrush list
in certain cases, so although the server is now starting once only there are other problems in the configuration being written out that need to be looked at as record_machine_dns_entry
calls to Store
appear to also not be thread safe.
Vagrantfile
VAGRANTFILE_API_VERSION = "2"
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
config.vm.box = "debian/jessie64"
config.landrush.enabled = true
config.landrush.tld = 'ncs.net'
config.landrush.host_redirect_dns = false
# use my host resolvconf dnsmasq instance, probably not be required
# for systems that can reach the google dns servers instead
config.landrush.upstream '127.0.0.1'
config.vm.provider :libvirt do |libvirt, override|
override.vm.synced_folder ".", "/vagrant", type: "9p"
libvirt.memory = 4096
libvirt.cpus = 2
libvirt.disk_bus = "scsi"
libvirt.volume_cache = "unsafe"
libvirt.nested = true
libvirt.cpu_mode = "host-passthrough"
libvirt.driver = "kvm"
libvirt.host = "localhost"
libvirt.uri = "qemu:///system"
end
(1..10).each do |n|
config.vm.define "srv-#{n}" do |node|
node.vm.hostname = "srv-#{n}.ncs.net"
end
end
end
vagrant landrush stop
rm -rf ~/.vagrant.d/data/landrush/
vagrant up --parallel --provider libvirt
... <lots of output>
vagrant landrush list
# produced the following output
193.121.168.192.in-addr.arpa srv-3.test.net
srv-2.test.net 192.168.121.124
124.121.168.192.in-addr.arpa srv-2.test.net
srv-9.test.net 192.168.121.188
188.121.168.192.in-addr.arpa srv-9.test.net
srv-6.test.net 192.168.121.186
186.121.168.192.in-addr.arpa srv-6.test.net
srv-7.test.net 192.168.121.240
240.121.168.192.in-addr.arpa srv-7.test.net
srv-5.test.net 192.168.121.46
46.121.168.192.in-addr.arpa srv-5.test.net
Bringing the set up and down without removing the landrush data dir results in more getting populated, but it shows that there is a race in the configuration getting written out.
This does appear to be somewhat dependent on the host as I've had better results with some machines over the others, with the above being the worst case I've seen.
And I also wouldn't be too confident that the multiple attempts to start the daemon won't return unless the reason for it starting to behave itself can be tracked down as it's likely just timing is allowing things to work better now.
I've also seen this fail to remove entries after some destroys due to one of the entries being missing, the corresponding reverse lookup entry is not correctly removed.
Thinking about this further suggests that in order to ensure that should someone also be executing multiple vagrant instances, and edge case to be sure but, since the same fix of using a lockfile around updates for any Store
modifications and attempts to start the server would ensure any parallel bring up would be both multi-thread and multi-process safe.
I looked a bit at this. This would really only help for the vagrant up --parallel
case right. Locking works across threads, but not across processes. Potentially one could imagine to independent parallel startups of Vagrant machines. For that reason, I am wondering whether one should not just synchronize on the config file directly using something like Ruby's File.flock or to get a bit of a higher level abstraction the filelock gem.
That said, I have a hard time reproducing the problem. Even with --parallel
I don't get into a situation where Landrush config/start breaks due to sync issues. I am using:
Vagrant.configure(2) do |config|
config.vm.box = 'ubuntu/xenial64'
config.landrush.enabled = true
config.landrush.tld = 'example.com'
config.landrush.host 'static1.example.com', '1.2.3.4'
config.landrush.host 'static2.example.com', '2.3.4.5'
(1..3).each do |n|
config.vm.define "landrush-#{n}" do |node|
node.vm.hostname = "landrush-#{n}.example.com"
end
end
end
That said, synchronizing access is probably a good idea regardless.
Closing this pull request which is superseded by pull request #328
Use class variables to support configuration of the DNS server once per run for multi-machine environments.
Process level locking is done using a class variable containing a Mutex and a simple synchronization block, followed by testing whether another thread has already performed this configuration or not.
This allows a single attempt to bring up the DNS server, and ensures that for parallel bring up with multi-machine environments that one a single attempt is made to start the server, and therefore the correct pid will be available during destroy to cleanup.
Fixes #304