Closed codylane closed 4 years ago
@codylane - Picking up conversation here - additional info:
Bummer. Ok, let's keep digging into this as it may be a ruby thing.
I know how the tempfile support is supposed to work in C but I don't really understand the ruby version all that well. Let me dig a little in the ruby code base and see what I can grok/understand.
I'm not much of a windows guy but I'll try and find a windows 7 box and see if I can reproduce this issue. I'm starting to think this may affect every windows user.
Thanks for sharing the link to the ruby docs as well. This line is of interest to me
Explicit close
When a Tempfile object is garbage collected, or when the Ruby interpreter exits, its associated temporary file is automatically deleted. This means that's it's unnecessary to explicitly delete a Tempfile after use, though it's good practice to do so: not explicitly deleting unused Tempfiles can potentially leave behind large amounts of tempfiles on the filesystem until they're garbage collected. The existence of these temp files can make it harder to determine a new Tempfile filename.
Therefore, one should always call unlink or close in an ensure block, like this:
file = Tempfile.new('foo')
begin
# ...do something with file...
ensure
file.close
file.unlink # deletes the temp file
end
On POSIX systems, it's possible to unlink a file right after creating it, and before closing it. This removes the filesystem entry without closing the file handle, so it ensures that only the processes that already had the file handle open can access the file's contents. It's strongly recommended that you do this if you do not want any other processes to be able to read from or write to the Tempfile, and you do not need to know the Tempfile's filename either.
For example, a practical use case for unlink-after-creation would be this: you need a large byte buffer that's too large to comfortably fit in RAM, e.g. when you're writing a web server and you want to buffer the client's file upload data.
Please refer to unlink for more information and a code example.
On POSIX systems it's possible to unlink a file before closing it. This practice is explained in detail in the Tempfile overview (section “Unlink after creation”); please refer there for more information.
However, unlink-before-close may not be supported on non-POSIX operating systems. Microsoft Windows is the most notable case: unlinking a non-closed file will result in an error, which this method will silently ignore. If you want to practice unlink-before-close whenever possible, then you should write code like this:
file = Tempfile.new('foo')
file.unlink # On Windows this silently fails.
begin
# ... do something with file ...
ensure
file.close! # Closes the file handle. If the file wasn't unlinked
# because #unlink failed, then this method will attempt
# to do so again.
end
require 'tempfile'
def tempfile(content)
Tempfile.new("vagrant").tap do |temp|
temp.binmode
temp.write(content)
temp.close
end
end
I suspect the close is what's causing the problems for windows users. I have a feeling this may not be an atomic operation on windows. I'd love to setup a simulation that runs a script 1000 times and reports the number of times of pass and fail of each run. Let me see if I can whip something up.
I've not had much luck setting up Windows and I've lost a few hours to trying. So I'm wondering if you might be able to run this on you workstation for me and paste the results?
#!/usr/bin/env ruby
require 'date'
require 'json'
require 'tempfile'
$DEFAULT_RUNS = 1000
def tempfile(content)
Tempfile.new("vagrant").tap do |temp|
temp.binmode
temp.write(content)
temp.close
end
end
def now
DateTime.now.to_time.to_f
end
def simulate(runs=nil)
runs = runs || ARGV[0] || $DEFAULT_RUNS
runs = runs.to_i
metrics = {
:pass => 0,
:fail => 0,
:temp_files => [],
:iterations => runs,
:started_at => now,
:ended_at => now,
:run_duration => 0.0,
}
(1..runs).each do |index|
begin
tmp = tempfile("foo")
metrics[:pass] += 1
metrics[:temp_files] << {:filename => tmp.path, :exists? => File.exists?(tmp.path)}
rescue
metrics[:fail] += 1
end
end
metrics[:ended_at] = now,
metrics[:run_duration] = metrics[:ended_at] - metrics[:started_at]
metrics
end
puts JSON.pretty_generate(simulate())
You'll need to find the path to the vagrant ruby executable to run the code above and then use something like this to execute the test.
Since I'm a linux guy here's how I would do it on linux.
/opt/vagrant/embedded/bin/ruby test_tempfile.rb 10
This would run the simulation tests 10 times and report the temp file names and whether or not the temp file actually exists on disk when the function returns. From the docs I understand that the file will be deleted when the code goes out of scope, i.e. when either the garbage collected or when the program is terminated.
The output from the tests is a nice little JSON blob so that you could test for the file was actually deleted when the script terminates. I'll leave that for a task for you if you'd like to explore that. ;) I'll be happy to help if you'd like.
Here's what it looks like running from my MacOSX
{
"pass": 10,
"fail": 0,
"temp_files": [
{
"filename": "/var/folders/36/_92163xs6tq2dmhr5y6613rm0000gn/T/vagrant20200605-54071-1xk0tm7",
"exists?": true
},
{
"filename": "/var/folders/36/_92163xs6tq2dmhr5y6613rm0000gn/T/vagrant20200605-54071-1e0nbdx",
"exists?": true
},
{
"filename": "/var/folders/36/_92163xs6tq2dmhr5y6613rm0000gn/T/vagrant20200605-54071-knef9a",
"exists?": true
},
{
"filename": "/var/folders/36/_92163xs6tq2dmhr5y6613rm0000gn/T/vagrant20200605-54071-1fp5iy8",
"exists?": true
},
{
"filename": "/var/folders/36/_92163xs6tq2dmhr5y6613rm0000gn/T/vagrant20200605-54071-pvm5ad",
"exists?": true
},
{
"filename": "/var/folders/36/_92163xs6tq2dmhr5y6613rm0000gn/T/vagrant20200605-54071-1nga0om",
"exists?": true
},
{
"filename": "/var/folders/36/_92163xs6tq2dmhr5y6613rm0000gn/T/vagrant20200605-54071-1sfm2ah",
"exists?": true
},
{
"filename": "/var/folders/36/_92163xs6tq2dmhr5y6613rm0000gn/T/vagrant20200605-54071-1dfh1y2",
"exists?": true
},
{
"filename": "/var/folders/36/_92163xs6tq2dmhr5y6613rm0000gn/T/vagrant20200605-54071-12670yj",
"exists?": true
},
{
"filename": "/var/folders/36/_92163xs6tq2dmhr5y6613rm0000gn/T/vagrant20200605-54071-yrbwqc",
"exists?": true
}
],
"iterations": 10,
"started_at": 1591381460.020482,
"ended_at": [
1591381460.024221,
0.0017809867858886719
],
"run_duration": 0.0017809867858886719
}
@codylane - Tried out this script with both versions of embedded ruby and can't get it to fail, regardless of how many iterations I specify :frowning:
I'm able to reproduce the error pretty easily with vagrant up
or vagrant provision
, so at least it's consistently happening! I'll poke around more as I find time - if you have anything else you want me to test, let me know.
For reference, here are the ruby versions Vagrant is using:
# Vagrant 2.2.7
C:\HashiCorp\Vagrant\embedded\mingw64\bin\ruby --version
ruby 2.4.9p362 (2019-10-02 revision 67824) [x64-mingw32]
# Vagrant 2.2.8
C:\HashiCorp\Vagrant\embedded\mingw64\bin\ruby --version
ruby 2.6.6p146 (2020-03-31 revision 67876) [x64-mingw32]
I added a count of exists?()
returning true or false, and it's always matching with the script, for example:
{
...
"iterations": 50000,
"started_at": 1591623400.752482,
"ended_at": 1591623619.456982,
"run_duration": 218.70449995994568,
"pass": 50000,
"exists": 50000,
"fail": 0
}
Bummer, I was really hoping we see some failures. Thank you so much for testing this and providing the results. :)
Ok, I've got another thing we can test for and I wanted to first try using the same code that vagrant-proxyconf uses which was the test result you pasted above. That seems to be working nicely if you the running process is in the same scope.
For the next test, we will actually simulate what vagrant-proxyconf
does I'll create two separate *.rb
files one that contains the tempfile
code and one that consumes the code in a separate file which will also be our test runner. Hopefully we will see some failures from these new tests. Let me see what I can whip up.
Actually, let's scratch the testing and go straight into full debugging. I have a feeling this maybe one that is hard to track down through testing.
That said, are you familiar with debugging in ruby? If not, maybe this can help? You would have to install the pry
gem into the vagrant ruby for this to work. gem.exe install pry
and gem install rb-readline
would do the trick as long as you can find the path to gem.exe command.
Next we will add a few break points in the current version of the code and I assume you are using the latest version 2.0.7? If so, then the next hurdle is finding out where the heck vagrant-proxyconf
is installed on disk and then you'll add the following require 'pry'; binding.pry
before the line line. If all goes well, the next time you run vagrant provision
assuming the VM is already running or if the VM fires up you will drop right into the debugger. What you will be looking for is the current state of the variable @docker_systemd_config
which you should just be able to copy and paste in the debugger and hit enter and it should tell you it's a String and a File that exists on disk. You can test this by entering the following in the debugger. If you wouldn't mind posting the results of this session perhaps we can find the root cause and get in a fix.
@docker_systemd_config.class.to_s
File.exists?(@docker_systemd_config)
@codylane - I did the more low-tech version and just put a few logging statements in there :stuck_out_tongue:
...
@machine.communicate.tap do |comm|
###
logger.warn("Attempting docker systemd config upload => #{@docker_systemd_config.inspect}")
logger.warn("Does it exist? => #{File.exist?(@docker_systemd_config)}")
logger.warn("Opening #{@docker_systemd_config}")
my_file = File.open(@docker_systemd_config, "rb")
logger.warn("File size => #{my_file.size}")
logger.warn("Closing #{@docker_systemd_config}")
my_file.close
###
comm.sudo("mkdir -p /etc/systemd/system/docker.service.d")
comm.upload(@docker_systemd_config, "/tmp/vagrant-proxyconf-docker-systemd-config")
...
Initial findings are...confusing! I'm still playing around, but here is what it's showing for me for a failure:
==> default: Configuring proxy for Docker...
WARN proxyconf: Attempting docker systemd config upload => "C:/Users/D108059/AppData/Local/Temp/vagrant20200610-16676-7ntkda"
WARN proxyconf: Does it exist? => true
WARN proxyconf: Opening C:/Users/D108059/AppData/Local/Temp/vagrant20200610-16676-7ntkda
WARN proxyconf: File size => 293
WARN proxyconf: Closing C:/Users/D108059/AppData/Local/Temp/vagrant20200610-16676-7ntkda
ERROR warden: Error occurred: No such file or directory @ rb_sysopen - C:/Users/D108059/AppData/Local/Temp/vagrant20200610-16676-7ntkda
I'm thinking since we only get the string path of the tempfile, maybe it's being garbage collected suddenly before the upload? Going to keep a reference to it and see if it makes a difference...
@codylane - Based on some further testing, I think this all comes down to ruby garbage collecting the Tempfile
because we don't hold a reference to it.
Adding a reference to the Tempfile
instance seems to eliminate the failures after many runs trying to reproduce (which would usually happen at least every other run).
# Test keeping a tempfile ref
@docker_systemd_config_tempfile_ref = tempfile(systemd_config)
@docker_systemd_config = @docker_systemd_config_tempfile_ref.path
Maybe we were just lucky with ruby 2.4.9
, and GC changes/improvements up to 2.6.6
cause quicker cleanup and deletion of the files.
Excellent work finding the culprit, that should be a pretty easy fix. ;) Thank you for taking the time to report your findings as well.
I also have been suspecting GC or scoping issue and our simulation code for the tests was all running in the same scope and thus just working. I probably should have mentioned this but in the refactor I have a test fixture that I created that has a different tempfile implementation that should work on windows and it uses block scope instead of running process scope. So all my tests are executed in block scope and when that block goes out of scope the file is automatically deleted.
Here's the tests fixture code:
def create_tempdir(basename="", tmpdir=nil, mode: 0, **options)
tmpfile = Tempfile.new(basename=basename, tmpdir=tmpdir, mode: mode, **options)
begin
yield File.dirname(tmpfile.path)
ensure
tmpfile.close
tmpfile.unlink
end
end
and this is how I use it in my tests
create_tempfile(:content => config_json) do |temp_filename|
expect(File.exists?(temp_filename)).to eq true
expect(File.read(temp_filename)).to eq config_json
expect(File.exists?(@results[:fetched_docker_client_config_json])).to eq true
...
end
I suspect if we use the replace tempfile creation in base.rb
with the code above and then use the block notation it will also work? The only downside is that we would have to find all the tempfile references and change those to use the block notation instead. IMHO, I've had really good luck with this implementation in other languages like python and I feel it helps ensure code readability a little bit because the context manager can clean up after itself even when a exception occurs which isn't something we get with the current implementation of tempfile.
I'm curious if you have cycles to try this for me? I'm still struggling getting a windows box up and running that actually works. As soon as I open up a bash terminal on windows using 'git bash' and I type a command like vagrant status
it takes like 15 seconds to come back. It's crazy, it's an Intel I3 with 8G ram and a fusion SSHD drive so it shouldn't be that slow. Ugh, windows! ;)
@codylane - Oh yeah, vagrant
commands taking a while in Windows is pretty normal from what I've experienced. Not sure if it's because it's doing Powershell behind the scenes or what :-/
As for the solution to this...I propose something simpler and maybe more consistent with what is already being done. Can we just change @docker_systemd_config
to be a the actual Tempfile
reference instead of the path as a string? The only other thing to change is referencing .path
in the upload method instead of just the variable.
This would be consistent with how tempfile()
is being used in this same file, like here on line 95 for the client config file: https://github.com/tmatilai/vagrant-proxyconf/blob/7bcdb8471ddf15b2859cdfb33b5571c74eae3e55/lib/vagrant-proxyconf/action/configure_docker_proxy.rb#L95
I'd even argue that the variable should probably just be named @docker_client_config
.
Also, even if something bad happens, the files should still be cleaned up when the process exits I think.
Does that seem OK, or do you have grander refactoring plans you'd rather do?
No problem, I like it and I agree you solution is much simpler, it's also the original implementation. I changed it, so all of this is my fault. I'll get it fixed today and release 2.0.8 without the docker proxy refactor which will be pushed to 2.0.9.
@codylane - Hey now, I'm just glad you're maintaining this! Thanks for all of your help! :)
You bet my friend. I'm glad to help. I made a promise that I'm going to do my part volunteering where I can help and hopefully continue to maintain code that people use and like to use that I've contributed too. I don't even use this plugin anymore but I'm happy to continue to help maintain it.
As always, we cannot do it without without the assume open source community and people like you. We would love to continue having your help with this project and I'm sure others will appreciate it as well.
I'd love to stay in touch and if you'd like to help join this project I'm more than happy to grant you some perms. We just need the blessing of the owner @tmatilai
@codylane - Cool - I'm definitely open to helping you all maintain when needed, whatever you all deem appropriate I'm good with!
@chucknelson - Okay, so I've got the code staged but I've got some failing tests and I'm going to step away from the computer for the evening. I was really hoping to have this out the door today but I really don't want to cause any more problems for folks and I want to confirm that I'm not just getting to anxious to fix things. I'll be on this first thing tomorrow morning MDT.
You bet my friend. I'm glad to help. I made a promise that I'm going to do my part volunteering where I can help and hopefully continue to maintain code that people use and like to use that I've contributed too. I don't even use this plugin anymore but I'm happy to continue to help maintain it.
Thank you so much for your work! You are amazing and very kind!
As always, we cannot do it without without the assume open source community and people like you. We would love to continue having your help with this project and I'm sure others will appreciate it as well.
👍
I'd love to stay in touch and if you'd like to help join this project I'm more than happy to grant you some perms. We just need the blessing of the owner @tmatilai
I send invitation for @chucknelson to join the group of collaborators. Welcome onboard!
I myself have found very little time and energy for OSS work lately, and me neither am using Vagrant for this plugin, so I don't even follow the issues too closely. Especially the Windows and Docker bits have been contributed by other people.
So if I don't seem to respond to even pings here, you can always send me email directly. 😊
Released 2.0.8. closing request. Thanks again for your help.
Problem Statement:
Technical Review Synopsis: