Closed seven1m closed 1 year ago
Of course this one has never happened to me. In the CI link, I can see it breaks on this test:
it "raises an Errorno::EEXIST if the file exists when create a new file with File::CREAT|File::EXCL" do
-> { @fh = File.new(@file, File::CREAT|File::EXCL) }.should raise_error(Errno::EEXIST)
end
It almost looks like it raises an error, but still assigns the faulty File
object to @fh
, then the :after
loop of the spec tries to close this faulty file descriptor and this results in a EBADF
.
Maybe someone who's seen it crash on their system could try this loop just to see if it always breaks on this spec:
while bin/natalie spec/core/file/new_spec.rb; do true; done
(I just pushed a fix to ruby-spec to change Errorno::
to Errno::
, just noticed that typo)
I've now started getting it almost every time, for whatever reason:
→ bin/natalie spec/core/file/new_spec.rb
...........Traceback (most recent call last):
11: from spec/core/file/new_spec.rb:4:in `<main>'
10: from /home/tim/pp/natalie/test/support/spec.rb:1469:in `block in block'
9: from /home/tim/pp/natalie/test/support/spec.rb:1416:in `run_specs'
8: from /home/tim/pp/natalie/test/support/spec.rb:1416:in `each'
7: from /home/tim/pp/natalie/test/support/spec.rb:1452:in `block in run_specs'
6: from /home/tim/pp/natalie/test/support/spec.rb:1452:in `reverse_each'
5: from /home/tim/pp/natalie/test/support/spec.rb:1452:in `block in block in run_specs'
4: from /home/tim/pp/natalie/test/support/spec.rb:1452:in `reverse_each'
3: from /home/tim/pp/natalie/test/support/spec.rb:1452:in `block in block in block in run_specs'
2: from /home/tim/pp/natalie/test/support/spec.rb:1452:in `call'
1: from spec/core/file/new_spec.rb:13:in `block in block in block'
spec/core/file/new_spec.rb:13:in `close': Bad file descriptor (Errno::EBADF)
If I rescue the failure and print the @fs
...
after :each do
@fh.close if @fh
rm_r @file
rescue
p(fh: @fh)
p(closed?: @fh.closed?)
p(fileno: @fh.fileno)
p(eof?: @fh.eof?)
end
...this is the result:
→ bin/natalie spec/core/file/new_spec.rb
...........
{:fh=>#<File:/home/tim/pp/natalie/rubyspec_temp/859680/11-test.txt>}
{:closed?=>false}
{:fileno=>3}
Traceback (most recent call last):
11: from spec/core/file/new_spec.rb:4:in `<main>'
10: from /home/tim/pp/natalie/test/support/spec.rb:1469:in `block in block'
9: from /home/tim/pp/natalie/test/support/spec.rb:1416:in `run_specs'
8: from /home/tim/pp/natalie/test/support/spec.rb:1416:in `each'
7: from /home/tim/pp/natalie/test/support/spec.rb:1452:in `block in run_specs'
6: from /home/tim/pp/natalie/test/support/spec.rb:1452:in `reverse_each'
5: from /home/tim/pp/natalie/test/support/spec.rb:1452:in `block in block in run_specs'
4: from /home/tim/pp/natalie/test/support/spec.rb:1452:in `reverse_each'
3: from /home/tim/pp/natalie/test/support/spec.rb:1452:in `block in block in block in run_specs'
2: from /home/tim/pp/natalie/test/support/spec.rb:1452:in `call'
1: from spec/core/file/new_spec.rb:19:in `block in block in block'
spec/core/file/new_spec.rb:19:in `eof?': Bad file descriptor (Errno::EBADF)
spec/core/file/new_spec.rb:19:in `eof?': Bad file descriptor (Errno::EBADF)
That's kind of weird, line 19 looks like this for me:
17 it "returns a new File with mode string" do
18 @fh = File.new(@file, 'w')
19 @fh.should be_kind_of(File)
20 File.should.exist?(@file)
21 end
So how does that relate to eof?
?
Update: nevermind, that's the debug info you added. I'm going to get another cup of coffee
Could you add this line as the first line of the rescue block?
puts(`lsof -p #{$$}`)
That prints all the current file descriptors, which I guess should not include 3
in the error case.
Sure:
Elsof: WARNING: can't stat() overlay file system /var/lib/docker/overlay2/4d8875f8fd8bed574a6f1cd416f1b474148131ac86e4e683169e9135b0025675/merged
Output information may be incomplete.
lsof: WARNING: can't stat() nsfs file system /run/docker/netns/054b5fd3b0cb
Output information may be incomplete.
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
natalie20 919732 tim cwd DIR 259,2 4096 6955954 /home/tim/pp/natalie
natalie20 919732 tim rtd DIR 259,2 4096 2 /
natalie20 919732 tim txt REG 259,2 1652088 2622209 /tmp/natalie20231107-919697-4shdam
natalie20 919732 tim mem REG 259,2 5716304 264308 /usr/lib/locale/locale-archive
natalie20 919732 tim mem REG 259,2 2105184 263605 /usr/lib/x86_64-linux-gnu/libc.so.6
natalie20 919732 tim mem REG 259,2 957008 264212 /usr/lib/x86_64-linux-gnu/libm.so.6
natalie20 919732 tim mem REG 259,2 2522480 265943 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.32
natalie20 919732 tim mem REG 259,2 141872 266255 /usr/lib/x86_64-linux-gnu/libgcc_s.so.1
natalie20 919732 tim mem REG 259,2 198664 262914 /usr/lib/x86_64-linux-gnu/libcrypt.so.1.1.0
natalie20 919732 tim mem REG 259,2 1514968 6954833 /home/tim/pp/natalie/build/onigmo/lib/libonigmo.so.6.0.5
natalie20 919732 tim mem REG 259,2 20404136 6976121 /home/tim/pp/natalie/build/libnatalie_base.so
natalie20 919732 tim mem REG 259,2 232568 263234 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
natalie20 919732 tim 0u CHR 136,3 0t0 6 /dev/pts/3
natalie20 919732 tim 1u CHR 136,3 0t0 6 /dev/pts/3
natalie20 919732 tim 2u CHR 136,3 0t0 6 /dev/pts/3
natalie20 919732 tim 3r FIFO 0,14 0t0 3145468 pipe
{:fh=>#<File:/home/tim/pp/natalie/rubyspec_temp/919732/17-test.txt>}
{:fileno=>3}
If I start printing stuff, the heisenbug goes away!
I added a dbg("opened {} as fileno {}", filename, fileno);
on line 61 of FileObject::initialize and all the specs started passing:
opened /home/tim/pp/natalie/rubyspec_temp/931780/1-test.txt as fileno 3
opened /home/tim/pp/natalie/rubyspec_temp/931780/1-test.txt as fileno 3
.opened /home/tim/pp/natalie/rubyspec_temp/931780/2-test.txt as fileno 3
opened /home/tim/pp/natalie/rubyspec_temp/931780/2-test.txt as fileno 3
.opened /home/tim/pp/natalie/rubyspec_temp/931780/3-test.txt as fileno 3
opened /home/tim/pp/natalie/rubyspec_temp/931780/3-test.txt as fileno 3
.opened /home/tim/pp/natalie/rubyspec_temp/931780/4-test.txt as fileno 3
opened /home/tim/pp/natalie/rubyspec_temp/931780/4-test.txt as fileno 3
opened /home/tim/pp/natalie/rubyspec_temp/931780/4-test.txt as fileno 3
.opened /home/tim/pp/natalie/rubyspec_temp/931780/5-test.txt as fileno 3
opened /home/tim/pp/natalie/rubyspec_temp/931780/5-test.txt as fileno 3
opened /home/tim/pp/natalie/rubyspec_temp/931780/5-test.txt as fileno 3
.opened /home/tim/pp/natalie/rubyspec_temp/931780/6-test.txt as fileno 3
opened /home/tim/pp/natalie/rubyspec_temp/931780/6-test.txt as fileno 3
.opened /home/tim/pp/natalie/rubyspec_temp/931780/7-test.txt as fileno 3
opened /home/tim/pp/natalie/rubyspec_temp/931780/7-test.txt as fileno 3
.opened /home/tim/pp/natalie/rubyspec_temp/931780/8-test.txt as fileno 3
opened /home/tim/pp/natalie/rubyspec_temp/931780/8-test.txt as fileno 3
.opened /home/tim/pp/natalie/rubyspec_temp/931780/9-test.txt as fileno 3
opened /home/tim/pp/natalie/rubyspec_temp/931780/9-test.txt as fileno 3
.opened /home/tim/pp/natalie/rubyspec_temp/931780/10-test.txt as fileno 3
opened /home/tim/pp/natalie/rubyspec_temp/931780/10-test.txt as fileno -1
.opened /home/tim/pp/natalie/rubyspec_temp/931780/11-test.txt as fileno 3
opened /home/tim/pp/natalie/rubyspec_temp/931780/11-test.txt as fileno 3
.opened /home/tim/pp/natalie/rubyspec_temp/931780/12-test.txt as fileno 3
opened /home/tim/pp/natalie/rubyspec_temp/931780/12-test.txt as fileno 3
.opened /home/tim/pp/natalie/rubyspec_temp/931780/13-test.txt as fileno 3
opened /home/tim/pp/natalie/rubyspec_temp/931780/13-test.txt as fileno 3
.opened /home/tim/pp/natalie/rubyspec_temp/931780/14-test.txt as fileno 3
opened /home/tim/pp/natalie/rubyspec_temp/931780/14-test.txt as fileno 3
.opened /home/tim/pp/natalie/rubyspec_temp/931780/15-test.txt as fileno 3
opened /home/tim/pp/natalie/rubyspec_temp/931780/15-test.txt as fileno 3
.opened /home/tim/pp/natalie/rubyspec_temp/931780/16-test.txt as fileno 3
opened /home/tim/pp/natalie/rubyspec_temp/931780/16-test.txt as fileno 3
.opened /home/tim/pp/natalie/rubyspec_temp/931780/17-test.txt as fileno 3
opened /home/tim/pp/natalie/rubyspec_temp/931780/17-test.txt as fileno 3
{:before=>3}
{:before=>3}
.opened /home/tim/pp/natalie/rubyspec_temp/931780/18-test.txt as fileno 3
opened /home/tim/pp/natalie/rubyspec_temp/931780/18-test.txt as fileno 3
.opened /home/tim/pp/natalie/rubyspec_temp/931780/19-test.txt as fileno 3
opened /home/tim/pp/natalie/rubyspec_temp/931780/19-test.txt as fileno 3
.opened /home/tim/pp/natalie/rubyspec_temp/931780/20-test.txt as fileno 3
opened /home/tim/pp/natalie/rubyspec_temp/931780/20-test.txt as fileno 3
.opened /home/tim/pp/natalie/rubyspec_temp/931780/21-test.txt as fileno 3
opened /home/tim/pp/natalie/rubyspec_temp/931780/21-test.txt as fileno -1
opened /home/tim/pp/natalie/rubyspec_temp/931780/21-test.txt as fileno -1
.opened /home/tim/pp/natalie/rubyspec_temp/931780/22-test.txt as fileno 3
opened /home/tim/pp/natalie/rubyspec_temp/931780/22-test.txt as fileno 3
.opened /home/tim/pp/natalie/rubyspec_temp/931780/23-test.txt as fileno 3
.opened /home/tim/pp/natalie/rubyspec_temp/931780/24-test.txt as fileno 3
.opened /home/tim/pp/natalie/rubyspec_temp/931780/25-test.txt as fileno 3
.opened /home/tim/pp/natalie/rubyspec_temp/931780/26-test.txt as fileno 3
opened /home/tim/pp/natalie/rubyspec_temp/931780/26-test.txt as fileno 3
*.opened /home/tim/pp/natalie/rubyspec_temp/931780/27-test.txt as fileno 3
opened /home/tim/pp/natalie/rubyspec_temp/931780/ as fileno 3
.
27 spec(s) passed.
1 spec(s) skipped.
OK, and one last bit of data. Just to be sure my system doesn't always use fileno 3, I ran this one-liner:
→ bin/natalie -e "f1 = File.open('README.md'); f2 = File.open('LICENSE'); f3 = File.open('CONTRIBUTING.md')"
opened README.md as fileno 3
opened LICENSE as fileno 4
opened CONTRIBUTING.md as fileno 5
Last last bit of data: I can not reproduce this in the Docker container with rake docker_bash
. The spec always passes, even if I run it in a while loop a hundred times. And the fileno
inside the Docker container is 3
also. I'm thoroughly confused!
Through some high-tech printf debugging, I've narrowed the cause down to this:
~IoObject() closing 3 () object_id = 93939119179040
~IoObject() closed 3 with result 0
IoObject::close() closing 3 object_id=93939119172896
IoObject::close() closed 3 with result -1
Basically, the ~IoObject()
destructor on some other object with fileno 3 is closing it before the IoObject::close()
can do it.
Now, why there's still an object with m_fileno = 3
around to be GC'd is my next question...
I figured it out! PR incoming...
The culprit was this:
https://github.com/natalie-lang/natalie/blob/master/spec/core/file/new_spec.rb#L75-L76
Those lines effectively create a copy of the fileno, and in combination with our destructor on IoObject
(called by the GC), the fileno was getting closed even though autoclose
was set to false.
Note to self: #1270 sounds pretty similar to this issue. I was reminded of this one because I ran into issues with TCPSocket.open(&block)
We might be able to remove the call to fsync(2)
from this code.
On my local machine about 1 out of 5 times I get this error:
It's also occurring sometimes on CI: https://github.com/natalie-lang/natalie/actions/runs/6763302781/job/18381713844