tantra35 closed this issue 4 years ago.
That's an interesting bug - thanks for reporting it. We'll need to investigate the cause here. Does it happen reliably or only in a subset of the tasks? What about other drivers?
The "text file busy" error indicates that go-getter is still expanding the tarball and hasn't closed the file descriptor before fluent-bit is invoked. This matches your hypothesis. However, in my simple scenarios, I haven't been able to reproduce it yet with 0.9.3 or the latest master.
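For context, "text file busy" is the Linux errno ETXTBSY. Per the execve(2) man page, exec fails with it when the executable is still open for writing by some process. Below is a minimal sketch of an ordering that avoids this, assuming the extractor controls how each file is written: write into a temporary file, fsync and close it, set the execute bits, and only then rename it into its final place with `os.replace()`. The helper `install_executable` and the stub script are hypothetical illustrations, not go-getter's or Nomad's actual code.

```python
import os
import stat
import subprocess
import tempfile


def install_executable(dest_path: str, content: bytes) -> None:
    """Hypothetical helper (not part of go-getter or Nomad): make dest_path an
    executable containing `content`, so that no writable descriptor to it is
    still open at the moment it appears under its final name."""
    directory = os.path.dirname(os.path.abspath(dest_path))
    # Write into a temp file in the same directory so os.replace() is an
    # atomic rename on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".partial-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(content)
            f.flush()
            os.fsync(f.fileno())
        # Leaving the `with` block closed the write descriptor.
        mode = os.stat(tmp_path).st_mode
        os.chmod(tmp_path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
        # Only now does the file become visible under its final name.
        os.replace(tmp_path, dest_path)
    except BaseException:
        # Don't leave a half-written temp file behind on failure.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "fluent-bit-stub")  # hypothetical stub name
        install_executable(target, b"#!/bin/sh\necho started\n")
        # Safe to exec: the file was closed before it got its final name.
        out = subprocess.run([target], capture_output=True, text=True, check=True)
        print(out.stdout.strip())
```

The rename is what matters here: a launcher that only looks for the final file name can never observe it while it is still open for writing.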
@notnoop We first observed this when launching "Spark on Nomad", where we launch a huge number of jobs and from time to time see this error (fluent-bit is just a satellite log-collector task). We then switched the driver to exec and no longer observed this behavior. I attribute that to the fact that exec has to create a chroot environment, and the delay in creating it is enough for the file to be completely unpacked, so the error doesn't happen.
But what I described here happened in our production cluster, on one of our autoscale jobs. Before, on Nomad 0.8.6, we did not observe this.
Hey there
Since this issue hasn't had any activity in a while - we're going to automatically close it in 30 days. If you're still seeing this issue with the latest version of Nomad, please respond here and we'll keep this open and take another look at this.
Thanks!
This issue will be auto-closed because there hasn't been any activity for a few months. Feel free to open a new one if you still experience this problem :+1:
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
Nomad version
0.9.3
When we use the raw_exec driver, we sometimes see failed allocations with the following error in them:
We think this happens because of the following task definition:
Nomad sometimes doesn't wait for the artifact to be fully extracted and tries to launch it anyway.
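Until the extraction ordering is fixed, one possible job-side mitigation, for example in a wrapper script that the raw_exec task runs, is to retry the launch a few times when exec fails with ETXTBSY and to fail immediately on any other error. The sketch below assumes that approach. `run_with_etxtbsy_retry`, its `attempts`/`delay` parameters, and the injectable `runner` are hypothetical names, not a Nomad feature.

```python
import errno
import subprocess
import time
from typing import Callable, Sequence


def run_with_etxtbsy_retry(
    argv: Sequence[str],
    attempts: int = 5,
    delay: float = 0.1,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> subprocess.CompletedProcess:
    """Hypothetical helper: run argv, retrying only when exec fails with
    ETXTBSY ("text file busy"), i.e. the binary is still open for writing."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return runner(list(argv), capture_output=True, text=True, check=True)
        except OSError as e:
            # Other exec failures (ENOENT, EACCES, ...) are not transient.
            if e.errno != errno.ETXTBSY or attempt == attempts:
                raise
            # Give the extractor time to finish writing and close the file,
            # backing off linearly between attempts.
            time.sleep(delay * attempt)
    raise AssertionError("unreachable: the loop always returns or raises")
```

Retrying only on ETXTBSY means that genuine misconfigurations, such as a missing binary or a missing execute bit, still fail fast. The injectable `runner` lets the retry logic be tested without reproducing the real race.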