WorldBuild opened this issue 3 months ago
Hi 👋, thanks for reaching out.
For general questions we recommend reaching out to the [community forum](https://discuss.hashicorp.com/c/packer) for greater visibility.
As the GitHub issue tracker is only watched by a small subset of maintainers and is really reserved for bugs and enhancements, you'll have a better chance of finding someone who can help you in the forum.
We'll mark this issue as needs-reply to help inform maintainers that this question is awaiting a response.
If there is no activity on this question within 30 days, it will be closed automatically.
If you find the forum more helpful, or if you've found the answer to your question elsewhere, please feel free to post a response and close the issue.
Hi @WorldBuild,
Thanks for bubbling this up! This does look like a bug. I'm not sure yet whether it's on us or an upstream problem (yamux is the multiplexer we use on the wire to let multiple entities communicate over a single connection), but it warrants an investigation.
Judging from the logs it seems this occurs rapidly during the course of a build, right?
Looking at your logs, it seems you're running Packer 1.9.4. Any chance you could try a more recent version to see if the problem persists? To be honest, I'd be surprised if it did, as we haven't changed that piece of code in a while, but maybe something else interacts with it positively?
If you update, please note that you'll need to install the qemu plugin manually so Packer can use it; something as simple as `packer plugins install github.com/hashicorp/qemu` will do. Feel free to read our documentation on this subject too if you want more information.
Hey @lbajolet-hashicorp! Thank you for the detailed response and help! I'll try upgrading to a newer version, test it out for a few weeks, and report back with my results.
> Judging from the logs it seems this occurs rapidly during the course of a build, right?
Yep, I can confirm that; it happens almost instantly. When I was researching this two weeks ago, I remember seeing a discussion saying that keepalive can be configured with a longer timeout or disabled entirely, but I couldn't find the source again :/
For future reference, is there any way to get more detailed logs when crashes like this happen? So far I've been relying on `PACKER_LOG`, but it didn't help me much in this case, since I honestly still have no idea what actually crashed 😅 (if anything)
EDIT: cat walked on my keyboard
For the verbose logs, unfortunately I'm not sure what more we can do :/
The error you see comes straight from yamux, so unless it can give us more context on the keepalive/deadline problem, I'm not sure we have much more to report here.
It warrants some delving into its code base though, which will probably have to happen to understand how we end up here, but since that code is a bit complex and things happen on different goroutines, it might be a bit hard to troubleshoot, unfortunately.
Out of curiosity, do you hit this often?
As for the WinRM hypothesis, I suppose it could be? It seems unlikely though, as yamux only manages communication with the plugin, not with the VM directly (from my understanding at least), but it could possibly be a side effect.
Will investigate!
Thanks for investigating! To answer your question: no, I don't hit this very often, so it's not a critical issue.
Digging around in the yamux codebase, I found that the error seems to be thrown from this code block, after yamux times out: https://github.com/hashicorp/yamux/blob/d1caa6c97c9fc1cc9e83bbe34d0603f9ff0ce8bd/session.go#L303-L309
Could bumping `ConnectionWriteTimeout` here help? https://github.com/hashicorp/yamux/blob/d1caa6c97c9fc1cc9e83bbe34d0603f9ff0ce8bd/mux.go#L16-L27
Ideally, if we could configure this in the Packer template or via an environment variable (like `PACKER_LOG`), I could test it out for you :)
Hi @WorldBuild,
Thanks for digging into the yamux code, much appreciated! Bumping the timeout might indeed fix your case, though I'm not sure by how much. For problems like this there's unfortunately no one-size-fits-all value, but it's still worth looking into.
If you're able to experiment with it and find a value that works for you, please do. And if you'd like to add an option to the SDK configuration, feel free to open a PR; it might help others too :)
Thanks!
Hello, I often use Packer to build qcow2 images from other qcow2 images, and once in a blue moon, at a random point during the build, Packer exits abruptly because of something related to yamux.
I run my builds on CentOS 8, and this happens even when I have no provisioners, so I think the most likely culprit is the communicator I use (WinRM).
Simplified template
The log doesn't help much, as it only shows the yamux keepalive error message and a qemu EOF.