hashicorp / packer-plugin-amazon

Packer plugin for Amazon AMI Builder
https://www.packer.io/docs/builders/amazon
Mozilla Public License 2.0
69 stars 104 forks source link

packer execution time has increased by two/four times after updating the version from 1.1.4 to any higher version #449

Closed varunm17 closed 5 months ago

varunm17 commented 5 months ago

Overview of the Issue

We are customising many different OS as per our requirements. We were using the version 1.1.4 and we plan to use the latest version of the plugin (1.2.9), however after updating the plugin version we have observed that the build time has almost doubled in case of linux AMI's and for windows the build time has increased by almost 4 times.

Reproduction Steps

Operating System \ amazon packer plugin version 1.1.4 1.1.5 1.1.6 1.2.0 1.2.1 1.2.2 1.2.3 1.2.4 1.2.5 1.2.6 1.2.7 1.2.8 1.2.9
Amazon Linux 2 14 minutes, 4 seconds 35 minutes, 2 seconds 34 minutes, 22 seconds 35 minutes, 3 seconds 36 minutes, 30 seconds 33 minutes, 31 seconds 36 minutes, 9 seconds 34 minutes, 31 seconds 34 minutes, 43 seconds 34 minutes, 58 seconds 37 minutes, 47 seconds 35 minutes, 32 seconds 35 minutes, 47 seconds
Ubuntu 20 19 minutes, 16 seconds 46 minutes, 12 seconds 46 minutes, 58 seconds 44 minutes, 25 seconds 43 minutes, 43 seconds 43 minutes, 50 seconds 44 minutes, 43 seconds 44 minutes, 43 seconds 44 minutes, 43 seconds 50 minutes, 15 seconds 46 minutes, 11 seconds 43 minutes, 14 seconds 43 minutes, 41 seconds
Ubuntu 22 20 minutes, 44 seconds 42 minutes, 44 seconds 45 minutes, 5 seconds 45 minutes, 39 seconds 43 minutes, 1 seconds 42 minutes, 43 seconds 41 minutes, 40 seconds 42 minutes, 52 seconds 42 minutes, 34 seconds 43 minutes, 14 seconds 42 minutes, 26 seconds 42 minutes, 56 seconds 43 minutes, 26 seconds
Windows 2019 22 minutes, 22 seconds 1 hours, 22 minutes, 35 seconds 1 hours, 22 minutes, 54 seconds 1 hours, 24 minutes, 6 seconds 1 hours, 42 minutes, 14 seconds 1 hours, 25 minutes, 36 seconds 1 hours, 18 minutes, 47 seconds 1 hours, 22 minutes, 25 seconds 1 hours, 21 minutes, 53 seconds 1 hours, 40 minutes, 21 seconds 1 hours, 19 minutes, 19 seconds 1 hours, 23 minutes, 43 seconds 1 hours, 26 minutes, 1 seconds

Plugin and Packer version

Packer version: 1.10.0 [go1.20.11 linux amd64] Ansibe packer plugin version : 1.1.1

Operating system and Environment details

Operating System:

Log Fragments and crash.log files

For sharing the logs we have executed only few ansible playbooks, hence the execution time is on the lower end but still we can observe the execution time after upgrading the plugin has increased significantly

Operating System \ amazon packer plugin version 1.1.4 - log fragment gist URL 1.1.4 - execution time 1.2.9 - log fragment 1.2.9- execution time
Amazon Linux 2 amazon_linux_2_plugin_version_1.1.4.log 3 minutes, 17 seconds amazon_linux_2_plugin_version_1.2.9.log 6 minutes, 55 seconds
Ubuntu 20 ubuntu_20_plugin_version_1.1.4.log 3 minutes, 6 seconds ubuntu_20_plugin_version_1.2.9.log 5 minutes, 51 seconds
Ubuntu 22 ubuntu_22_plugin_version_1.1.4.log 2 minutes, 26 seconds ubuntu_22_plugin_version_1.2.9.log 4 minutes, 38 seconds
Windows Server 2019 windows_2019_plugin_version_1.1.4.log 3 minutes, 17 seconds and 3 minutes, 3 seconds windows_2019_plugin_version_1.2.9.log 9 minutes, 3 seconds and 14 minutes, 27 seconds
lbajolet-hashicorp commented 5 months ago

Hi @varunm17,

Thanks for the report, with the current information however it'll be hard to figure out where the performance problem appears, so I'll request help from you to figure out the source of the problem if you can.

Could you share your configurations please? I've taken a quick look at the logs from one of your builds (AL2), looks like the script is slower for some reason (~4 minutes vs. 7 minutes), the plugin's version is the same, and the OS image looks similar as well from a quick look, so it could be a SSH-related problem potentially, but it's hard to say with just that information.

The script outputs something else in the 1.1.4 run however:

    main.amazon-ebs.xyz_ami: skipping: package-cleanup -q --leaves | xargs -l1 yum -y remove
    main.amazon-ebs.xyz_ami: due to package dep locale bug

Is there an error that causes this? I don't have access to the script that you're running, so it's hard to say, but if it exits prematurely, that might be why 1.14 is faster, no clue why that'd be though since the rest of the environment looks similar.

Is the host the same machine in both tests?

Also, may I ask your help in determining which version introduces the slowdown? Could you bisect all the versions from 1.1.4 -> 1.2.9, so we have a better idea where to look?

Thanks.

varunm17 commented 5 months ago

hi @lbajolet-hashicorp,

Thanks you for your response. You can ignore the script output, from this issue we are specifically observing the slowness during execution of ansible playbooks. We do not see any errors just the execution time has increased as reported. You are correct we have everything in common in the environment except the packer amazon plugin versions.

The host is not the same machine. Each build spins up a fresh instance. But we are able to replicate this issue at our will.

The packer amazon plugin version where the execution time started to increase is 1.1.5.

Hope this helps.

lbajolet-hashicorp commented 5 months ago

Thanks for the follow-up.

Looking at the diff between 1.1.4 and 1.1.5, this should not change much as we only introduced a new option (enable_unlimited_credits) with this version (refer to PR #273 for context). We also started building with go 1.18 instead of 1.17, but this should not impact performance too.

By any chance, did you specify an instance type to run the builds on? Are you using spot instances? I'm wondering if the instance type picked changed between those two versions, which may explain why scripts are suddenly slower to execute on the instance. Could you check that on your side and report to us? We'll see if that's something we can explain/fix/workaround.

varunm17 commented 5 months ago
[defaults]
bin_ansible_callbacks = True
callbacks_enabled = timer, profile_roles, profile_tasks
lbajolet-hashicorp commented 5 months ago

@varunm17 one thing I don't know from the template you shared is the instance type, you're not using spot instances I'd think (unless I missed the option) so that might be indeed due to the PR we merged, which always sets the credit specification on T2/T3/T3a/T4g instances, and if unspecified in the config, this will be false, i.e. standard. Prior to this version, if unset on T3/T3a/T4g instances, the default credit spec would be unlimited by default, which could be why your builds slowed down.

Could you try to enable unlimited credits on your configs? Adding enable_unlimited_credits = true to the source's configuration should do the trick.

varunm17 commented 5 months ago

thanks @lbajolet-hashicorp for your inputs, indeed we were using t3 instance, hence after Adding enable_unlimited_credits = true the execution time is back to normal, really appreciate your analysis and solution