tenstorrent / tt-kmd

Tenstorrent Kernel Module
GNU General Public License v2.0

Previous versions of tt-kmd may still be used on Ubuntu 20.04 unless explicitly removed via DKMS #1

Open. tt-rkim opened this issue 7 months ago.

tt-rkim commented 7 months ago

We have a few bare-metal Ubuntu 20.04 machines with a previous version of tt-kmd installed that don't pick up the newest version after it is added to DKMS, even after a module reload or reboot.

Example output:

```console
tt-admin@e08cs08:~$ sudo dkms status tenstorrent
tenstorrent, 1.21, 5.4.0-166-generic, x86_64: built
tenstorrent, 1.21, 5.4.0-167-generic, x86_64: installed
tenstorrent, 1.26, 5.4.0-166-generic, x86_64: installed
```

We've also seen this on systems with 1.20.1 and 1.23, so it seems to be version-independent.

tt-smi reports that the driver in use is not the most recent one; 1.26 is the version we want.
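In the output above, 1.26 is `installed` only for 5.4.0-166-generic, while 5.4.0-167-generic still has 1.21 installed. If the box is running 5.4.0-167-generic (an assumption; the running kernel isn't shown in the issue), that would explain why the old driver loads. A minimal sketch for checking this per kernel; `installed_version_for_kernel` is a hypothetical helper, and the parsing assumes the comma-separated dkms 2.x status format shown above while also tolerating the `module/version` form printed by newer dkms releases:

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): print the tenstorrent version(s) that dkms
# reports as "installed" for one kernel. Reads `dkms status tenstorrent`
# output on stdin; takes the kernel release (e.g. "$(uname -r)") as $1.
installed_version_for_kernel() {
    local kernel="$1"
    # Newer dkms prints "tenstorrent/1.26, <kernel>, <arch>: <state>", so the
    # first "/" is rewritten to ", " to make both formats comma-separated.
    # Then fields are: $1 module, $2 version, $3 kernel, $4 "arch: state".
    sed 's#/#, #' |
        awk -F ', ' -v k="$kernel" '$3 == k && $4 ~ /: installed/ { print $2 }'
}
```

On an affected machine, `dkms status tenstorrent | installed_version_for_kernel "$(uname -r)"` would print 1.21 instead of 1.26 when booted into 5.4.0-167-generic.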

One user-side workaround is to remove all tenstorrent DKMS modules before adding the newest one. However, this is cumbersome because the dkms command-line interface requires each version to be named explicitly for removal. sed and awk can help here, but we would prefer to sidestep that and provide a nicer install experience.
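For reference, the awk-based workaround might look like the sketch below. The function names are hypothetical; it relies on `dkms remove MODULE/VERSION --all` and, like any DKMS removal, must run as root. Splitting on either "," or "/" lets the same awk expression pull the version out of both the dkms 2.x format shown above and the newer `module/version` format.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): list and remove every tenstorrent version
# registered with dkms. tt_dkms_purge must run as root.

# Print each distinct version in `dkms status` output read from stdin.
# Splitting on "," or "/" puts the version in $2 for both the dkms 2.x
# format ("tenstorrent, 1.21, ...") and the newer "tenstorrent/1.21, ..." one.
tt_dkms_versions() {
    awk -F '[,/]' '$1 == "tenstorrent" { gsub(/ /, "", $2); print $2 }' | sort -u
}

# Remove each registered version from every kernel it was built or installed for.
tt_dkms_purge() {
    local v
    for v in $(dkms status tenstorrent | tt_dkms_versions); do
        dkms remove "tenstorrent/$v" --all
    done
}
```

Calling `tt_dkms_purge` from a root shell before adding and installing the new version should leave only that version registered.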

alewycky-tenstorrent commented 2 months ago

If we install a package generated by "dkms mkdeb", it invokes /usr/lib/dkms/common.postinst, which additionally builds the module for the newest installed kernel, or for all kernels if autoinstall_all_kernels="y" is set in /etc/dkms/framework.conf. Building only for the newest installed kernel still misses an unlikely corner case (installing several new kernels and booting one that isn't the newest), but it's close enough.
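The autoinstall_all_kernels setting can be switched on with a small idempotent helper. This is a sketch, assuming framework.conf uses plain shell-style `KEY="value"` lines; the function name is hypothetical, and it only touches an uncommented `autoinstall_all_kernels=` line (otherwise it appends one). Writing the real file requires root.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): set autoinstall_all_kernels="y" in the dkms
# framework config. Path defaults to /etc/dkms/framework.conf (needs root).
# Idempotent: an existing uncommented setting is rewritten in place.
enable_autoinstall_all_kernels() {
    local conf="${1:-/etc/dkms/framework.conf}"
    if grep -q '^autoinstall_all_kernels=' "$conf" 2>/dev/null; then
        sed -i 's/^autoinstall_all_kernels=.*/autoinstall_all_kernels="y"/' "$conf"
    else
        # Terminate a final line lacking a newline so the setting is not glued to it.
        if [ -s "$conf" ] && [ -n "$(tail -c 1 "$conf")" ]; then
            printf '\n' >> "$conf"
        fi
        printf '%s\n' 'autoinstall_all_kernels="y"' >> "$conf"
    fi
}
```

Running it twice leaves a single `autoinstall_all_kernels="y"` line, so it is safe to include in provisioning scripts.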

The other approach, which I recommend, is to run "dkms autoinstall" on boot. Here are the instructions I worked up. Create the unit with:

```shell
sudo systemctl edit --force --full dkms-autoinstall.service
```

and give it the following contents:

```ini
[Unit]
Description=Recompile DKMS modules for running kernel
DefaultDependencies=no
Before=systemd-udev-trigger.service

[Service]
Type=oneshot
ExecStart=/usr/sbin/dkms autoinstall

[Install]
WantedBy=systemd-udev-trigger.service
```

Then enable it:

```shell
sudo systemctl enable dkms-autoinstall.service
```