Closed missinglink closed 4 days ago
Nice, yeah, the current system here is a bit brittle and complicated.
In some Geocode Earth infra we use a simpler method, which is basically to look at the symlinks in /dev/disk/by-id:
$ ls -lh /dev/disk/by-id/
total 0
lrwxrwxrwx 1 root root 13 Feb 8 15:28 nvme-Amazon_EC2_NVMe_Instance_Storage_AWS222D6E08AD542C2D4 -> ../../nvme1n1
lrwxrwxrwx 1 root root 13 Feb 8 15:28 nvme-Amazon_Elastic_Block_Store_vol0d00dbaf264800849 -> ../../nvme0n1
lrwxrwxrwx 1 root root 15 Feb 8 15:28 nvme-Amazon_Elastic_Block_Store_vol0d00dbaf264800849-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root root 13 Feb 8 15:28 nvme-nvme.1d0f-4157533232324436453038414435343243324434-416d617a6f6e20454332204e564d6520496e7374616e63652053746f72616765-00000001 -> ../../nvme1n1
lrwxrwxrwx 1 root root 13 Feb 8 15:28 nvme-nvme.1d0f-766f6c3064303064626166323634383030383439-416d617a6f6e20456c617374696320426c6f636b2053746f7265-00000001 -> ../../nvme0n1
lrwxrwxrwx 1 root root 15 Feb 8 15:28 nvme-nvme.1d0f-766f6c3064303064626166323634383030383439-416d617a6f6e20456c617374696320426c6f636b2053746f7265-00000001-part1 -> ../../nvme0n1p1
That doesn't require any tools, seems to be populated instantly, and makes it quite clear which volumes are EBS, which are NVMe, etc. Maybe we simplify and use that?
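A minimal sketch of how those names could be classified, assuming the `nvme-Amazon_Elastic_Block_Store_<serial>` and `nvme-Amazon_EC2_NVMe_Instance_Storage_<serial>` naming from the listing above holds on other instances. The names `classify_by_id` and `ebs_volume_id` are hypothetical, and re-inserting the hyphen to get a `vol-...` ID is an assumption based on the serial format seen here:

```python
import re

# Prefixes as they appear in the /dev/disk/by-id listing above.
EBS_PREFIX = "nvme-Amazon_Elastic_Block_Store_"
INSTANCE_PREFIX = "nvme-Amazon_EC2_NVMe_Instance_Storage_"


def classify_by_id(name):
    """Return (kind, serial, partition) for a by-id entry, or None.

    kind is "ebs" or "instance-store"; partition is an int or None.
    The hex-encoded "nvme-nvme.1d0f-..." aliases are ignored here.
    """
    for kind, prefix in (("ebs", EBS_PREFIX), ("instance-store", INSTANCE_PREFIX)):
        if name.startswith(prefix):
            rest = name[len(prefix):]
            match = re.fullmatch(r"(?P<serial>[A-Za-z0-9]+)(?:-part(?P<part>\d+))?", rest)
            if match is None:
                return None
            part = match.group("part")
            return kind, match.group("serial"), int(part) if part else None
    return None


def ebs_volume_id(serial):
    """Turn a serial like 'vol0d00...' into 'vol-0d00...' (assumed format)."""
    if serial.startswith("vol") and not serial.startswith("vol-"):
        return "vol-" + serial[3:]
    return serial
```

This only tells you what kind of device each entry is and which volume backs it; it says nothing about which device name was requested in the block device mapping.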
I had a look at that and unfortunately it doesn't seem possible. The AWS docs say there's no guarantee that the ordinal numbers correspond to the order in which the devices were defined, or to anything else really. The only consistent way seems to be to check the block device binary header, where the requested mapping path is encoded.
This is what that command looks like on a t4g.xlarge ARM instance:
ls -lh /dev/disk/by-id/
total 0
lrwxrwxrwx 1 root root 13 Feb 8 16:15 nvme-Amazon_Elastic_Block_Store_vol0a332d2f708ad23f8 -> ../../nvme0n1
lrwxrwxrwx 1 root root 15 Feb 8 16:15 nvme-Amazon_Elastic_Block_Store_vol0a332d2f708ad23f8-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root root 16 Feb 8 16:15 nvme-Amazon_Elastic_Block_Store_vol0a332d2f708ad23f8-part15 -> ../../nvme0n1p15
lrwxrwxrwx 1 root root 13 Feb 8 16:15 nvme-Amazon_Elastic_Block_Store_vol0eeb851c145fc2b4d -> ../../nvme2n1
lrwxrwxrwx 1 root root 13 Feb 8 16:15 nvme-Amazon_Elastic_Block_Store_vol0fd2cdbaccc423f58 -> ../../nvme1n1
lrwxrwxrwx 1 root root 13 Feb 8 16:15 nvme-nvme.1d0f-766f6c3061333332643266373038616432336638-416d617a6f6e20456c617374696320426c6f636b2053746f7265-00000001 -> ../../nvme0n1
lrwxrwxrwx 1 root root 15 Feb 8 16:15 nvme-nvme.1d0f-766f6c3061333332643266373038616432336638-416d617a6f6e20456c617374696320426c6f636b2053746f7265-00000001-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root root 16 Feb 8 16:15 nvme-nvme.1d0f-766f6c3061333332643266373038616432336638-416d617a6f6e20456c617374696320426c6f636b2053746f7265-00000001-part15 -> ../../nvme0n1p15
lrwxrwxrwx 1 root root 13 Feb 8 16:15 nvme-nvme.1d0f-766f6c3065656238353163313435666332623464-416d617a6f6e20456c617374696320426c6f636b2053746f7265-00000001 -> ../../nvme2n1
lrwxrwxrwx 1 root root 13 Feb 8 16:15 nvme-nvme.1d0f-766f6c3066643263646261636363343233663538-416d617a6f6e20456c617374696320426c6f636b2053746f7265-00000001 -> ../../nvme1n1
the scripts in this repo create these symlinks, which doesn't seem to be possible from the information above:
lrwxrwxrwx 1 root root 7 Feb 8 16:15 sda1 -> nvme0n1
lrwxrwxrwx 1 root root 7 Feb 8 16:15 sdb -> nvme2n1
lrwxrwxrwx 1 root root 7 Feb 8 16:15 sdc -> nvme1n1
The script we have, which selects the first available disk matching a pattern, would be susceptible to error here: several devices match, and there's no guarantee that head -n1 picks the correct one.
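To illustrate the concern, here is a sketch that refuses to guess when a pattern matches more than one device, instead of silently taking the first hit. The function name `select_single_device` and the use of `fnmatch`-style patterns are assumptions for illustration, not the script in this repo:

```python
from fnmatch import fnmatch


def select_single_device(by_id, pattern):
    """Return the one device whose by-id name matches pattern.

    by_id maps symlink names to kernel device names. Partition entries
    ("-partN") are skipped. Raises LookupError if zero or several
    whole-disk entries match, rather than picking the first.
    """
    matches = sorted(
        {dev for name, dev in by_id.items()
         if fnmatch(name, pattern) and "-part" not in name}
    )
    if len(matches) != 1:
        raise LookupError(f"{pattern!r} matched {len(matches)} devices: {matches}")
    return matches[0]
```

With the t4g.xlarge listing above, a pattern like `nvme-Amazon_Elastic_Block_Store_*` matches three whole-disk entries, so this raises rather than returning whichever one happens to sort first.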
I've tested this on a t4g.xlarge running an AMI tagged dev-es7.16-arm and after a few iterations it's working great 🎉
Before we consider merging this we should replace the cURL commands I'm using to fetch the scripts from github with actual files committed to this repo, for security reasons.
for reference, this is what this binary encoded header looks like (note: sdb encoded in the first bytes)
sudo nvme id-ctrl --vendor-specific /dev/nvme2n1
NVME Identify Controller:
vid : 0x1d0f
ssvid : 0x1d0f
sn : vol0eeb851c145fc2b4d
mn : Amazon Elastic Block Store
fr : 1.0
rab : 32
ieee : a002dc
cmic : 0
mdts : 6
cntlid : 0
ver : 10000
rtd3r : 0
rtd3e : 0
oaes : 0x100
ctratt : 0
oacs : 0
acl : 4
aerl : 0
frmw : 0x3
lpa : 0
elpe : 63
npss : 0
avscc : 0x1
apsta : 0
wctemp : 343
cctemp : 0
mtfa : 0
hmpre : 0
hmmin : 0
tnvmcap : 0
unvmcap : 0
rpmbs : 0
edstt : 0
dsto : 0
fwug : 0
kas : 0
hctma : 0
mntmt : 0
mxtmt : 0
sanicap : 0
hmminds : 0
hmmaxd : 0
sqes : 0x66
cqes : 0x44
maxcmd : 0
nn : 1
oncs : 0
fuses : 0
fna : 0
vwc : 0
awun : 0
awupf : 0
nvscc : 0
acwu : 0
sgls : 0
subnqn :
ioccsz : 0
iorcsz : 0
icdoff : 0
ctrattr : 0
msdbd : 0
ps 0 : mp:0.01W operational enlat:1000000 exlat:1000000 rrt:0 rrl:0
rwt:0 rwl:0 idle_power:- active_power:-
vs[]:
0 1 2 3 4 5 6 7 8 9 a b c d e f
0000: 73 64 62 20 20 20 20 20 20 20 20 20 20 20 20 20 "sdb............."
0010: 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 "................"
0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
00a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
00b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
00c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
00d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
00e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
00f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0110: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0140: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0150: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0160: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0170: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0190: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
01a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
01b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
01c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
01d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
01e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
01f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0210: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0220: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0230: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0240: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0250: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0260: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0270: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0290: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
02a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
02b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
02c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
02d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
02e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
02f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0310: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0320: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0330: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0340: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0350: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0360: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0370: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0390: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
03a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
03b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
03c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
03d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
03e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
03f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
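As a rough sketch of how the requested name could be recovered from that `vs[]` region: the function below assumes, based only on the dump above, that the name occupies the first 32 bytes, padded with spaces, with everything after it zeroed. The name `requested_device_name` is hypothetical, and stripping a leading `/dev/` is an assumption about how other mappings might be encoded:

```python
def requested_device_name(vs):
    """Decode the requested block device name from the vendor-specific bytes.

    Assumes the name sits in the first 32 bytes, padded with spaces
    (0x20), as in the dump above.
    """
    name = bytes(vs[:32]).decode("ascii", errors="replace").strip(" \x00")
    # Assumption: some mappings may carry a "/dev/" prefix; drop it so the
    # result can be used directly as a symlink name under /dev.
    if name.startswith("/dev/"):
        name = name[len("/dev/"):]
    return name


# The 0x400-byte region from the dump above: "sdb", space padding, then zeros.
vs = bytes.fromhex("736462" + "20" * 29) + bytes(0x400 - 32)
print(requested_device_name(vs))
```

Reading the bytes off the device in the first place still needs something like `nvme id-ctrl` or an ioctl, which this sketch leaves out.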
add udev rule to ensure block device symlinks exist for modern nvme EBS mappings
for a while now I've noticed intermittent startup failures on smaller elasticsearch machines. I was never able to put my finger on the issue but I suspected a race between kernel tasks and cloud-init.
today I made some progress, mainly because upgrading to ubuntu focal seems to make it fail more consistently.
what I think is going on is that there is either a race between cloud-init and udev... or that udev isn't working properly due to python2.7 not being available on modern Ubuntu distros.
but backing up a second, what's the issue?
well, with some more modern AWS machines you request an EBS block device mapping of something like /dev/sdb but when you boot it's actually available as /dev/nvme2n1 or something similar 🤷‍♂️
it's kind of odd, but I believe that this is due to the 'Nitro' system using an NVMe driver for EBS volumes.
so to get around this glaring issue AWS encodes some 'vendor info' in the NVMe mapping binary header, which contains information about the mapping you actually requested.
there is then a udev rule (the last line below) which is responsible for detecting all this and creating a symlink:
when this symlink isn't created or isn't created YET, then things break:
the udev rule installed by default seems to be broken on modern Ubuntu because it runs /sbin/ebsnvme-id, which doesn't work because it requires python2.7, which isn't installed 😢
I tried installing python2.7 and triggering the rules and it still doesn't work so 🤷‍♂️
that's when I found this article https://opensource.creativecommons.org/blog/entries/2020-04-03-nvmee-on-debian-on-aws/ pointing me to https://github.com/oogali/ebs-automatic-nvme-mapping