jnielsendotnet / isboot

FreeBSD kernel module to support booting from iSCSI via IBFT
4 stars 7 forks source link

cam rescan doesn't complete #11

Open hsnyder opened 2 years ago

hsnyder commented 2 years ago

I'm trying to get iSCSI boot working via an offboard Mellanox card. After solving two other hurdles I came across (see the two other issues I just opened ;) ), I'm stuck with a seemingly infinite loop of Root mount waiting for: CAM messages.

As noted in one of my other issues, I've changed the subsystem of isboot to SI_SUB_ROOT_CONF-1, to make it load after the mellanox module. Could that be related to the issue I'm having?

Here's the relevant part of my boot console (I can post the entire thing if you think it will help).

Load isboot
iSCSI boot driver version 0.2.15-alpha
iBFT instance 4 (ACPI_SIG_IBFT) not found
iBFT instance 3 (ACPI_SIG_IBFT) not found
iBFT instance 2 (ACPI_SIG_IBFT) not found
iBFT instance 1 (ACPI_SIG_IBFT) not found
iBFT instance 0 (ACPI_SIG_IBFT) not found
iBFT instance 4 (IBFT_SIGNATURE) not found
iBFT instance 3 (IBFT_SIGNATURE) not found
iBFT instance 2 (IBFT_SIGNATURE) found
found iBFT via ACPI
iBFT: length=609, revision=1, checksum=0x7f
iBFT: oemid='FENSYS', oemtableid='iPXE'
iBFT: sum = 0x0
iBFT: CS: length=18, index=0, flags=0x0
iBFT: CS: initiator=80, nic0=160, target0=384, nic1=0, target1=0
iBFT: IS: length=74, index=0, flags=0x3
IS: Initiator name: iqn.2010-04.org.ipxe:00000000-0000-0000-0000-3cecef70f2f4
iBFT: NIC0: length=102, index=0, flags=0x3
NIC0: IP address: 10.0.0.2
NIC0: Prefix: 24
NIC0: Origin: 1
NIC0: VLAN: 0
NIC0: MAC address: 00:02:c9:52:77:88
NIC0: PCI Bus/Dev/Func: 6300 (63/0/0)
iBFT: TGT0: length=54, index=0, flags=0x3
TGT0: Target IP address: 10.0.0.1
TGT0: Target Port: 3260
TGT0: Target LUN: 0
TGT0: CHAP type: 0
TGT0: NIC index: 0
TGT0: Target name: [REDACTED]
Boot NIC: mlxen0
Configure IPv4 by NIC0
isboot start, thread id=186a0
kproc_start
Attempting to login to iSCSI target and scan all LUNs.
isboot kproc start, thread id=18ab2
isboot iscsi start, thread id=18ab2
main loop, thread id=18ab2
initialize session, thread id=18ab2
Initiator: iqn.2010-04.org.ipxe:00000000-0000-0000-0000-3cecef70f2f4
Target: [REDACTED]
Target IP=10.0.0.1, Port=3260, LUN=0
strdup(iqn.2010-04.org.ipxe:00000000-0000-0000-0000-3cecef70f2f4)57
strdup([REDACTED])38
strdup(10.0.0.1)8
strdup(None)4
strdup(None,CRC32C)11
strdup(None,CRC32C)11
isboot_connect
open socket
try connect...(18ab2)
connect error
connect failed
boot retry (59)
isboot_connect
open socket
try connect...(18ab2)
connect error
connect failed
boot retry (58)
<6>mlx4_en: mlxen0: Link Up
mlxen0: link state changed to UP
<6>mlx4_en: mlxen0: Link Down
mlxen0: link state changed to DOWN
<6>mlx4_en: mlxen0: Link Up
mlxen0: link state changed to UP
isboot_connect
open socket
try connect...(18ab2)
wait connect...
old so=0, new so=0xfffff82069dfdb10
isboot_do_login
login start
xmit PDU
recv PDU
update option
KEY=[TargetAlias], VAL=[REDACTED]
KEY=[TargetPortalGroupTag], VAL=[1]
KEY=[HeaderDigest], VAL=[None]
free[None,CRC32C]
strdup(None)4
KEY=[DataDigest], VAL=[None]
free[None,CRC32C]
strdup(None)4
KEY=[DefaultTime2Wait], VAL=[2]
KEY=[DefaultTime2Retain], VAL=[0]
KEY=[ErrorRecoveryLevel], VAL=[0]
KEY=[MaxConnections], VAL=[1]
KEY=[InitialR2T], VAL=[Yes]
KEY=[ImmediateData], VAL=[Yes]
KEY=[MaxBurstLength], VAL=[1048576]
KEY=[MaxOutstandingR2T], VAL=[1]
KEY=[DataPDUInOrder], VAL=[Yes]
KEY=[DataSequenceInOrder], VAL=[Yes]
KEY=[FirstBurstLength], VAL=[262144]
KEY=[MaxRecvDataSegmentLength], VAL=[262144]
rsp login
free PDU
login end
cam attach
isboot action 4
isboot action 4 done
cam attach end
cam rescan
isboot action 4
isboot action 4 done
isboot action 4
isboot action 4 done
isboot action 4
isboot action 4 done
isboot action 4
isboot action 4 done
isboot action 15
isboot action 15 done
isboot action 16
isboot action 16 done
isboot action 4
isboot action 4 done
isboot action 901
isboot scsi io
add ccb
isboot action 4
isboot action 4 done
isboot action 4
isboot action 4 done
isboot_free_mbufext
Trying to mount root from ufs:/dev/gpt/[REDACTED] [rw,noatime]...
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
jnielsendotnet commented 2 years ago

I suspect this is some incompatibility with the built-in initiator in isboot and your iSCSI target. What kind of target are you using? It would help to see a packet capture of the whole iSCSI session if that's something you can get. (If the target is a FreeBSD or Linux host then it's a pretty easy "tcpdump port 3260". If not you might need help from your switch (or insert your own bridge host).

Do other parts of the boot loop or does the last line (Root mount waiting for: CAM) just repeat indefinitely?

hsnyder commented 2 years ago

Hey. The target is on a TrueNAS Core server (FreeBSD 13). I used tcpdump -i mlxen0 port 3260 as per your recommendation and have attached the result. This is the entire boot process, including iPXE.

Do other parts of the boot loop or does the last line (Root mount waiting for: CAM) just repeat indefinitely?

It's specifically the "Root mount waiting for: CAM" message that repeats.

Thanks very much for your help.

tcpdump.txt

jnielsendotnet commented 1 year ago

If you're still interested in troubleshooting this I should be able to give it some attention shortly. Will you try the latest isboot code? If you are still having this issue then send me a (binary) packet capture using something like "tcpdump -nnp -i mlxen0 -w /path/to/file.pcap port 3260". Unfortunately the text output you sent previously doesn't contain everything I would need. (Apologies for the incomplete instructions last time as well.)

jnielsendotnet commented 1 year ago

I am also seeing this about 1/3 of the time on my test box with FreeBSD-14-ALPHA4 and isboot 0.2.15. I'll try to do a packet capture/analysis of when it works vs when it does not.

jnielsendotnet commented 1 year ago

This appears to be a race condition as 1) it doesn't happen all the time (on my test machine at least) and 2) increasing the verbosity of either isboot or CAM makes it happen much more frequently

hsnyder commented 1 year ago

Thanks for looking into this! Unfortunately, I don't know if I can be of additional help as I'm no longer using the setup where I encountered this. I'll leave the issue open, but you can close it if you'd like to.

jnielsendotnet commented 1 year ago

No worries. I'm actually able to reproduce the issue so I'll keep looking in to it. Feel free to mute updates if they're no longer relevant for you. :)