uweseimet / scsi2pi

Advanced performant SCSI/SASI emulation and tools for the PiSCSI/RaSCSI board
https://www.scsi2pi.net
BSD 3-Clause "New" or "Revised" License
9 stars 2 forks source link

s2pexec READ(6) fails in 2.0.1 when LBA out of range #73

Closed erichelgeson closed 6 months ago

erichelgeson commented 6 months ago

When creating a new ticket please provide information on your environment.

Directly connected to a BlueSCSI v2.

Describe the issue

When using scsi2pi_2.0_rc2_37d4652_bookworm_arm64-1.deb a READ(6) with a LBA address outside of the size of the drive commands without issues. Once I upgrade to scsi2pi_2.0.1_bookworm_arm64-1.deb my commands start to fail with:

On 2.0.1:

# Request a read from way outside drive size
$ sudo /opt/scsi2pi/bin/s2pexec -i 2:0 -c 08:0F:FF:FF:00:00 -L trace
[2024-02-24 12:47:37.685] [trace] Detected 'Raspberry Pi 4 Model B Rev 1.2'
[2024-02-24 12:47:37.687] [trace] Executing command READ(6)/GET MESSAGE(10) for device 2:0
[2024-02-24 12:47:37.687] [trace] Arbitration with initiator ID 7
[2024-02-24 12:47:37.687] [trace] Selection of target 2 with initiator ID 7
[2024-02-24 12:47:37.692] [trace] Current phase is MESSAGE OUT
[2024-02-24 12:47:37.695] [trace] Current phase is COMMAND
[2024-02-24 12:47:37.711] [trace] Current phase is DATA IN
[2024-02-24 12:47:37.712] [trace] Receiving 4096 byte(s) in DATA IN phase
[2024-02-24 12:47:37.717] [trace] Current phase is DATA IN
[2024-02-24 12:47:37.717] [error] Buffer full in DATA IN phase
Error: Can't execute command $08

# Bus is now locked and no commands will run:
$ sudo /opt/scsi2pi/bin/s2pexec -i 2:1 -c 12:00:00:00:40:00 -L trace
[2024-02-24 12:48:05.583] [trace] Detected 'Raspberry Pi 4 Model B Rev 1.2'
[2024-02-24 12:48:05.584] [trace] Executing command INQUIRY for device 2:1
[2024-02-24 12:48:05.584] [trace] Arbitration with initiator ID 7
[2024-02-24 12:48:05.585] [trace] Selection of target 2 with initiator ID 7
[2024-02-24 12:48:05.590] [trace] Current phase is MESSAGE OUT
[2024-02-24 12:48:05.593] [trace] Current phase is COMMAND
Error: Can't execute command $12

I can successfully run the same commands downgrading back to rc2_37d4652 and rebooting the Pi/BlueSCSI

$ sudo /opt/scsi2pi/bin/s2pexec -i 2:0 -c 08:0F:FF:FF:00:00 -L trace
[2024-02-24 13:01:23.820] [trace] Detected 'Raspberry Pi 4 Model B Rev 1.2'
[2024-02-24 13:01:23.822] [trace] Executing command READ(6)/GET MESSAGE(10) for device 2:0
[2024-02-24 13:01:23.822] [trace] Arbitration with initiator ID 7
[2024-02-24 13:01:23.823] [trace] Selection of target 2 with initiator ID 7
[2024-02-24 13:01:23.827] [trace] Handling MESSAGE OUT phase
[2024-02-24 13:01:23.830] [trace] Handling COMMAND phase
[2024-02-24 13:01:23.861] [trace] Handling STATUS phase
[2024-02-24 13:01:23.864] [trace] Handling MESSAGE IN phase
[2024-02-24 13:01:23.864] [warning] Device reported CHECK CONDITION (status code $02)
[2024-02-24 13:01:23.864] [trace] Executing command REQUEST SENSE for device 2:0
[2024-02-24 13:01:23.864] [trace] Arbitration with initiator ID 7
[2024-02-24 13:01:23.865] [trace] Selection of target 2 with initiator ID 7
[2024-02-24 13:01:23.872] [trace] Handling MESSAGE OUT phase
[2024-02-24 13:01:23.875] [trace] Handling COMMAND phase
[2024-02-24 13:01:23.886] [trace] Handling DATA IN phase
[2024-02-24 13:01:23.886] [trace] Received 14 byte(s) in DATA IN phase, provided size was 14 bytes
[2024-02-24 13:01:23.896] [trace] Handling STATUS phase
[2024-02-24 13:01:23.899] [trace] Handling MESSAGE IN phase
Error: ILLEGAL REQUEST (Sense Key $05), LBA OUT OF RANGE (ASC $21), ASCQ $00

Which produces the expected results of LBA OUT OF RANGE

uweseimet commented 6 months ago

@erichelgeson Can you please check whether this issue still exists in the current SCSI2Pi 3.0 RC 6? You can download the binaries on https://www.scsi2pi.net/packages/develop/.

erichelgeson commented 6 months ago

Same issue in scsi2pi_3.0_rc6_67f83ef_bookworm_arm64-1.deb -

$ sudo /opt/scsi2pi/bin/s2pexec -i 2:0 -c 08:0F:FF:FF:00:00 -L trace
[13:16:59.905] [trace] Detected Raspberry Pi type 4
[13:16:59.906] [trace] Executing command READ(6)/GET MESSAGE(10) for device 2:0
[13:16:59.906] [trace] Arbitration with initiator ID 7
[13:16:59.906] [trace] Selection of target 2 with initiator ID 7
[13:16:59.911] [trace] Current phase is MESSAGE OUT
[13:16:59.914] [trace] Current phase is COMMAND
[13:16:59.929] [trace] Current phase is DATA IN
[13:16:59.929] [trace] Receiving up to 4096 byte(s) in DATA IN phase
[13:16:59.934] [trace] Current phase is DATA IN
[13:16:59.936] [error] Buffer full in DATA IN phase
Error: Can't execute command $08
uweseimet commented 6 months ago

I'm afraid I cannot reproduce this. I get this result when running s2pexec against a drive emulated by s2p with a size of 64 KB:

>/opt/scsi2pi/bin/s2pexec -v
3.0-rc6
>/opt/scsi2pi/bin/s2pexec -i 0 -c 08:0f:ff:ff:00:00 -L trace
[20:30:05.647] [trace] Detected Raspberry Pi type 4
[20:30:05.649] [trace] Executing command READ(6)/GET MESSAGE(10) for device 0:0
[20:30:05.649] [trace] Arbitration with initiator ID 7
[20:30:05.649] [trace] Selection of target 0 with initiator ID 7
[20:30:05.650] [trace] Current phase is MESSAGE OUT
[20:30:05.650] [trace] Current phase is COMMAND
[20:30:05.650] [trace] Current phase is STATUS
[20:30:05.650] [trace] Current phase is MESSAGE IN
[20:30:05.650] [warning] Device reported CHECK CONDITION (status code $02)
[20:30:05.651] [trace] Executing command REQUEST SENSE for device 0:0
[20:30:05.651] [trace] Arbitration with initiator ID 7
[20:30:05.651] [trace] Selection of target 0 with initiator ID 7
[20:30:05.651] [trace] Current phase is MESSAGE OUT
[20:30:05.652] [trace] Current phase is COMMAND
[20:30:05.652] [trace] Current phase is DATA IN
[20:30:05.652] [trace] Receiving up to 14 byte(s) in DATA IN phase
[20:30:05.652] [trace] Current phase is STATUS
[20:30:05.652] [trace] Current phase is MESSAGE IN
Error: ILLEGAL REQUEST (Sense Key $05), LBA OUT OF RANGE (ASC $21), ASCQ $00

As far as I can tell our s2pexec setup is the same. From your output it looks as if the target does not enter the status phase but tries to send more than 4096 bytes. I might have to address the blocked bus, e.g. with a SCSI reset on the initiator side, when more bytes are going to be transferred than the buffer can hold, but let's first identify the actual problem. What happens when you increase the buffer size with the -b option?

By the way, with SCSI2Pi 3.0 there is no need for using 'sudo' anymore when launching programs contained in the SCSI2Pi 3.0 binary packages or after installing self-compiled binaries with "sudo make install".

erichelgeson commented 6 months ago

The firmware on the BlueSCSI is the same across tests (v2024.02.22+ the lun patch)- I'd assume if the Target was doing something wrong (which very well could be, I'll take a closer look) it wouldn't matter if I was using the earlier or later versions of s2pexec. Disk on ID 2 size is 1,073,741,824 bytes.

uweseimet commented 6 months ago

There were some changes between 2.0 and 2.0.1 that also affect s2pexec. There was a subtle timing issue when the initiator was receiving data. Judging from the logs the target does not report any error but starts transferring the requested 256 sectors of data, which exceeds the buffer size. This results in s2pexec stopping the transfer after the receive buffer size has been reached. The target tries to send the remaining data, which is blocking the bus. In the ideal case s2pexec would try to prevent this with a bus reset. This is something I can quite likely reproduce. On the other hand, if you are sure that the target enters STATUS phase I will check why s2pexec does not seem to notice this.

erichelgeson commented 6 months ago

It is not in STATUS - bluescsi log when i

[3855918ms] DBG -- BUS_BUSY
[3855919ms] DBG ---- SELECTION: 2
[3855922ms] DBG ---- MESSAGE_OUT
[3855924ms] DBG ------ OUT: 0x80
[3855927ms] DBG ---- COMMAND: Read6
[3855930ms] DBG ------ OUT: 0x08 0x0F 0xFF 0xFF 0x00 0x00
[3855935ms] DBG ------ Read 256x512 starting at 1048575
[3855940ms] DBG ---- DATA_IN
... 5023ms delay here, normally sub second ...
[3860963ms] DBG ---- Total IN: 65536 OUT: 0 CHECKSUM: 0
[3860965ms] DBG -- BUS_FREE

I missed your question about the buffer - that seemed to allow me to get past the error:

$ /opt/scsi2pi/bin/s2pexec -i 2:0 -c 08:0F:FF:FF:00:00 -L trace -b 999999
[15:36:35.303] [trace] Detected Raspberry Pi type 4
[15:36:35.303] [trace] Executing command READ(6)/GET MESSAGE(10) for device 2:0
[15:36:35.303] [trace] Arbitration with initiator ID 7
[15:36:35.304] [trace] Selection of target 2 with initiator ID 7
[15:36:35.308] [trace] Current phase is MESSAGE OUT
[15:36:35.311] [trace] Current phase is COMMAND
[15:36:35.326] [trace] Current phase is DATA IN
[15:36:35.326] [trace] Receiving up to 999999 byte(s) in DATA IN phase
[15:36:35.521] [trace] Current phase is STATUS
[15:36:35.524] [trace] Current phase is MESSAGE IN
[15:36:35.524] [debug] Received 131072 data byte(s)
00000000  00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00  '................'
..snip..
0001fff0  00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00  '................'
uweseimet commented 6 months ago

But LBA OUT OF RANGE is not reported, is it?

erichelgeson commented 6 months ago

Oddly no but I think I something is wrong with my test here. I'm going to reset everything and start again, but likely not today.

uweseimet commented 6 months ago

No problem. In the meantime I have improved the initiator behavior in case there is a wrong bus phase. The bus should not be blocked anymore if the DATA IN phase cannot be correctly terminated.

s2pexec -i 5 -c 12:00:00:00:40:00
00000000  05:80:02:02:33:00:00:18:50:4c:45:58:54:4f:52:20  '....3...PLEXTOR '
00000010  43:44:2d:52:4f:4d:20:50:58:2d:34:30:54:53:20:20  'CD-ROM PX-40TS  '
00000020  31:2e:31:31:30:32:2f:32:31:2f:30:30:20:30:31:3a  '1.1102/21/00 01:'
00000030  31:31:20:20:20:20:20:20                          '11      '
>s2pexec -i 5 -c 08:00:00:01:01:00 -b 1
[10:45:47.006] [error] Buffer full in DATA IN phase
Error: Can't execute command READ(6)/GET MESSAGE(10) ($08)
>s2pexec -i 5 -c 12:00:00:00:40:00
00000000  05:80:02:02:33:00:00:18:50:4c:45:58:54:4f:52:20  '....3...PLEXTOR '
00000010  43:44:2d:52:4f:4d:20:50:58:2d:34:30:54:53:20:20  'CD-ROM PX-40TS  '
00000020  31:2e:31:31:30:32:2f:32:31:2f:30:30:20:30:31:3a  '1.1102/21/00 01:'
00000030  31:31:20:20:20:20:20:20                          '11      '
>s2pexec -i 5 -c 08:00:00:01:01:00 -b 1
[10:46:00.648] [warning] Device reported CHECK CONDITION (status code $02)
Error: UNIT ATTENTION (Sense Key $06), POWER ON, RESET, OR BUS DEVICE RESET OCCURRED (ASC $29), ASCQ $00

The change is included in the latest development branch builds, e.g. https://www.scsi2pi.net/packages/develop/scsi2pi_3.1_devel_30d2336_bookworm_arm64-1.deb.

In case you want to use s2pexec for scripting tests, also see the bottom of https://github.com/uweseimet/scsi2pi/wiki/Advanced-Testing.

erichelgeson commented 6 months ago

Ok, reset everything (still testing on rc6 to not move anything). I've boiled the problem down to the buffer size allocation when the cdb is 08:00:00:00:00:00:

$ /opt/scsi2pi/bin/s2pexec -i 1:0 -c 08:00:00:00:00:00 -L trace
[09:09:25.888] [trace] Receiving up to 4096 byte(s) in DATA IN phase # <- too small: 256 blocks*512 bytes = 131,072
[09:09:25.894] [trace] Current phase is DATA IN
[09:09:25.894] [error] Buffer full in DATA IN phase

$ s2pexec -i 1:0 -c 08:00:00:00:00:00 -L trace -b 131071 # fails hard reset of BlueSCSI
[09:09:07.657] [error] Buffer full in DATA IN phase

$ s2pexec -i 1:0 -c 08:00:00:00:00:00 -L trace -b 131072 # 256*512 - succeeds

I'll try the latest dev build now to see if the bus reset helps get past the bus reset hang.

uweseimet commented 6 months ago

Note that with the next develop build (available in about an hour) s2pexec will also support an explicit bus reset with the new "--reset-bus/-r" command line option.

erichelgeson commented 6 months ago

Just tested and yes the link you gave me did reset the bus on the failed DATA_IN phase.

One note though - it did not reset the bus when the bus was already hung due to my previous RC6 command failure. Eg: run bad command on rc6 get hung bus -> upgrade to dev -> try to run commands, but was still in a bad state and wasnt reset. This is a very unlikely case to have a device hung before any commands are run, but just wanted to bring it up.

The --reset-bus is a nice addition, just in case it's needed.

uweseimet commented 6 months ago

Yes, the bus will only be reset right after a command that caused a wrong bus phase. When the bus is already blocked when lauching s2pexec, s2pexec does not unblock it. It will not access the bus until the next command. But the new option should help in this case, because you can reset the bus any time with it.

uweseimet commented 6 months ago

@erichelgeson From your perspective, is there still anything to be done as part of this ticket? As far as I can tell the original problem was caused by the buffer being too small.

erichelgeson commented 6 months ago

I suppose from s2pexec's perspective you cant determine the correct buffer size before sending the READ command - but we do know 4096 will always be wrong (unless there is a 16 byte block size drive out there) - I'm not sure if you could make a better default for the case where read with Transfer length of 00 is 512*256 as that would be the 'common' case.

That is just my 2c, whatever you decide is fine.

uweseimet commented 6 months ago

I also thought about increasing the buffer exactly to this size. For s2pexec 100 KB more of memory usage are less relevant than for s2p. I will add the required change for the next development build. Note that all changes that result from this ticket will also be included in the final 3.0 release.

erichelgeson commented 6 months ago

Makes sense, thanks for your help debugging this and I learned a few things along the way - which is always the best part.

erichelgeson commented 6 months ago

Just noticed a slight regression in behavior from the buffer size change on 3.1_devel_3b4ee02:

$ s2pexec -i 1 -c 1D:00:00:10:00:00 -L trace
[11:18:32.376] [trace] Detected Raspberry Pi type 4
[11:18:32.377] [trace] Executing command SEND DIAGNOSTIC for device 1:0
[11:18:32.377] [trace] Arbitration with initiator ID 7
[11:18:32.378] [trace] Selection of target 1 with initiator ID 7
[11:18:32.383] [trace] Current phase is MESSAGE OUT
[11:18:32.386] [trace] Current phase is COMMAND
[11:18:32.398] [trace] Current phase is DATA OUT
[11:18:32.398] [trace] Sending 131072 byte(s) in DATA OUT phase
[11:18:32.413] [error] Sent 4096 byte(s) in DATA OUT phase, expected size was 131072 byte(s)
[11:18:32.413] [error] DATA OUT phase failed
Error: Can't execute command SEND DIAGNOSTIC (1d)
$ s2pexec -i 1 -c 1D:00:00:10:00:00 -L trace -b 4096
[11:18:54.366] [trace] Detected Raspberry Pi type 4
[11:18:54.367] [trace] Executing command SEND DIAGNOSTIC for device 1:0
[11:18:54.367] [trace] Arbitration with initiator ID 7
[11:18:54.367] [trace] Selection of target 1 with initiator ID 7
[11:18:54.372] [trace] Current phase is MESSAGE OUT
[11:18:54.375] [trace] Current phase is COMMAND
[11:18:54.387] [trace] Current phase is DATA OUT
[11:18:54.387] [trace] Sending 4096 byte(s) in DATA OUT phase
[11:18:54.399] [trace] Current phase is STATUS
[11:18:54.402] [trace] Current phase is MESSAGE IN
[11:18:54.402] [debug] Received 4096 data byte(s)
00000000  00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00  '................'
..snip...
00000ff0  00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00  '................'
uweseimet commented 6 months ago

This can hardly be avoided. For DATA OUT the buffer size is also implicitly the expected size of the data to be sent. s2pexec cannot know that in the actual command you said that you only wanted to transfer 4096 bytes. Before the latest change your command just worked by accident, because the size in the command and the buffer size were identical. I could make the size for DATA OUT configurable with an additional option, but this would just make things more complicated. If you do not want to transfer 131072 bytes you need to adjust the buffer size, in this case to 4096 bytes.

But I notice that 4096 bytes were received in MESSAGE IN phase? There is no message that big. What's happening here? And where do the data you send in DATA OUT come from? Without specifying a source for these data these data they will just be some random memory contents from the allocated buffer.

erichelgeson commented 6 months ago

I agree on the buffer.


The MESSAGE IN data is due to the assumption in the scsi2sd code we use for command handling that the SEND DIAGNOSTIC and RECEIVE DIAGNOSTIC RESULTS commands are linked.

From the spec:

NOTE 92 To insure that the diagnostic command information is not destroyed by a command sent from another initiator, either the SEND DIAGNOSTIC command should be linked to the RECEIVE DIAGNOSTIC RESULTS command or the logical unit should be reserved.

So a stand alone SEND DIAGNOSTIC sends the buffer back. I'm not sure how/if we can detect if the command is chained (haven't looked into it yet).

Either way - these are the types of things I wanted to bring out with these unit tests. I'll look into it more later.

uweseimet commented 6 months ago

@erichelgeson As far as I can tell these commands should be linked. Linked commands are a general concept of SCSI. Command linking can also be used with other commands. The specification has the details. Note that s2pexec might not work for linked commands. Do you know any drivers that use linked commands? What I do know is that for the Atari there are no such drivers.

erichelgeson commented 6 months ago

Do you know any drivers that use linked commands? What I do know is that for the Atari there are no such drivers.

No idea - I don't/haven't looked at any host drivers.