uweseimet / scsi2pi

Advanced performant SCSI/SASI emulation and tools for the PiSCSI/RaSCSI board
https://www.scsi2pi.net
BSD 3-Clause "New" or "Revised" License
10 stars 2 forks source link

Investigate handling of bus settle delay #71

Closed uweseimet closed 8 months ago

uweseimet commented 9 months ago

This is a follow-up ticket to https://github.com/PiSCSI/piscsi/issues/656 and https://github.com/uweseimet/scsi2pi/issues/20. It has turned out that for @jcres69 piscsi/s2p only seem to work with a non-standard SCSI timing. SCSI2Pi 2.0 is working for him, SCSI2Pi 2.0.1 is not. There are messages like

[2024-02-22 10:52:53.818] [warning] (ID 5) - Received 16 byte(s) in DATA OUT phase, command requires 62
[2024-02-22 10:53:32.070] [warning] (ID 6) - Received 359 byte(s) in DATA OUT phase, command requires 512

(With piscsi v23.11.01 the problem is the same, just the message is different.) The messages indicate that the board and the computer "lose" the connection in the DATA OUT phase, i.e. not all required data for the respective SCSI command are received because the bus phase appears to change before the end of the data transfer. Device 5 is a DaynaPort, device 6 a hard drive.

Completely removing the bus settle delay appears to resolve the problem but is not SCSI compliant and causes issues (wrong data received in the DATA OUT phase) with other platforms. These issues are not necessarily immediately visible.

uweseimet commented 9 months ago

@jcres69 Can you please add detailed information on your setup, e.g. Pi OS version, Pi type, board version and computer type? In case you have more than one SCSI cable, do you have the same problem also with a different cable? How long is your cable? Is it a proper SCSI cable or a cable originally meant to be used for a different purpose, e.g. a centronics cable? Also ensure that your board is terminated.

Please also add logging information on trace level, but try to restrict the log to about 30 lines before the warning. Please only attach a hard drive for any further testing, but no other devices.

jcres69 commented 9 months ago

I'll gather the traces and add below.

pi@raspberrypi

OS: Debian GNU/Linux 12 (bookworm) aarch64 Host: Raspberry Pi 4 Model B Rev 1.5 Kernel: 6.1.0-rpi8-rpi-v8 CPU: (4) @ 1.800GHz Memory: 99MiB / 1848MiB

Rascsi 68kmla version 2.4a

Computer is Mac Plus 4Mb, I've tested on system 608 and 701 with same results. I have a gender changer that basically connects directly to the scsi port, and also a short cable, same results on both.

By the way, I believe the mac plus pre-dates SCSI-2, so it may not be 100% compatible FYI.

uweseimet commented 9 months ago

@jcres69 Please remember to check that your board is properly terminated. The strange thing about this issue is that the SCSI specification defines a minimum time to wait before assuming that the bus signals are stable. Waiting longer is fine, waiting not that long can cause issues. This is why I cannot make this time shorter. But your system only appears to work when this time is less than the mandatory time ...

jcres69 commented 9 months ago

I verified the rascsi board is terminated... here is the trace file with two warning messages. only one drive was attached, nothing else.

[2024-02-22 15:51:34.705] [debug] (ID 6) - Controller is executing READ(10), CDB $28:00:00:18:10:ab:00:00:01:00 [2024-02-22 15:51:34.705] [debug] (ID:LUN 6:0) - Device is executing READ(10) ($28) [2024-02-22 15:51:34.705] [trace] (ID:LUN 6:0) - READ/WRITE/VERIFY/SEEK, start sector: 1577131, sector count: 1 [2024-02-22 15:51:34.705] [trace] (ID 6) - DATA IN phase [2024-02-22 15:51:34.707] [trace] (ID 6) - All data transferred [2024-02-22 15:51:34.707] [trace] (ID 6) - Status phase, status is GOOD (status code $00) [2024-02-22 15:51:34.708] [trace] (ID 6) - All data transferred [2024-02-22 15:51:34.708] [trace] (ID 6) - MESSAGE IN phase [2024-02-22 15:51:34.708] [trace] (ID 6) - All data transferred [2024-02-22 15:51:34.708] [trace] (ID 6) - BUS FREE phase [2024-02-22 15:51:39.867] [trace] (ID 6) - ++++ Starting processing for initiator ID 7 [2024-02-22 15:51:39.867] [trace] (ID 6) - SELECTION phase [2024-02-22 15:51:39.867] [trace] (ID 6) - COMMAND phase [2024-02-22 15:51:39.868] [debug] (ID 6) - Controller is executing READ(10), CDB $28:00:00:00:10:e1:00:00:01:00 [2024-02-22 15:51:39.868] [debug] (ID:LUN 6:0) - Device is executing READ(10) ($28) [2024-02-22 15:51:39.868] [trace] (ID:LUN 6:0) - READ/WRITE/VERIFY/SEEK, start sector: 4321, sector count: 1 [2024-02-22 15:51:39.868] [trace] (ID 6) - DATA IN phase [2024-02-22 15:51:39.871] [trace] (ID 6) - All data transferred [2024-02-22 15:51:39.871] [trace] (ID 6) - Status phase, status is GOOD (status code $00) [2024-02-22 15:51:39.872] [trace] (ID 6) - All data transferred [2024-02-22 15:51:39.872] [trace] (ID 6) - MESSAGE IN phase [2024-02-22 15:51:39.872] [trace] (ID 6) - All data transferred [2024-02-22 15:51:39.872] [trace] (ID 6) - BUS FREE phase [2024-02-22 15:51:42.478] [trace] (ID 6) - ++++ Starting processing for initiator ID 7 [2024-02-22 15:51:42.478] [trace] (ID 6) - SELECTION phase [2024-02-22 15:51:42.478] [trace] (ID 6) - COMMAND phase [2024-02-22 15:51:42.479] [debug] (ID 6) - Controller is executing WRITE(10), CDB $2a:00:00:00:09:f4:00:00:01:00 [2024-02-22 15:51:42.479] [debug] (ID:LUN 6:0) - Device is executing WRITE(10) ($2a) [2024-02-22 15:51:42.479] [trace] (ID:LUN 6:0) - READ/WRITE/VERIFY/SEEK, start sector: 2548, sector count: 1 [2024-02-22 15:51:42.479] [trace] (ID 6) - DATA OUT phase [2024-02-22 15:51:45.479] [warning] (ID 6) - Received 6 byte(s) in DATA OUT phase, command requires 512 [2024-02-22 15:51:45.479] [debug] (ID 6) - ABORTED COMMAND (Sense Key $0b), DATA PHASE ERROR (ASC $4b), ASCQ $00 [2024-02-22 15:51:45.479] [trace] (ID 6) - Status phase, status is CHECK CONDITION (status code $02) [2024-02-22 15:51:45.479] [trace] (ID 6) - All data transferred [2024-02-22 15:51:45.479] [trace] (ID 6) - MESSAGE IN phase [2024-02-22 15:51:45.479] [trace] (ID 6) - All data transferred [2024-02-22 15:51:45.479] [trace] (ID 6) - BUS FREE phase [2024-02-22 15:51:45.480] [trace] (ID 6) - ++++ Starting processing for initiator ID 7 [2024-02-22 15:51:45.480] [trace] (ID 6) - SELECTION phase [2024-02-22 15:51:45.480] [trace] (ID 6) - COMMAND phase [2024-02-22 15:51:45.481] [debug] (ID 6) - Controller is executing REQUEST SENSE, CDB $03:00:00:00:10:00 [2024-02-22 15:51:45.481] [debug] (ID:LUN 6:0) - Device is executing REQUEST SENSE ($03) [2024-02-22 15:51:45.481] [trace] (ID:LUN 6:0) - Status $00, Sense Key $0b, ASC $4b [2024-02-22 15:51:45.481] [trace] (ID 6) - DATA IN phase [2024-02-22 15:51:45.481] [trace] (ID 6) - All data transferred [2024-02-22 15:51:45.481] [trace] (ID 6) - Status phase, status is GOOD (status code $00) [2024-02-22 15:51:45.481] [trace] (ID 6) - All data transferred [2024-02-22 15:51:45.481] [trace] (ID 6) - MESSAGE IN phase [2024-02-22 15:51:45.481] [trace] (ID 6) - All data transferred [2024-02-22 15:51:45.481] [trace] (ID 6) - BUS FREE phase [2024-02-22 15:51:45.482] [trace] (ID 6) - ++++ Starting processing for initiator ID 7 [2024-02-22 15:51:45.482] [trace] (ID 6) - SELECTION phase [2024-02-22 15:51:45.483] [trace] (ID 6) - COMMAND phase [2024-02-22 15:51:45.483] [debug] (ID 6) - Controller is executing WRITE(10), CDB $2a:00:00:00:09:f4:00:00:01:00 [2024-02-22 15:51:45.483] [debug] (ID:LUN 6:0) - Device is executing WRITE(10) ($2a) [2024-02-22 15:51:45.483] [trace] (ID:LUN 6:0) - READ/WRITE/VERIFY/SEEK, start sector: 2548, sector count: 1 [2024-02-22 15:51:45.483] [trace] (ID 6) - DATA OUT phase [2024-02-22 15:51:45.486] [trace] (ID 6) - Status phase, status is GOOD (status code $00) [2024-02-22 15:51:45.486] [trace] (ID 6) - All data transferred [2024-02-22 15:51:45.486] [trace] (ID 6) - MESSAGE IN phase [2024-02-22 15:51:45.486] [trace] (ID 6) - All data transferred [2024-02-22 15:51:45.486] [trace] (ID 6) - BUS FREE phase [2024-02-22 15:52:02.420] [trace] (ID 6) - ++++ Starting processing for initiator ID 7 [2024-02-22 15:52:02.420] [trace] (ID 6) - SELECTION phase [2024-02-22 15:52:02.420] [trace] (ID 6) - COMMAND phase [2024-02-22 15:52:02.421] [debug] (ID 6) - Controller is executing WRITE(10), CDB $2a:00:00:00:00:62:00:00:01:00 [2024-02-22 15:52:02.421] [debug] (ID:LUN 6:0) - Device is executing WRITE(10) ($2a) [2024-02-22 15:52:02.421] [trace] (ID:LUN 6:0) - READ/WRITE/VERIFY/SEEK, start sector: 98, sector count: 1 [2024-02-22 15:52:02.421] [trace] (ID 6) - DATA OUT phase [2024-02-22 15:52:02.423] [trace] (ID 6) - Status phase, status is GOOD (status code $00) [2024-02-22 15:52:02.424] [trace] (ID 6) - All data transferred [2024-02-22 15:52:02.424] [trace] (ID 6) - MESSAGE IN phase [2024-02-22 15:52:02.424] [trace] (ID 6) - All data transferred [2024-02-22 15:52:02.424] [trace] (ID 6) - BUS FREE phase [2024-02-22 15:52:02.427] [trace] (ID 6) - ++++ Starting processing for initiator ID 7 [2024-02-22 15:52:02.427] [trace] (ID 6) - SELECTION phase [2024-02-22 15:52:02.427] [trace] (ID 6) - COMMAND phase [2024-02-22 15:52:02.428] [debug] (ID 6) - Controller is executing WRITE(10), CDB $2a:00:00:00:0a:08:00:00:01:00 [2024-02-22 15:52:02.428] [debug] (ID:LUN 6:0) - Device is executing WRITE(10) ($2a) [2024-02-22 15:52:02.428] [trace] (ID:LUN 6:0) - READ/WRITE/VERIFY/SEEK, start sector: 2568, sector count: 1 [2024-02-22 15:52:02.428] [trace] (ID 6) - DATA OUT phase [2024-02-22 15:52:02.431] [trace] (ID 6) - Status phase, status is GOOD (status code $00) [2024-02-22 15:52:02.431] [trace] (ID 6) - All data transferred [2024-02-22 15:52:02.431] [trace] (ID 6) - MESSAGE IN phase [2024-02-22 15:52:02.431] [trace] (ID 6) - All data transferred [2024-02-22 15:52:02.431] [trace] (ID 6) - BUS FREE phase [2024-02-22 15:52:02.433] [trace] (ID 6) - ++++ Starting processing for initiator ID 7 [2024-02-22 15:52:02.433] [trace] (ID 6) - SELECTION phase [2024-02-22 15:52:02.433] [trace] (ID 6) - COMMAND phase [2024-02-22 15:52:02.433] [debug] (ID 6) - Controller is executing WRITE(10), CDB $2a:00:00:00:0e:6c:00:00:01:00 [2024-02-22 15:52:02.433] [debug] (ID:LUN 6:0) - Device is executing WRITE(10) ($2a) [2024-02-22 15:52:02.433] [trace] (ID:LUN 6:0) - READ/WRITE/VERIFY/SEEK, start sector: 3692, sector count: 1 [2024-02-22 15:52:02.434] [trace] (ID 6) - DATA OUT phase [2024-02-22 15:52:05.434] [warning] (ID 6) - Received 8 byte(s) in DATA OUT phase, command requires 512 [2024-02-22 15:52:05.434] [debug] (ID 6) - ABORTED COMMAND (Sense Key $0b), DATA PHASE ERROR (ASC $4b), ASCQ $00 [2024-02-22 15:52:05.434] [trace] (ID 6) - Status phase, status is CHECK CONDITION (status code $02) [2024-02-22 15:52:05.434] [trace] (ID 6) - All data transferred [2024-02-22 15:52:05.434] [trace] (ID 6) - MESSAGE IN phase [2024-02-22 15:52:05.434] [trace] (ID 6) - All data transferred [2024-02-22 15:52:05.434] [trace] (ID 6) - BUS FREE phase [2024-02-22 15:52:05.435] [trace] (ID 6) - ++++ Starting processing for initiator ID 7 [2024-02-22 15:52:05.435] [trace] (ID 6) - SELECTION phase [2024-02-22 15:52:05.435] [trace] (ID 6) - COMMAND phase [2024-02-22 15:52:05.436] [debug] (ID 6) - Controller is executing REQUEST SENSE, CDB $03:00:00:00:10:00 [2024-02-22 15:52:05.436] [debug] (ID:LUN 6:0) - Device is executing REQUEST SENSE ($03) [2024-02-22 15:52:05.436] [trace] (ID:LUN 6:0) - Status $00, Sense Key $0b, ASC $4b

uweseimet commented 9 months ago

@jcres69 Thank you for the log. It confirms what I have already guessed: A write operation is sometimes interrupted in the middle of sending data to the board. A signal line that should remain unchanged during the transfer has changed. Neither piscsi nor s2p actively change this line during a transfer. This is an outgoing line, and thus I think that the Mac can also not change it. I don't see any other explanation than a hardware issue, but this would not explain why you do not seem to have issues when there is no bus settle delay. Are you prepared to compile the SCSI2Pi sources yourself? See https://github.com/uweseimet/scsi2pi/wiki/Compilation-Instructions. I would like to create a branch for double-checking that removing the delay is a solution for you.

jcres69 commented 9 months ago

@uweseimet Sure I can do that, happy to help!

uweseimet commented 9 months ago

@jcres69 There is an "issue_71" branch now. This branch is identical with the current develop branch, but I removed the delay that I re-added for release 2.0.1. Can you please compile this branch and check whether there is a change? It should essentially behave like release 2.0.

Are you running any other software on your Pi that makes use of the Pi's timers or the GPIO port? This would probably be software that is not part of the regular Pi OS distribution.

jcres69 commented 9 months ago

@uweseimet There is no other software installed like that, that I know of. the only software I installed other than what came with the distribution was scsi2pi. I'll do some testing on that branch and get back with you.

jcres69 commented 8 months ago

@uweseimet - After testing with this version I confirmed the problem is resolved. I do not see any scsi errors...

uweseimet commented 8 months ago

@jcres69 Thank you for testing. This means, that for you SCSI2Pi (and very likely also PiSCSI) only appear to work with a non-standard timing. Unfortunately I cannot make this timing the default because it violates the SCSI standard (SCSI2Pi reads data from the bus too fast), and because it is less reliable. This timing is only relevant when the Pi receives data, not when it sends data. Are you 100% sure that when your Mac writes data (i.e. the Pi receives them) to an image file the data in the image file are correct? Maybe you have some means for testing that. You might, for instance, copy a lot of data from one image file to another and then check whether the result is fine.

Even though I cannot change SCSi2Pi to always use this timing, if you are sure that this timing works for you, you can quite easily enforce this timing by modifying the SCSI2Pi sources before compiling them. In order to do that, in cpp/buses/rpi_bus.cpp just comment out or remove these lines at the end of the file:

    if (const uint32_t diff = timer_core_freq * 400 / 1000; diff) {
        const uint32_t start = armt_addr[ARMT_FREERUN];
        while (armt_addr[ARMT_FREERUN] - start < diff) {
            // Intentionally empty
        }
    }

This change, which I also applied for the branch you were testing, removes the so-called SCSI bus settle delay. Maybe you want to give this a try after checking out SCSI2Pi 3.0, which was released today, e.g. by running "git pull main" and then "make clean" and "make". If you do that I would appreciate your feedback on the result. You can also check out the current SCSI2Pi 3.1 develop branch with "git pull develop" instead, which may give you a 3% increase in performance.

jcres69 commented 8 months ago

You might, for instance, copy a lot of data from one image file to another and then check whether the result is fine.

This is exactly what I did many times and the data is good after copying. When troubleshooting this with the piscsi software, I was told it is a hardware problem, so I spent a lot of time troubleshooting the Mac plus hardware, three logic boards, ram, scsi cables, nothing resolved it other than the s2p software. I had stopped using the piscsi because these scsi errors made it unusable as it would cause the Mac to hang. I also remember there were other people besides me who had been using the piscsi that had the same problems with the Mac Plus.

I can do more testing as you suggested, editing the source file. But I'm thinking this is a specific issue with the Plus...

uweseimet commented 8 months ago

Was it always the Mac Plus that was affected? Anyway, I don't see anything more I can do. With your help it was possible to collect more details on these timing issues, even though there does not seem to be a SCSI-compliant solution for it. But if you don't have any issues when omitting the bus settle delay there is at least some kind of solution for you, which is better than nothing ;-). Maybe other users with the same problem can confirm that this work-around also works for them. If it's OK for you I would like to close this ticket, because right now there is nothing more that can be investigated. In case there is something new in the future, this ticket can be re-opened.

jcres69 commented 8 months ago

Yes, I think this is a good solution for me. Thanks!

jcres69 commented 8 months ago

@uweseimet Just FYI, I did as you suggested and deleted those lines from cpp/buses/rpi_bus.cpp after git pull from main, compiled version 3.0 and it works good, but got these errors when compiling. not sure if this is a problem just wanted to know if I did something wrong:

/usr/include/c++/12/bits/char_traits.h:431:56: warning: ‘void __builtin_memcpy(void, const void, long unsigned int)’ accessing 9223372036854775810 or more bytes at offsets [2, 9223372036854775807] and 1 may overlap up to 9223372036854775813 bytes at offset -3 [-Wrestrict] 431 | return static_cast<char_type>(__builtin_memcpy(s1, s2, __n)); | ~~~~^~~~~

/usr/include/c++/12/bits/char_traits.h:431:56: warning: ‘void __builtin_memcpy(void, const void, long unsigned int)’ accessing 9223372036854775810 or more bytes at offsets -4611686018427387902 and [-4611686018427387903, 4611686018427387904] may overlap up to 9223372036854775813 bytes at offset -3 [-Wrestrict] 431 | return static_cast<char_type>(__builtin_memcpy(s1, s2, __n)); | ~~~~^~~~~

make[1]: Leaving directory '/home/pi/scsi2pi/cpp' make -C doc make[1]: Entering directory '/home/pi/scsi2pi/doc' make[1]: Nothing to be done for 'all'. make[1]: Leaving directory '/home/pi/scsi2pi/doc'

uweseimet commented 8 months ago

@jcres69 You can ignore these messages. They are related to some compiler-internal code, not the the actual SCSI2Pi code. As far as I can tell only the GNU compiler reports this, when you use clang++ you don't get this warning. clang++ compiles faster and needs less memory. To use it run

make CXX=clang++

You may have to install clang++ on your Pi (with apt install) first.

uweseimet commented 8 months ago

@jcres69 Please take notice of https://github.com/uweseimet/scsi2pi/issues/82. Maybe you can check out the issue_82 branch and verify whether in your case it is also the ACK signal that is not managed correctly. When running s2p from this branch on trace level, the lines right before the DATA OUT phase warning will reveal that.

jcres69 commented 8 months ago

@uweseimet I don't see the issue_82 branch - I did read #82 and agree about the silverlining drivers. However, even with silverlining, the piscsi software still gives me the scsi errors. only with the "fixed" version of s2p is it completely resolved for me.

uweseimet commented 8 months ago

@jcres69 I forgot to mention that I deleted the branch because all changes are now contained in the develop branch. Can you please use the develop branch and add a trace log of the error message you get (please only the relevant part of the log, i.e. just a few lines) to this ticket. I am asking for this because I would like to ensure that we collect all available information, for potential future reference.

jcres69 commented 8 months ago

@uweseimet I got this error after attempting to boot the first time using the develop branch, but have not been able to recreate this again... the Mac would not boot at first, but now there doesn't seem to be a problem.

[2024-03-05 15:56:06.001] [debug] (ID 5) - Controller is executing READ(6)/GET MESSAGE(6), CDB $08:00:00:00:01:00 [2024-03-05 15:56:06.001] [debug] (ID:LUN 5:0) - Device is executing READ(6)/GET MESSAGE(6) ($08) [2024-03-05 15:56:06.001] [trace] (ID:LUN 5:0) - READ/WRITE/VERIFY/SEEK, start sector: 0, sector count: 1 [2024-03-05 15:56:06.006] [trace] (ID 5) - DATA IN phase [2024-03-05 15:56:06.006] [trace] (ID 5) - Sending 512 byte(s) [2024-03-05 15:56:06.041] [trace] (ID 5) - All data transferred [2024-03-05 15:56:06.041] [trace] (ID 5) - Status phase, status is GOOD (status code $00) [2024-03-05 15:56:06.041] [trace] (ID 5) - Sending 1 byte(s) [2024-03-05 15:56:06.041] [trace] (ID 5) - All data transferred [2024-03-05 15:56:06.041] [trace] (ID 5) - MESSAGE IN phase [2024-03-05 15:56:06.041] [trace] (ID 5) - Sending 1 byte(s) [2024-03-05 15:56:06.041] [trace] (ID 5) - All data transferred [2024-03-05 15:56:06.041] [trace] (ID 5) - BUS FREE phase [2024-03-05 15:56:06.045] [trace] (ID 5) - ++++ Starting processing for initiator ID 7 [2024-03-05 15:56:06.045] [trace] (ID 5) - SELECTION phase [2024-03-05 15:56:06.045] [trace] (ID 5) - COMMAND phase [2024-03-05 15:56:06.045] [debug] (ID 5) - Controller is executing READ(6)/GET MESSAGE(6), CDB $08:00:00:40:0b:00 [2024-03-05 15:56:06.045] [debug] (ID:LUN 5:0) - Device is executing READ(6)/GET MESSAGE(6) ($08) [2024-03-05 15:56:06.045] [trace] (ID:LUN 5:0) - READ/WRITE/VERIFY/SEEK, start sector: 64, sector count: 11 [2024-03-05 15:56:06.045] [trace] (ID 5) - DATA IN phase [2024-03-05 15:56:06.045] [trace] (ID 5) - Sending 512 byte(s) [2024-03-05 15:56:06.049] [trace] (ID 5) - Sending 512 byte(s) [2024-03-05 15:56:06.053] [trace] (ID 5) - Sending 512 byte(s) [2024-03-05 15:56:09.103] [trace] Timeout while waiting for ACK/REQ to change to true [2024-03-05 15:56:09.103] [warning] (ID 5) - Sent 371 byte(s) in DATA IN phase, command requires 512 [2024-03-05 15:56:09.103] [debug] (ID 5) - ABORTED COMMAND (Sense Key $0b), DATA PHASE ERROR (ASC $4b), ASCQ $00 [2024-03-05 15:56:09.103] [trace] (ID 5) - Status phase, status is CHECK CONDITION (status code $02) [2024-03-05 15:56:09.103] [trace] (ID 5) - Sending 1 byte(s) [2024-03-05 15:56:12.103] [trace] Timeout while waiting for ACK/REQ to change to true [2024-03-05 15:56:12.103] [trace] (ID 5) - BUS FREE phase

uweseimet commented 8 months ago
[2024-03-05 15:56:09.103] [trace] Timeout while waiting for ACK/REQ to change to true
[2024-03-05 15:56:09.103] [warning] (ID 5) - Sent 371 byte(s) in DATA IN phase, command requires 512

This confirms that the error is related to the REQ/ACK handshake, but that it happens in a different part of the handshake as in https://github.com/uweseimet/scsi2pi/issues/82. Thank you for running this test.