Closed uweseimet closed 8 months ago
@jcres69 Can you please add detailed information on your setup, e.g. Pi OS version, Pi type, board version and computer type? In case you have more than one SCSI cable, do you have the same problem also with a different cable? How long is your cable? Is it a proper SCSI cable or a cable originally meant to be used for a different purpose, e.g. a centronics cable? Also ensure that your board is terminated.
Please also add logging information on trace level, but try to restrict the log to about 30 lines before the warning. Please only attach a hard drive for any further testing, but no other devices.
I'll gather the traces and add below.
By the way, I believe the mac plus pre-dates SCSI-2, so it may not be 100% compatible FYI.
@jcres69 Please remember to check that your board is properly terminated. The strange thing about this issue is that the SCSI specification defines a minimum time to wait before assuming that the bus signals are stable. Waiting longer is fine, waiting not that long can cause issues. This is why I cannot make this time shorter. But your system only appears to work when this time is less than the mandatory time ...
I verified the rascsi board is terminated... here is the trace file with two warning messages. only one drive was attached, nothing else.
[2024-02-22 15:51:34.705] [debug] (ID 6) - Controller is executing READ(10), CDB $28:00:00:18:10:ab:00:00:01:00 [2024-02-22 15:51:34.705] [debug] (ID:LUN 6:0) - Device is executing READ(10) ($28) [2024-02-22 15:51:34.705] [trace] (ID:LUN 6:0) - READ/WRITE/VERIFY/SEEK, start sector: 1577131, sector count: 1 [2024-02-22 15:51:34.705] [trace] (ID 6) - DATA IN phase [2024-02-22 15:51:34.707] [trace] (ID 6) - All data transferred [2024-02-22 15:51:34.707] [trace] (ID 6) - Status phase, status is GOOD (status code $00) [2024-02-22 15:51:34.708] [trace] (ID 6) - All data transferred [2024-02-22 15:51:34.708] [trace] (ID 6) - MESSAGE IN phase [2024-02-22 15:51:34.708] [trace] (ID 6) - All data transferred [2024-02-22 15:51:34.708] [trace] (ID 6) - BUS FREE phase [2024-02-22 15:51:39.867] [trace] (ID 6) - ++++ Starting processing for initiator ID 7 [2024-02-22 15:51:39.867] [trace] (ID 6) - SELECTION phase [2024-02-22 15:51:39.867] [trace] (ID 6) - COMMAND phase [2024-02-22 15:51:39.868] [debug] (ID 6) - Controller is executing READ(10), CDB $28:00:00:00:10:e1:00:00:01:00 [2024-02-22 15:51:39.868] [debug] (ID:LUN 6:0) - Device is executing READ(10) ($28) [2024-02-22 15:51:39.868] [trace] (ID:LUN 6:0) - READ/WRITE/VERIFY/SEEK, start sector: 4321, sector count: 1 [2024-02-22 15:51:39.868] [trace] (ID 6) - DATA IN phase [2024-02-22 15:51:39.871] [trace] (ID 6) - All data transferred [2024-02-22 15:51:39.871] [trace] (ID 6) - Status phase, status is GOOD (status code $00) [2024-02-22 15:51:39.872] [trace] (ID 6) - All data transferred [2024-02-22 15:51:39.872] [trace] (ID 6) - MESSAGE IN phase [2024-02-22 15:51:39.872] [trace] (ID 6) - All data transferred [2024-02-22 15:51:39.872] [trace] (ID 6) - BUS FREE phase [2024-02-22 15:51:42.478] [trace] (ID 6) - ++++ Starting processing for initiator ID 7 [2024-02-22 15:51:42.478] [trace] (ID 6) - SELECTION phase [2024-02-22 15:51:42.478] [trace] (ID 6) - COMMAND phase [2024-02-22 15:51:42.479] [debug] (ID 6) - Controller is executing WRITE(10), CDB $2a:00:00:00:09:f4:00:00:01:00 [2024-02-22 15:51:42.479] [debug] (ID:LUN 6:0) - Device is executing WRITE(10) ($2a) [2024-02-22 15:51:42.479] [trace] (ID:LUN 6:0) - READ/WRITE/VERIFY/SEEK, start sector: 2548, sector count: 1 [2024-02-22 15:51:42.479] [trace] (ID 6) - DATA OUT phase [2024-02-22 15:51:45.479] [warning] (ID 6) - Received 6 byte(s) in DATA OUT phase, command requires 512 [2024-02-22 15:51:45.479] [debug] (ID 6) - ABORTED COMMAND (Sense Key $0b), DATA PHASE ERROR (ASC $4b), ASCQ $00 [2024-02-22 15:51:45.479] [trace] (ID 6) - Status phase, status is CHECK CONDITION (status code $02) [2024-02-22 15:51:45.479] [trace] (ID 6) - All data transferred [2024-02-22 15:51:45.479] [trace] (ID 6) - MESSAGE IN phase [2024-02-22 15:51:45.479] [trace] (ID 6) - All data transferred [2024-02-22 15:51:45.479] [trace] (ID 6) - BUS FREE phase [2024-02-22 15:51:45.480] [trace] (ID 6) - ++++ Starting processing for initiator ID 7 [2024-02-22 15:51:45.480] [trace] (ID 6) - SELECTION phase [2024-02-22 15:51:45.480] [trace] (ID 6) - COMMAND phase [2024-02-22 15:51:45.481] [debug] (ID 6) - Controller is executing REQUEST SENSE, CDB $03:00:00:00:10:00 [2024-02-22 15:51:45.481] [debug] (ID:LUN 6:0) - Device is executing REQUEST SENSE ($03) [2024-02-22 15:51:45.481] [trace] (ID:LUN 6:0) - Status $00, Sense Key $0b, ASC $4b [2024-02-22 15:51:45.481] [trace] (ID 6) - DATA IN phase [2024-02-22 15:51:45.481] [trace] (ID 6) - All data transferred [2024-02-22 15:51:45.481] [trace] (ID 6) - Status phase, status is GOOD (status code $00) [2024-02-22 15:51:45.481] [trace] (ID 6) - All data transferred [2024-02-22 15:51:45.481] [trace] (ID 6) - MESSAGE IN phase [2024-02-22 15:51:45.481] [trace] (ID 6) - All data transferred [2024-02-22 15:51:45.481] [trace] (ID 6) - BUS FREE phase [2024-02-22 15:51:45.482] [trace] (ID 6) - ++++ Starting processing for initiator ID 7 [2024-02-22 15:51:45.482] [trace] (ID 6) - SELECTION phase [2024-02-22 15:51:45.483] [trace] (ID 6) - COMMAND phase [2024-02-22 15:51:45.483] [debug] (ID 6) - Controller is executing WRITE(10), CDB $2a:00:00:00:09:f4:00:00:01:00 [2024-02-22 15:51:45.483] [debug] (ID:LUN 6:0) - Device is executing WRITE(10) ($2a) [2024-02-22 15:51:45.483] [trace] (ID:LUN 6:0) - READ/WRITE/VERIFY/SEEK, start sector: 2548, sector count: 1 [2024-02-22 15:51:45.483] [trace] (ID 6) - DATA OUT phase [2024-02-22 15:51:45.486] [trace] (ID 6) - Status phase, status is GOOD (status code $00) [2024-02-22 15:51:45.486] [trace] (ID 6) - All data transferred [2024-02-22 15:51:45.486] [trace] (ID 6) - MESSAGE IN phase [2024-02-22 15:51:45.486] [trace] (ID 6) - All data transferred [2024-02-22 15:51:45.486] [trace] (ID 6) - BUS FREE phase [2024-02-22 15:52:02.420] [trace] (ID 6) - ++++ Starting processing for initiator ID 7 [2024-02-22 15:52:02.420] [trace] (ID 6) - SELECTION phase [2024-02-22 15:52:02.420] [trace] (ID 6) - COMMAND phase [2024-02-22 15:52:02.421] [debug] (ID 6) - Controller is executing WRITE(10), CDB $2a:00:00:00:00:62:00:00:01:00 [2024-02-22 15:52:02.421] [debug] (ID:LUN 6:0) - Device is executing WRITE(10) ($2a) [2024-02-22 15:52:02.421] [trace] (ID:LUN 6:0) - READ/WRITE/VERIFY/SEEK, start sector: 98, sector count: 1 [2024-02-22 15:52:02.421] [trace] (ID 6) - DATA OUT phase [2024-02-22 15:52:02.423] [trace] (ID 6) - Status phase, status is GOOD (status code $00) [2024-02-22 15:52:02.424] [trace] (ID 6) - All data transferred [2024-02-22 15:52:02.424] [trace] (ID 6) - MESSAGE IN phase [2024-02-22 15:52:02.424] [trace] (ID 6) - All data transferred [2024-02-22 15:52:02.424] [trace] (ID 6) - BUS FREE phase [2024-02-22 15:52:02.427] [trace] (ID 6) - ++++ Starting processing for initiator ID 7 [2024-02-22 15:52:02.427] [trace] (ID 6) - SELECTION phase [2024-02-22 15:52:02.427] [trace] (ID 6) - COMMAND phase [2024-02-22 15:52:02.428] [debug] (ID 6) - Controller is executing WRITE(10), CDB $2a:00:00:00:0a:08:00:00:01:00 [2024-02-22 15:52:02.428] [debug] (ID:LUN 6:0) - Device is executing WRITE(10) ($2a) [2024-02-22 15:52:02.428] [trace] (ID:LUN 6:0) - READ/WRITE/VERIFY/SEEK, start sector: 2568, sector count: 1 [2024-02-22 15:52:02.428] [trace] (ID 6) - DATA OUT phase [2024-02-22 15:52:02.431] [trace] (ID 6) - Status phase, status is GOOD (status code $00) [2024-02-22 15:52:02.431] [trace] (ID 6) - All data transferred [2024-02-22 15:52:02.431] [trace] (ID 6) - MESSAGE IN phase [2024-02-22 15:52:02.431] [trace] (ID 6) - All data transferred [2024-02-22 15:52:02.431] [trace] (ID 6) - BUS FREE phase [2024-02-22 15:52:02.433] [trace] (ID 6) - ++++ Starting processing for initiator ID 7 [2024-02-22 15:52:02.433] [trace] (ID 6) - SELECTION phase [2024-02-22 15:52:02.433] [trace] (ID 6) - COMMAND phase [2024-02-22 15:52:02.433] [debug] (ID 6) - Controller is executing WRITE(10), CDB $2a:00:00:00:0e:6c:00:00:01:00 [2024-02-22 15:52:02.433] [debug] (ID:LUN 6:0) - Device is executing WRITE(10) ($2a) [2024-02-22 15:52:02.433] [trace] (ID:LUN 6:0) - READ/WRITE/VERIFY/SEEK, start sector: 3692, sector count: 1 [2024-02-22 15:52:02.434] [trace] (ID 6) - DATA OUT phase [2024-02-22 15:52:05.434] [warning] (ID 6) - Received 8 byte(s) in DATA OUT phase, command requires 512 [2024-02-22 15:52:05.434] [debug] (ID 6) - ABORTED COMMAND (Sense Key $0b), DATA PHASE ERROR (ASC $4b), ASCQ $00 [2024-02-22 15:52:05.434] [trace] (ID 6) - Status phase, status is CHECK CONDITION (status code $02) [2024-02-22 15:52:05.434] [trace] (ID 6) - All data transferred [2024-02-22 15:52:05.434] [trace] (ID 6) - MESSAGE IN phase [2024-02-22 15:52:05.434] [trace] (ID 6) - All data transferred [2024-02-22 15:52:05.434] [trace] (ID 6) - BUS FREE phase [2024-02-22 15:52:05.435] [trace] (ID 6) - ++++ Starting processing for initiator ID 7 [2024-02-22 15:52:05.435] [trace] (ID 6) - SELECTION phase [2024-02-22 15:52:05.435] [trace] (ID 6) - COMMAND phase [2024-02-22 15:52:05.436] [debug] (ID 6) - Controller is executing REQUEST SENSE, CDB $03:00:00:00:10:00 [2024-02-22 15:52:05.436] [debug] (ID:LUN 6:0) - Device is executing REQUEST SENSE ($03) [2024-02-22 15:52:05.436] [trace] (ID:LUN 6:0) - Status $00, Sense Key $0b, ASC $4b
@jcres69 Thank you for the log. It confirms what I have already guessed: A write operation is sometimes interrupted in the middle of sending data to the board. A signal line that should remain unchanged during the transfer has changed. Neither piscsi nor s2p actively change this line during a transfer. This is an outgoing line, and thus I think that the Mac can also not change it. I don't see any other explanation than a hardware issue, but this would not explain why you do not seem to have issues when there is no bus settle delay. Are you prepared to compile the SCSI2Pi sources yourself? See https://github.com/uweseimet/scsi2pi/wiki/Compilation-Instructions. I would like to create a branch for double-checking that removing the delay is a solution for you.
@uweseimet Sure I can do that, happy to help!
@jcres69 There is an "issue_71" branch now. This branch is identical with the current develop branch, but I removed the delay that I re-added for release 2.0.1. Can you please compile this branch and check whether there is a change? It should essentially behave like release 2.0.
Are you running any other software on your Pi that makes use of the Pi's timers or the GPIO port? This would probably be software that is not part of the regular Pi OS distribution.
@uweseimet There is no other software installed like that, that I know of. the only software I installed other than what came with the distribution was scsi2pi. I'll do some testing on that branch and get back with you.
@uweseimet - After testing with this version I confirmed the problem is resolved. I do not see any scsi errors...
@jcres69 Thank you for testing. This means, that for you SCSI2Pi (and very likely also PiSCSI) only appear to work with a non-standard timing. Unfortunately I cannot make this timing the default because it violates the SCSI standard (SCSI2Pi reads data from the bus too fast), and because it is less reliable. This timing is only relevant when the Pi receives data, not when it sends data. Are you 100% sure that when your Mac writes data (i.e. the Pi receives them) to an image file the data in the image file are correct? Maybe you have some means for testing that. You might, for instance, copy a lot of data from one image file to another and then check whether the result is fine.
Even though I cannot change SCSi2Pi to always use this timing, if you are sure that this timing works for you, you can quite easily enforce this timing by modifying the SCSI2Pi sources before compiling them. In order to do that, in cpp/buses/rpi_bus.cpp just comment out or remove these lines at the end of the file:
if (const uint32_t diff = timer_core_freq * 400 / 1000; diff) {
const uint32_t start = armt_addr[ARMT_FREERUN];
while (armt_addr[ARMT_FREERUN] - start < diff) {
// Intentionally empty
}
}
This change, which I also applied for the branch you were testing, removes the so-called SCSI bus settle delay. Maybe you want to give this a try after checking out SCSI2Pi 3.0, which was released today, e.g. by running "git pull main" and then "make clean" and "make". If you do that I would appreciate your feedback on the result. You can also check out the current SCSI2Pi 3.1 develop branch with "git pull develop" instead, which may give you a 3% increase in performance.
You might, for instance, copy a lot of data from one image file to another and then check whether the result is fine.
This is exactly what I did many times and the data is good after copying. When troubleshooting this with the piscsi software, I was told it is a hardware problem, so I spent a lot of time troubleshooting the Mac plus hardware, three logic boards, ram, scsi cables, nothing resolved it other than the s2p software. I had stopped using the piscsi because these scsi errors made it unusable as it would cause the Mac to hang. I also remember there were other people besides me who had been using the piscsi that had the same problems with the Mac Plus.
I can do more testing as you suggested, editing the source file. But I'm thinking this is a specific issue with the Plus...
Was it always the Mac Plus that was affected? Anyway, I don't see anything more I can do. With your help it was possible to collect more details on these timing issues, even though there does not seem to be a SCSI-compliant solution for it. But if you don't have any issues when omitting the bus settle delay there is at least some kind of solution for you, which is better than nothing ;-). Maybe other users with the same problem can confirm that this work-around also works for them. If it's OK for you I would like to close this ticket, because right now there is nothing more that can be investigated. In case there is something new in the future, this ticket can be re-opened.
Yes, I think this is a good solution for me. Thanks!
@uweseimet Just FYI, I did as you suggested and deleted those lines from cpp/buses/rpi_bus.cpp after git pull from main, compiled version 3.0 and it works good, but got these errors when compiling. not sure if this is a problem just wanted to know if I did something wrong:
/usr/include/c++/12/bits/char_traits.h:431:56: warning: ‘void __builtin_memcpy(void, const void, long unsigned int)’ accessing 9223372036854775810 or more bytes at offsets [2, 9223372036854775807] and 1 may overlap up to 9223372036854775813 bytes at offset -3 [-Wrestrict]
431 | return static_cast<char_type>(__builtin_memcpy(s1, s2, __n));
| ~~~~^~~~~
/usr/include/c++/12/bits/char_traits.h:431:56: warning: ‘void __builtin_memcpy(void, const void, long unsigned int)’ accessing 9223372036854775810 or more bytes at offsets -4611686018427387902 and [-4611686018427387903, 4611686018427387904] may overlap up to 9223372036854775813 bytes at offset -3 [-Wrestrict]
431 | return static_cast<char_type>(__builtin_memcpy(s1, s2, __n));
| ~~~~^~~~~
make[1]: Leaving directory '/home/pi/scsi2pi/cpp' make -C doc make[1]: Entering directory '/home/pi/scsi2pi/doc' make[1]: Nothing to be done for 'all'. make[1]: Leaving directory '/home/pi/scsi2pi/doc'
@jcres69 You can ignore these messages. They are related to some compiler-internal code, not the the actual SCSI2Pi code. As far as I can tell only the GNU compiler reports this, when you use clang++ you don't get this warning. clang++ compiles faster and needs less memory. To use it run
make CXX=clang++
You may have to install clang++ on your Pi (with apt install) first.
@jcres69 Please take notice of https://github.com/uweseimet/scsi2pi/issues/82. Maybe you can check out the issue_82 branch and verify whether in your case it is also the ACK signal that is not managed correctly. When running s2p from this branch on trace level, the lines right before the DATA OUT phase warning will reveal that.
@uweseimet I don't see the issue_82 branch - I did read #82 and agree about the silverlining drivers. However, even with silverlining, the piscsi software still gives me the scsi errors. only with the "fixed" version of s2p is it completely resolved for me.
@jcres69 I forgot to mention that I deleted the branch because all changes are now contained in the develop branch. Can you please use the develop branch and add a trace log of the error message you get (please only the relevant part of the log, i.e. just a few lines) to this ticket. I am asking for this because I would like to ensure that we collect all available information, for potential future reference.
@uweseimet I got this error after attempting to boot the first time using the develop branch, but have not been able to recreate this again... the Mac would not boot at first, but now there doesn't seem to be a problem.
[2024-03-05 15:56:06.001] [debug] (ID 5) - Controller is executing READ(6)/GET MESSAGE(6), CDB $08:00:00:00:01:00 [2024-03-05 15:56:06.001] [debug] (ID:LUN 5:0) - Device is executing READ(6)/GET MESSAGE(6) ($08) [2024-03-05 15:56:06.001] [trace] (ID:LUN 5:0) - READ/WRITE/VERIFY/SEEK, start sector: 0, sector count: 1 [2024-03-05 15:56:06.006] [trace] (ID 5) - DATA IN phase [2024-03-05 15:56:06.006] [trace] (ID 5) - Sending 512 byte(s) [2024-03-05 15:56:06.041] [trace] (ID 5) - All data transferred [2024-03-05 15:56:06.041] [trace] (ID 5) - Status phase, status is GOOD (status code $00) [2024-03-05 15:56:06.041] [trace] (ID 5) - Sending 1 byte(s) [2024-03-05 15:56:06.041] [trace] (ID 5) - All data transferred [2024-03-05 15:56:06.041] [trace] (ID 5) - MESSAGE IN phase [2024-03-05 15:56:06.041] [trace] (ID 5) - Sending 1 byte(s) [2024-03-05 15:56:06.041] [trace] (ID 5) - All data transferred [2024-03-05 15:56:06.041] [trace] (ID 5) - BUS FREE phase [2024-03-05 15:56:06.045] [trace] (ID 5) - ++++ Starting processing for initiator ID 7 [2024-03-05 15:56:06.045] [trace] (ID 5) - SELECTION phase [2024-03-05 15:56:06.045] [trace] (ID 5) - COMMAND phase [2024-03-05 15:56:06.045] [debug] (ID 5) - Controller is executing READ(6)/GET MESSAGE(6), CDB $08:00:00:40:0b:00 [2024-03-05 15:56:06.045] [debug] (ID:LUN 5:0) - Device is executing READ(6)/GET MESSAGE(6) ($08) [2024-03-05 15:56:06.045] [trace] (ID:LUN 5:0) - READ/WRITE/VERIFY/SEEK, start sector: 64, sector count: 11 [2024-03-05 15:56:06.045] [trace] (ID 5) - DATA IN phase [2024-03-05 15:56:06.045] [trace] (ID 5) - Sending 512 byte(s) [2024-03-05 15:56:06.049] [trace] (ID 5) - Sending 512 byte(s) [2024-03-05 15:56:06.053] [trace] (ID 5) - Sending 512 byte(s) [2024-03-05 15:56:09.103] [trace] Timeout while waiting for ACK/REQ to change to true [2024-03-05 15:56:09.103] [warning] (ID 5) - Sent 371 byte(s) in DATA IN phase, command requires 512 [2024-03-05 15:56:09.103] [debug] (ID 5) - ABORTED COMMAND (Sense Key $0b), DATA PHASE ERROR (ASC $4b), ASCQ $00 [2024-03-05 15:56:09.103] [trace] (ID 5) - Status phase, status is CHECK CONDITION (status code $02) [2024-03-05 15:56:09.103] [trace] (ID 5) - Sending 1 byte(s) [2024-03-05 15:56:12.103] [trace] Timeout while waiting for ACK/REQ to change to true [2024-03-05 15:56:12.103] [trace] (ID 5) - BUS FREE phase
[2024-03-05 15:56:09.103] [trace] Timeout while waiting for ACK/REQ to change to true
[2024-03-05 15:56:09.103] [warning] (ID 5) - Sent 371 byte(s) in DATA IN phase, command requires 512
This confirms that the error is related to the REQ/ACK handshake, but that it happens in a different part of the handshake as in https://github.com/uweseimet/scsi2pi/issues/82. Thank you for running this test.
This is a follow-up ticket to https://github.com/PiSCSI/piscsi/issues/656 and https://github.com/uweseimet/scsi2pi/issues/20. It has turned out that for @jcres69 piscsi/s2p only seem to work with a non-standard SCSI timing. SCSI2Pi 2.0 is working for him, SCSI2Pi 2.0.1 is not. There are messages like
(With piscsi v23.11.01 the problem is the same, just the message is different.) The messages indicate that the board and the computer "lose" the connection in the DATA OUT phase, i.e. not all required data for the respective SCSI command are received because the bus phase appears to change before the end of the data transfer. Device 5 is a DaynaPort, device 6 a hard drive.
Completely removing the bus settle delay appears to resolve the problem but is not SCSI compliant and causes issues (wrong data received in the DATA OUT phase) with other platforms. These issues are not necessarily immediately visible.