rvalles / pyamigadebug

Framework for abstracting Amiga debuggers and access to AmigaOS libraries and devices. AmigaXfer lives here.
MIT License

CRC errors for large(-ish) files? #2

Closed JensRestemeier closed 1 year ago

JensRestemeier commented 1 year ago

I managed to get this to run on MacOS, and I managed to transfer some smaller files to an Amiga 1200 without problems. Larger files run into a CRC exception sooner or later. Is that something you're seeing on Windows as well? (i.e. do I have to look for the problem on the MacOS or the Amiga side?) Should it retry a block after a CRC error, or does that quit the upload process?

rvalles commented 1 year ago

Larger files run into a CRC exception sooner or later. Is that something you're seeing on Windows as well?

Not on Linux, nor Windows.

Please describe your serial port setup, such as usb-rs232? usb-ttl (what chip) + max232-ish (what level converter)? length of cables/wires, anything else interesting.

(i.e. do I have to look for the problem on the MacOS or the Amiga side?)

With my systems (including an A1200 with the 030 accelerator board both enabled and disabled), I can do ~800kbps up and ~500kbps down from the Amiga's perspective, perfectly reliably. No CRC errors, ever.

To test the CRC-ing, I actually need to inject faults, and transferring large (as in multi-megabyte) files is part of the tests I do with a range of setups and kickstarts before releases, without issues.
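As an illustration of why fault injection is needed to exercise a CRC check, here is a minimal sketch. It is not amigaXfer's routine: `zlib.crc32` is used as a stand-in (the thread does not say which CRC the tool uses), and `transfer_with_crc`, `clean_channel` and `faulty_channel` are hypothetical names standing in for the serial link.

```python
import zlib


def transfer_with_crc(payload: bytes, channel) -> bytes:
    """Pass payload through `channel` and verify the result with CRC-32.

    `channel` is a hypothetical callable standing in for the serial link;
    it returns the bytes as the receiving side saw them.
    """
    expected = zlib.crc32(payload)
    received = channel(payload)
    if zlib.crc32(received) != expected:
        raise ValueError("CRC mismatch")
    return received


def clean_channel(data: bytes) -> bytes:
    # A perfect link: nothing is altered, so the CRC check never fires.
    return data


def faulty_channel(data: bytes) -> bytes:
    # Injected fault: flip a single bit in the middle of the block.
    corrupted = bytearray(data)
    corrupted[len(corrupted) // 2] ^= 0x01
    return bytes(corrupted)


block = bytes(range(256)) * 16  # 4 KiB of test data

transfer_with_crc(block, clean_channel)  # passes silently

try:
    transfer_with_crc(block, faulty_channel)
    fault_detected = False
except ValueError:
    fault_detected = True

print("fault detected:", fault_detected)
```

On a link that never corrupts data, the mismatch branch is simply never reached, which is why a deliberately broken channel is the only way to confirm the check works.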

File transfers are backed by a transfer+crc routine I have used to back up and restore whole hard disk partitions at block level. This functionality is supported in the library, but not exposed in the GUI.

I do not have a MacOS system; I'm glad to hear it works at all there, as I had no confirmation of this.

Please try a slower rate on the serial; a step down might suffice but please particularly try both 9600 and 19200.

There's a reason behind testing these two speeds: 9600 is the speed romwack/sad use, meaning that no speed changes happen during transmission. 19200 is the next step up, where the speed does change, yet the fast speed is still very slow.

To summarize, please describe your serial setup and see if it happens at 9600 and 19200bps.

JensRestemeier commented 1 year ago

Ok, I'll have a look if I can diagnose the problem. So far I tried 9600, 19200 and 512k. Maybe it is a MacOS-specific setting on the serial port, or the USB-to-serial adapter I'm using. Cable goes USB->serial 9 pin -> null modem converter 9 pin -> 9 pin to 25 pin. USB is using a CH340 chip, IIRC. The cable is 70cm, I don't know if the chip sits closer to the USB connector or the serial connector.

rvalles commented 1 year ago

It is using a CH340 chip, IIRC

Glad you've mentioned this. I have a range of chips here, and there's one specifically which seems to be super problematic (while every other chip is fine). It is the CH340 chip.

As far as I can tell in my experience, this chip has low tolerance for clock differences when receiving (I suspect it doesn't even supersample), so it drops characters every now and again, as the Amiga timings are generated from dividing the pixel clock and thus are always off somewhat.

I recall I found some specific speed (38400? 57600?) at which ch340 worked reliably, whereas one step lower or one step higher did not.

If you have the luxury to do so, I'd recommend switching to anything else for testing, just to discard ch340.

It'd also be interesting to confirm that you've got the same issues on a windows and/or linux machine with that same ch340 chip.

Cable goes USB->serial 9 pin -> null modem converter 9 pin -> 9 pin to 25 pin. USB is using a CH340 chip, IIRC. The cable is 70cm,

What I have attached most of the time is some cp2102 with a 20cm wire to a ttl-rs232 dongle (max232) connected to a 9<>25 adapter in turn connected to the target Amiga.

No issues either with FT232RL (800/500) nor FT232H (I've successfully gone above 1mbps with this), nor a FTDI "US232R-10-BULK" USB serial directly into the 9<>25 (800/500), nor a pl2303 usb-serial (800/500) into 9<>25, nor a 3m null-modem cable to an old PC's serial port 16750 (115200).

JensRestemeier commented 1 year ago

I'll order a better cable to try. (I guess cheap cables aren't worth it, I had to discard a USB->RS422 adapter recently for classic Mac and NABU, that was using a pl2303 clone chip.)

rvalles commented 1 year ago

I'll order a better cable to try.

That FTDI "US232R-10-BULK" from official FTDI store is definitely a solid one.

If you want cheap-yet-fancy, then AliExpress has very cheap cp2104 based and fancy FT232H (and 2323 and other variants) based usb-ttl. For the ttl-to-rs232 part, search for "max3232".

You might also be able to get a usb-rs232 based on something not ch340, but that's harder and imho less useful than the usb-to-ttl + ttl-to-rs232 solution.

JensRestemeier commented 1 year ago

I got an FTDI based cable, and I get the same result on my Mac and my Raspberry Pi... for now I managed to transfer the required files by catching the exception returned by self.snip.verifiedwritemem and retrying it until it succeeds. I am now running some diagnostics on the Amiga in case it is a memory failure at that end...
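The workaround described here (catch the exception and retry the write) could be sketched roughly as follows. This is an assumption-laden illustration, not the change Jens actually made: the signature and exception type of `verifiedwritemem` are not shown in the thread, so `flaky_write` and `CrcError` are hypothetical stand-ins, and the retry count is bounded here rather than looping until success.

```python
import time


class CrcError(Exception):
    """Hypothetical stand-in for whatever exception the real write raises."""


def retry(operation, attempts=5, delay=0.0, exceptions=(Exception,)):
    """Call `operation()` until it succeeds or `attempts` is exhausted."""
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except exceptions as error:
            last_error = error
            if delay:
                time.sleep(delay)  # give the link a moment before retrying
    raise last_error


# Simulated block write that fails twice, then succeeds.
calls = {"n": 0}


def flaky_write():
    calls["n"] += 1
    if calls["n"] < 3:
        raise CrcError("simulated CRC failure")
    return "ok"


result = retry(flaky_write, attempts=5, exceptions=(CrcError,))
print(result, "after", calls["n"], "calls")
```

In real use the lambda or function passed as `operation` would wrap the actual write call. Bounding the attempts avoids spinning forever if the remote side has hung rather than merely dropped a byte.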

rvalles commented 1 year ago

OK, we can discard the usb-serial chip as the cause at this point.

Testing the RAM is never a bad idea, but now I'm racking my brain for possibilities. The DB25<>DB9 adapter being at fault is unlikely, and so is rust in the A1200 connector, as the symptoms would be much worse. I have seen that in an A600, which was fixed by scratching the pins with a female dupont connector.

I'm curious about the A1200. Is it expanded in any way? What kickstart are you running? Do you know the board revision?

JensRestemeier commented 1 year ago

Ok, memory tests went fine.

I do have an 68030 accelerator with 128MB RAM by Individual Computers, as a test I removed it.

At 512k the transfer hung after ~1MB (of a 3MB file), at 19200 it completed successfully.

rvalles commented 1 year ago

https://gist.github.com/rvalles/20b485abb07ba3af4ae6be1046d93bfb

Please give this a try. Set the correct device, then in the snips init uncomment the line for sad or romwack and the appropriate speeds.

Note that these hardcoded values are for PAL Amiga; if your Amiga is NTSC, check SetupDialog.py for a table with adequate values.

I am leaving this running overnight on my own A1200 with blizzard 1230mkIV (030&882 @ 50, 128M FAST), with the usual cp2102 then max232 setup, at serper 6 both ways.

I ran it from the debug bootblock (most convenient as I currently got no display attached to the Amiga).

Should it blow up (I expect it will for you), you can see if 19200 really works reliably then play around with the serial speed until you find the best one that is reliable (I hope something fast works).

On my hardware, I know serper 4 read and serper 6 write are the fastest reliable values for all the dongles/cables and 2xA500/1xA1200 from extensive testing in the past.
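For readers unfamiliar with serper values: per the Amiga Hardware Reference Manual, the serial period register sets the bit time in color clocks, so the baud rate is the color clock divided by (serper + 1). The sketch below works through that arithmetic. The clock constants are the standard PAL and NTSC color clock frequencies; the function names are made up for this example and are not taken from amigaXfer.

```python
PAL_CLOCK_HZ = 3_546_895   # PAL color clock
NTSC_CLOCK_HZ = 3_579_545  # NTSC color clock


def serper_to_baud(serper, clock_hz=PAL_CLOCK_HZ):
    """Actual bit rate produced by a given serial period value."""
    return clock_hz / (serper + 1)


def nearest_serper(target_baud, clock_hz=PAL_CLOCK_HZ):
    """Serial period value whose bit rate is closest to the target."""
    return max(0, round(clock_hz / target_baud) - 1)


def baud_error_percent(target_baud, clock_hz=PAL_CLOCK_HZ):
    """How far off the Amiga ends up from a nominal baud rate, in percent."""
    actual = serper_to_baud(nearest_serper(target_baud, clock_hz), clock_hz)
    return 100.0 * (actual - target_baud) / target_baud


for rate in (9600, 19200, 38400, 57600, 115200):
    serper = nearest_serper(rate)
    print(
        f"{rate:>6} bps -> serper {serper:>3}, "
        f"actual {serper_to_baud(serper):9.1f} bps, "
        f"error {baud_error_percent(rate):+.2f}%"
    )

print(f"serper 6 -> {serper_to_baud(6):.0f} bps")
print(f"serper 4 -> {serper_to_baud(4):.0f} bps")
```

Because the divider is an integer, the Amiga can only approximate the standard rates, which is the clock mismatch a tolerant USB-serial chip absorbs and a strict one may not.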

JensRestemeier commented 1 year ago

I just changed the minimum settings (usb device and SAD debugger) and put the accelerator back. The first run stopped at count 5, the second stopped at count 15. I am using the bootblock method, though will the accelerator be initialised like that? I'll try a few more different speeds.

With the accelerator removed the first test ran to count 30, the second to count 4.

Do you know if the CPU actually hangs in these situations, or could it be recovered with a timeout?

Edit: With settings
baudrate=115200, readmemserper=29, readmembaudrate=115200, writememserper=29, writemembaudrate=115200

It just ran up to
->A count: 305 diff: 6.4608474940177985s <-A count: 305 diff: 6.30940908001503s

I'll see if I can run with the accelerator enabled later tonight.

rvalles commented 1 year ago

I got this far, and stopped it just now as I am rebooting my laptop for a new kernel.

->A count: 8056 diff: 2.6750902489875443s <-A count: 8056 diff: 2.6291827850509435s

It ran long enough (11h+) to say it's probably perfectly reliable, at serper 6 on the described a1200.
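As a rough cross-check of the "11h+" figure, the counters in the output line can be turned into an elapsed-time estimate. This sketch assumes each `diff` is the time in seconds for one direction of one iteration, and that the last reported values are representative of the whole run; neither is stated in the thread. The regular expression and the function name are illustrative only.

```python
import re

LINE = (
    "->A count: 8056 diff: 2.6750902489875443s "
    "<-A count: 8056 diff: 2.6291827850509435s"
)

# Matches one direction's report: marker, iteration count, seconds.
PATTERN = re.compile(r"(->A|<-A) count: (\d+) diff: ([0-9.]+)s")


def estimate_runtime_hours(line):
    """Estimate total runtime from one status line of the test output."""
    matches = PATTERN.findall(line)
    if not matches:
        raise ValueError("no counters found in line")
    count = max(int(c) for _, c, _ in matches)
    # One full iteration is assumed to cost the sum of both directions.
    seconds_per_iteration = sum(float(d) for _, _, d in matches)
    return count * seconds_per_iteration / 3600.0


hours = estimate_runtime_hours(LINE)
print(f"estimated runtime: {hours:.1f} h")
```

Under those assumptions the estimate comes out a little under 12 hours, which is consistent with the figure quoted above.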

I do not know what factors could come into play to make your end unreliable, so I can only guess.

I doubt there are worse and better Paulas. I have heard some A1200s have timing issues, but only as applied to accelerator boards. The UART resides in Paula, and its timing is based on dividing the pixel clock (on that, are you PAL or NTSC?), which I understand to be solid on the Amiga.

Physical interface (i.e. rust in the pins, resistance somewhere in the chain) is my favourite hypothesis at this point. Perhaps unplug and replug both ends of the 25<>9 adapter several times in a row.

The one time I saw that (an A600 I got in a different country), it was bad enough that even 9600 was unreliable, and it became perfectly reliable after cleaning the pins as described earlier.

But that was https://github.com/rvalles/amiga_uartrecv rather than amigaXfer. Long time ago.

rvalles commented 1 year ago

I am using the bootblock method, though will the accelerator be initialised like that?

Yes, the way I understand accelerators work, they grab hold of the 68k bus on power on and don't release it unless they have a feature to do so (holding numeric row 2 while rebooting, in most blizzard cards).

I also use the bootblock, because that's the easiest way and takes significantly less time than booting from hdd (I've got a reverse TCP: shell set up).

JensRestemeier commented 1 year ago

Ok, I'll need to investigate more - the hardware is more than 20 years old now, and I don't know how it was handled before I got it. It is a UK model. Interestingly it has a very bad picture on my larger Sony Bravia TV, even on RGB, so who knows if there is some rot in the timing circuit?

rvalles commented 1 year ago

If you're using the original A1200 PSU, it's known to be one of the most anemic variants, which particularly affects expanded A1200s (hard disks and accelerators).

The only variant I'd use today is the second A500 one, which provides the most power. I have two of those, and two modern supplies (a picopsu + atx-to-amiga adapter and a c64psu.com supply).

A1200s were made in the bad-cap era. Yours might need a recap. Bad caps would aggravate voltage drop issues from an anemic power supply.

You might not be on the wrong track with this.

JensRestemeier commented 1 year ago

Ok, I'm closing this for now. The key question was whether the problems were to be expected on the host or the client side. Looks like the CH340 cable was a big problem either way, so replacing it with an FTDI cable helped with stability. I'll see if re-capping the Amiga helps, once I've got time to set up my workbench again.

rvalles commented 1 year ago

Thank you.

I am hopeful a recap will fix your RGB issue. Either way, it will help me to hear whether it fixes the unreliable serial issue.