keirf / flashfloppy

Floppy drive emulator for Gotek hardware

GOTEK random data corruption #762

Closed SezaroSteve closed 1 year ago

SezaroSteve commented 1 year ago

Hello,

I have several modern GOTEKs (max. 8 months old, newest purchased and delivered two weeks ago) and they all randomly corrupt data while writing data to a virtual floppy. I use ADFer to write and verify ADF files to floppies. Most of the time this goes fine but every now and then the verify comes up with a corrupt image. The solution is to simply repeat and most of the time, after 1 or 2 retries, the written image is ok.

Same goes for simply writing files to a "floppy". Every now and then, the data ends up corrupt. I started using a CLI tool that copies, then verifies and if needed retries until the written data is intact. This works well but should not be necessary. See the screenshot of the tool in action.

Example: I write an ADF to the Gotek's USB stick on my Mac and verify that the SHA256 checksum of the file on the USB stick and on my Mac is the same. I then configure the ADF in the FF Manager and assign it a certain slot. It works fine on my Amigas. I then write the exact same ADF, stored on the HDD of an Amiga, to that mounted floppy image (the ADF), overwriting its contents using ADFer. When ADFer says that the verification failed, and I pull out the USB stick, put it in my Mac and check the SHA256 checksum, it is different (obviously). When I repeat writing the ADF to the mounted floppy image until the verification says it's OK, then pull out the USB stick and check the freshly overwritten ADF file on my Mac, the SHA256 checksum is identical, thus proving the ADF was written correctly.
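The checksum comparison described above can be sketched in a few lines of Python. This is only an illustration of the check, not a tool used in the thread; the function names and the two file paths passed in are hypothetical.

```python
import hashlib

def sha256_of(path, chunk_size=64 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def images_match(reference_path, usb_path):
    """True if the ADF on the USB stick is byte-identical to the reference copy."""
    return sha256_of(reference_path) == sha256_of(usb_path)
```

Calling `images_match` with the original ADF and the copy pulled from the USB stick gives the same yes/no answer as comparing the two checksums by eye.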

When googling around, this sort of thing happened in the distant past. But I run FlashFloppy 3.38 (the current version) and it still happens a lot, at least on my Amigas (1200, 500, 500Plus, 3000 and 4000T). So that is 5 different GOTEKs in different Amigas, even though they were purchased shortly after one another (within 8 months). All are the version with the 2 LEDs (power & activity) and OLED display. They have the new Gotek board inside them, I guess. I removed the top cover on one of them and it says "GOTEKsystem SFRKC30.AT3" on the silk-screen. I see no other markings that would identify it. The firmware is FF 3.38.

When I write ADFs or other files to the USB stick on my PC or Mac, I never have data corruption. I use good quality USB drives, mostly 32GB and 64GB versions. No cheap sh*t, and I've tried several different USB sticks of different makes and models. All fine. But all get corrupted by the GOTEKs on a regular basis (let's say 25% of the time while writing).

It happens on 5 different Amigas over here, and all five are as stable as a rock. It is totally random. Just now, I wrote the VirusZ 1.04Beta boot floppy ADF that I downloaded to the GOTEK on my A4000T and had to retry 3 times before the verification of ADFer said it was OK. On the first 2 attempts, verification failed at different tracks (and this is how it always goes: different tracks fail to verify each time something goes wrong).
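To see which tracks differ after a failed write, the image pulled from the USB stick can be compared against the original track by track. The sketch below assumes a standard 880KB Amiga ADF (11 sectors of 512 bytes per track); `differing_tracks` is a hypothetical helper, not something ADFer provides.

```python
SECTOR_BYTES = 512
SECTORS_PER_TRACK = 11                            # double-density Amiga disk
TRACK_BYTES = SECTOR_BYTES * SECTORS_PER_TRACK    # 5632 bytes of data per track

def differing_tracks(reference, candidate):
    """Return the track numbers whose contents differ between two ADF images.

    Both arguments are bytes objects holding the full image contents.
    """
    length = max(len(reference), len(candidate))
    bad = []
    for track, offset in enumerate(range(0, length, TRACK_BYTES)):
        if reference[offset:offset + TRACK_BYTES] != candidate[offset:offset + TRACK_BYTES]:
            bad.append(track)
    return bad
```

Running this after each failed attempt would show whether the corrupted tracks really move around from one attempt to the next, as observed above.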

Am I missing something? Is there something to tune in ff.cfg to fix/improve the situation?

keirf commented 1 year ago

I don't know much about ADFer. Does it verify as it goes or do back to back writes and verify at the end? The 415 based Goteks have small RAM buffers and are highly sensitive to USB write latency.

SezaroSteve commented 1 year ago

ADFer is the successor of ADFblitzer. A new feature of ADFer is the verify-after-write function. It writes the entire image first (just like ADFblitzer does) and then reads the "floppy" from start to finish. ADFblitzer and its love-child, ADFer, are really fast. They write an entire 880KB ADF in 40 seconds.

SezaroSteve commented 1 year ago

The random corruption also happens when I write normal files to a mounted image. I have 10 ADFs that serve as "generic empty floppies". I write stuff on them, then move the USB stick to one of the other Amigas and copy the file to that Amiga's HDD. Give me a minute to get a screenshot of a copy+verify tool I'm using to transfer files. Bear with me.

SezaroSteve commented 1 year ago

[Screenshot: GotekCorruptRetry]

SezaroSteve commented 1 year ago

OK, so this is a command-line tool that I integrated into Directory Opus. It copies normal files from a hard disk to a floppy (or GOTEK ADF image), then verifies the copy and, when errors are found, retries until the data is good. When it does go wrong, like in the screenshot, it takes between 1 and 4 retries to get rid of all errors.
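The copy, verify and retry loop such a tool performs could look roughly like the Python sketch below. It is not the Amiga tool from the screenshot; `copy_with_verify` and its `max_attempts` parameter are made up for illustration. On a modern OS the read-back may also be served from cache rather than from the device, so treat it as a sketch of the logic only.

```python
import filecmp
import shutil

def copy_with_verify(src, dst, max_attempts=5):
    """Copy src to dst, read dst back, and retry until both files are identical.

    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        shutil.copyfile(src, dst)
        filecmp.clear_cache()                      # do not reuse an earlier comparison
        if filecmp.cmp(src, dst, shallow=False):   # byte-by-byte comparison
            return attempt
    raise OSError(f"{dst} still corrupt after {max_attempts} attempts")
```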

SezaroSteve commented 1 year ago

You mentioned "The 415 based Goteks have small RAM buffers and are highly sensitive to USB write latency." and this got me thinking:

Now that I think of it, my problem is not so random as I thought. It happens a lot (60% of the time maybe) when doing the following:

  1. Take a huge file and use a splitter tool to split it into 870 KB files called "parts".
  2. Write the first of these "parts" to a gotek floppy with DOpus. This goes at full speed and hammers the gotek relentlessly. There are no pauses like those that happen when many small files are copied (then there is always a small pause between files because the next file needs to be read first, then the next and so on).
  3. Write the next of these "parts" to a gotek floppy.
  4. Repeat until all "parts" are stored on gotek floppies. The last time I used 24 "empty ADFs" (which I use to transfer generic files between Amigas) and thus wrote 24 files, 870KB in size, to these virtual floppies.
  5. On the target machine, I cycled through all 24 gotek floppies, reading them and copying the "part XX" file from all of them.
  6. Then, with all 24 files in the same folder on the HDD, use the same splitter tool to "join" the file to the original large file again. This "split / join" thing works perfectly.
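The split and join steps in the list above can be sketched as follows. This is not the Amiga splitter tool itself; the function names and the `partNN` file naming are assumptions made for the example.

```python
from pathlib import Path

PART_SIZE = 870 * 1024  # 870 KB parts, as in the steps above

def split_file(src, out_dir, part_size=PART_SIZE):
    """Split src into numbered part files of at most part_size bytes each."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    with open(src, "rb") as f:
        index = 1
        while True:
            chunk = f.read(part_size)
            if not chunk:
                break
            part = out_dir / f"part{index:02d}"
            part.write_bytes(chunk)
            parts.append(part)
            index += 1
    return parts

def join_files(parts, dst):
    """Concatenate the part files, in the given order, back into one file."""
    with open(dst, "wb") as out:
        for part in parts:
            out.write(Path(part).read_bytes())
```

Each part is one large sequential write, which is exactly the access pattern that stresses the Gotek in steps 2 to 4.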

While writing those 24 floppies, at least half of them went wrong (ended up corrupt) and the tool you see in the screenshot had to rewrite, several times in certain cases, to end up with perfect copies of the "part" files.

So to summarise, I believe it only happens when the gotek has to write, well, more like "stream", large continuous files to gotek floppies. I think that is the pattern. I'm beating the crap out of the gotek, relentlessly writing with no pauses, and sometimes, it zigs when it should have zagged :-)

If my theory is correct, maybe we need to "slow down" the Gotek a little bit, in order for it to have time to "breathe"? Correct me if I speak cows-poo :-)

keirf commented 1 year ago

Yes, the problem is that smart Amiga tools can stream data to the floppy drive fairly constantly when doing large copies. If a whole track is being written then there is no need to read it first. Amiga tracks are not index synced, so there is no need to wait for the index pulse. And if not verifying, then there is no need to read after the write either. It can be write-write-write-write-... and many USB sticks, even branded ones, can give you occasional 200+ millisecond latencies on writes. That's the time taken to stream a whole Amiga track!

The 415 Gotek has only 32kB RAM, and only 8kB is used for buffering raw incoming data. A raw Amiga track is 12.5kB. So a whole track doesn't fit in the buffer, yet we can be stuck waiting for USB writes for longer than a track. Not good. It could be improved but the 435 Goteks are the best way: these chips have 384kB RAM and FlashFloppy buffers 64kB raw incoming data -- that's plenty!
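The numbers above can be turned into a rough back-of-the-envelope calculation. The sketch below uses only the figures quoted in this comment (12.5kB raw track, about 200 ms per track, 8kB and 64kB buffers) and assumes, as a simplification, that the buffer is empty when the USB stall begins and that nothing drains during the stall. It is not how the firmware models its buffers.

```python
RAW_TRACK_KB = 12.5     # raw Amiga track size quoted above
TRACK_TIME_MS = 200.0   # time to stream one whole track, as quoted above

def max_stall_ms(buffer_kb):
    """Longest USB write stall (in ms) the buffer can absorb while the host keeps streaming."""
    return buffer_kb * TRACK_TIME_MS / RAW_TRACK_KB

def survives_stall(buffer_kb, stall_ms):
    """True if a stall of stall_ms fits within the buffer under the assumptions above."""
    return stall_ms <= max_stall_ms(buffer_kb)

for name, buffer_kb in (("415 (8kB buffer)", 8), ("435 (64kB buffer)", 64)):
    print(f"{name}: absorbs about {max_stall_ms(buffer_kb):.0f} ms of USB stall")
```

Under these assumptions the 8kB buffer covers roughly 128 ms of stall, less than the 200+ ms latencies mentioned above, while the 64kB buffer covers about a second.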

SezaroSteve commented 1 year ago

How do I recognise a 435 Gotek? Maybe mine already are, as they are rather new (max. 8 months old). I also have one lying around which is about 4 years old, but the ones I actively use are all quite new.

keirf commented 1 year ago

See the section "Microcontroller Options" on this page https://github.com/keirf/flashfloppy/wiki/Gotek-Models

Probably your Goteks are the 415 model. You can also tell because RAM is displayed on banner page of the firmware (when no USB drive is inserted).

SezaroSteve commented 1 year ago

"You can also tell because RAM is displayed on banner page of the firmware (when no USB drive is inserted)."

Yeah they all say "FlashFloppy 3.38 32kB"

If the problem cannot be solved for 415 owners, we need to start using software that is not so fast. ADFblitzer and its successor, ADFer, are too fast for 415s. I'll need to find another "ADF writer" which, simply put, is slower while writing. Or a tool that reads directly after each write (write, then verify, then issue the next write command, etc.) instead of "stream-write everything at high speed, then go back and verify by reading the entire floppy start-to-finish".

I use Directory Opus for transferring those large, almost floppy-size, files. I don't see how I can tell DOpus to slow down though. So that remains an issue.

Alternatively, replace my 5 Goteks with 435 versions :-(

keirf commented 1 year ago

Try a copier that verifies as it goes. Xcopy for example (except it's not system friendly I suppose).
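A copier that verifies as it goes writes one track, reads it back, and only then moves on to the next track. A minimal sketch of that loop is shown below; `write_track` and `read_track` are hypothetical callables standing in for whatever drive access a real copier uses, and the 5632-byte track size assumes a standard Amiga double-density disk.

```python
def write_image_verified(image, write_track, read_track, track_size=5632, max_retries=3):
    """Write an image track by track, reading each track back before moving on.

    write_track(track, data) and read_track(track) are supplied by the caller.
    Returns the total number of rewrites that were needed.
    """
    retries = 0
    for track, offset in enumerate(range(0, len(image), track_size)):
        data = image[offset:offset + track_size]
        for _ in range(max_retries + 1):
            write_track(track, data)
            if read_track(track) == data:
                break                      # track is good, go to the next one
            retries += 1
        else:
            raise OSError(f"track {track} failed verification")
    return retries
```

The read after every write is the important part: it breaks up the continuous write stream that a write-everything-then-verify copier produces.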

SezaroSteve commented 1 year ago

Will try that.

But could you, maybe, make FlashFloppy introduce small pauses/wait-states while ingesting data? The Amiga has no clue, so it will write as fast as it can. Could you artificially make models lower than the 435 into "a slower drive", just slow enough to fix the problem?

keirf commented 1 year ago

What you are asking for is basically flow control. Unfortunately that is not a feature of the floppy drive protocol. We try to achieve this in firmware by delaying read data and index pulses after a write until it is written down to USB. This works for most hosts but if the host is not index syncing its writes, and nor is it reading to verify after a write, then we have no other way to force flow control feedback.

keirf commented 1 year ago

This is what makes Amiga one of the trickiest hosts for a floppy emulator.

SezaroSteve commented 1 year ago

I see. Bummer. Well, thanks for the conversation :-)

SezaroSteve commented 1 year ago

I found an old STM32F105-based GOTEK I had lying around. This model has 64kB SRAM. I upgraded its firmware to 3.38 (to compare apples with apples) and it works perfectly. I wrote 30 880KB ADFs to that GOTEK and got 30 perfect copies, zero errors. Statistically, I should have had a 40%-ish failure rate where the post-write verification fails when re-reading the floppy. I also wrote about 20 or so large files as described above. Not one single error. 100% perfect.

What I learned from this: The solution is to use older STM32F105 or the latest AT32F435 models when using them in the Amiga. The AT32F415 model is NOT recommended for use with the Amiga, as the 32kB buffer causes problems during prolonged write I/O's.

I will inform the company where I bought my 5 last GOTEK's and ask them to reconsider selling their AT32F415 based product as "Amiga floppy-emulators". They are fine for other platforms, but NOT for the Amiga.

keirf commented 1 year ago

Yes please do, especially if they are an Amiga company! The best way to see more availability of 435 Goteks is to ask for them by name and to raise awareness. Did you know they cost at most a dollar more from factory? So there's no excuse! I sell them on eBay UK by the way.

SezaroSteve commented 1 year ago

Hello Mr. Fraser,

Can you please send me a link to your eBay auction for the 435 GOTEKs? Thanks,

Steven Rodenburg


keirf commented 1 year ago

My current listing https://www.ebay.co.uk/itm/125672833445

More generally, I am user zeroflux on eBay UK.

I don't ship outside the UK on eBay, but I can sometimes make special arrangements for international sales, especially if the order is large enough to justify courier shipping. Also, I have a factory contact: if your order is 10+, it can be worth ordering from the China factory with FedEx express shipping. Email me for further details; my email is in my GitHub profile.