keirf / flashfloppy

Floppy drive emulator for Gotek hardware

Further Format issue on BBC 8271 with Disc Utilities #67

Closed kgl2001 closed 6 years ago

kgl2001 commented 6 years ago

BBC Micro, 8271 controller, DFS1.20, FF 0.9.14a, index-suppression = no

Following on from ticket https://github.com/keirf/FlashFloppy/issues/57, I have been able to generate a format error when running a disc test with Disc Utilities. The format part of the test starts at track 80 and works down to track 1. It failed at track 19:

20180305_210900

.hfe attached:

Test-A1.zip

PeterBBCUser commented 6 years ago

Just trying to cover all possibilities to see where the difference lies, given that this version works fine for me but not for kgl2001.

kgl2001 commented 6 years ago

Right... New Beeb (8271, DFS1.20), new Gotek (ff_67_7), same USB drive. Initial format / verify tests with Disk Utilities - 3 tests, no errors! Moving onto the read / write tests, before moving onto other test utilities...

Edit: One point to note on my original beeb / gotek setup - I still had the USB serial logging device connected to both the gotek & Linux PC. I wonder if that was having some impact, even with the normal (non debug) firmware??? I'll check that out later, once I'm finished testing with the new setup.

keirf commented 6 years ago

No, a non-debug firmware wouldn't be affected by having the serial logger hooked up.

I guess you will test the new Gotek on the old Beeb.

kgl2001 commented 6 years ago

It was very slow, but that's one complete test of Disk Utilities, and it passed without any errors. I'll try the Watford Diagnostics now.

keirf commented 6 years ago

You're the first to say it's slow (out of a small sample set!). Though tbh I was expecting it to be somewhat slow....

PeterBBCUser commented 6 years ago

the test does take a long time but is quicker than with flashfloppy_v0.9.15a.

kgl2001 commented 6 years ago

Yes. Probably better to say that it takes a long time than to say it's slow.

I'm currently on Watford Diagnostics. The format was very fast, verify was slower, and currently on the write test, which is taking just under 2 seconds per track.

PeterBBCUser commented 6 years ago

yes I noticed that the format on the Watford diagnostics went very quickly.

PeterBBCUser commented 6 years ago

I am curious why it's working on this system and not your other one. What is the difference?

keirf commented 6 years ago

Format runs from INDEX to INDEX. On a real floppy drive the next track's format has to wait for INDEX to come round again. On FlashFloppy I'm positioning the "drive head" 20ms before the INDEX, post-write, so the next track's format has to wait only 20ms rather than ~200ms. It basically nearly doubles the format speed.

However for sequential writes it will suck, hence you see 2 seconds per track. Rather than writing all sectors in one revolution (200ms), each sector write has to wait for the 'disk' to spin round from the INDEX (which is where the read head gets reset to post-write). Hence writing all 10 sectors takes approx 20ms+40ms+...180ms+200ms = 1.1s.
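The arithmetic above can be checked with a rough sketch. It assumes only the figures quoted in this comment (200ms per revolution, a 20ms lead before INDEX, 10 sectors per track) and the same approximation that sector n waits n × 20ms; the function and constant names are illustrative and are not taken from the FlashFloppy source.

```python
REV_MS = 200         # one revolution, as quoted above
RESET_LEAD_MS = 20   # head is positioned this long before INDEX after a write
SECTORS = 10         # sectors per track in the tests discussed here


def format_ms_per_track(wait_for_index_ms):
    """Wait for INDEX, then write one full revolution (INDEX to INDEX)."""
    return wait_for_index_ms + REV_MS


def sequential_write_ms(sectors=SECTORS):
    """Each sector write restarts near INDEX, so sector n waits about n slots."""
    slot_ms = REV_MS // sectors
    return sum(slot_ms * n for n in range(1, sectors + 1))


real_drive = format_ms_per_track(REV_MS)          # waits a whole revolution
flashfloppy = format_ms_per_track(RESET_LEAD_MS)  # waits only the 20ms lead
print(real_drive, flashfloppy, real_drive / flashfloppy)
print(sequential_write_ms())  # 1100 ms, i.e. the 1.1s quoted above
```

The format ratio comes out at roughly 1.8, which matches "nearly doubles", while the same head reset costs sequential sector writes about five and a half revolutions per track instead of one.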

kgl2001 commented 6 years ago

That's the Watford Diagnostics completed without any errors too. Looking good!

@PeterBBCUser - Yes. I need to do a bit of work to try and find what's making the difference. The original Beeb actually has quite a number of hardware add ons - IntegraB, PiTubeDirect Co-Pro, DataCentre & GoSDC. It also has a full batch of ROMS installed, although most were disabled for testing. The Gotek actually had a very long ribbon cable attached to it, and I wonder if it might have been something to do with that. The second system I'm testing with has a Watford Shadow RAM board and a modern SWRAM / EEPROM card (by Kjell / Sundby) installed.

keirf commented 6 years ago

The only way this will get better for these Acorn systems is with implementation of a buffer cache, allowing dirty sectors to be held in RAM across multiple sector writes, batched up and written back to USB stick. i.e. a lazier writeback strategy compared with the current strict writeback strategy, which delays read-data coming back online post-write. This will be much nicer, but all post v1.0 a few months down the line... we'll be back here bug fixing a better faster firmware with fewer hacks :)
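As a minimal sketch of the lazy writeback idea described above (not FlashFloppy's implementation, which did not exist at the time of this thread), the class below holds dirty sectors in memory and writes them back in one batch. The class name, the `max_dirty` threshold, and the dict standing in for the USB stick are all assumptions made for illustration.

```python
class WriteBackCache:
    """Hold dirty sectors in RAM and write them back in batches."""

    def __init__(self, backing, max_dirty=10):
        self.backing = backing      # dict: sector number -> bytes (stand-in for the USB stick)
        self.max_dirty = max_dirty  # flush once this many sectors are dirty
        self.dirty = {}             # sectors written but not yet flushed
        self.flushes = 0            # number of batched writebacks performed

    def write(self, sector, data):
        # The write completes immediately; the slow backing store is not touched yet.
        self.dirty[sector] = data
        if len(self.dirty) >= self.max_dirty:
            self.flush()

    def read(self, sector):
        # Reads see the newest data, whether or not it has been flushed.
        if sector in self.dirty:
            return self.dirty[sector]
        return self.backing[sector]

    def flush(self):
        if not self.dirty:
            return
        for sector in sorted(self.dirty):   # one ordered pass over the batch
            self.backing[sector] = self.dirty[sector]
        self.dirty.clear()
        self.flushes += 1


stick = {}
cache = WriteBackCache(stick, max_dirty=10)
for n in range(10):                 # ten sector writes, as in one full track
    cache.write(n, bytes([n]) * 256)
print(cache.flushes, len(stick))    # one batched writeback covering ten sectors
```

With a strict writeback strategy each of those ten writes would reach the backing store separately; here reads stay available from the cache while the batch is pending.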

PeterBBCUser commented 6 years ago

Ken, are both your BBCs the same issue? My gotek has a 50cm cable on it. Oh, and I am also powering it from an external 5v supply, which also makes it easy to reprogram it.

keirf commented 6 years ago

@kgl2001 Floppy ribbon cable is a first thing to swap out, long or not.

PeterBBCUser commented 6 years ago

I think anything over 50cm is getting on the long side.

kgl2001 commented 6 years ago

Cable is new and approx. 1m in length.

Original Beeb is Issue 4. New Beeb is Issue 7. Other differences: the Gotek in the original Beeb is being powered from the Beeb. The Gotek in the new Beeb is being powered from an external USB power adaptor. Plenty for me to test!!!

PeterBBCUser commented 6 years ago

Indeed, but it could very well be the issue 4 that's causing the problems, be interesting to narrow that down :)

keirf commented 6 years ago

The missing flux transitions on some of Ken's test runs point at the really rather long drive cable.

PeterBBCUser commented 6 years ago

I agree the cable would be the 1st thing to eliminate, as 1m is really long and will cause delay problems, especially with the new version as it runs that much faster than the previous ones.

kgl2001 commented 6 years ago

yes, yes. I'm on it!

Oh, and disc doctor format / verify passed on my second Beeb too.

Edit1: Back on my original Beeb with a shorter floppy cable and Watford Diagnostic Format / Verify has completed without error. Ongoing with read / write tests.

Edit2: That's the Watford Diagnostic completed without any errors.

PeterBBCUser commented 6 years ago

Oh, and I just noticed a little bug in my program for timing random track/sector lengths: I was not resetting the time after each test, and was wondering why it got longer for each test, lol. So the average time comes out to 115 sec (times for each loop were 117, 109, 121, 112, 116 sec). Oh, and for interest, on BeebEm it comes out with an average of 86.5 sec.

kgl2001 commented 6 years ago

@keirf Out of interest, now that we've established that some of my issues were probably down to cable length, do you want me to go back and test older firmware versions?

keirf commented 6 years ago

I might make a new one which doesn't do the drive head reposition post-write...

keirf commented 6 years ago

Apart from that, we're done.

keirf commented 6 years ago

Here's one which doesn't reset the drive-head position post write. Check reliability, and observe speed of format and write tests....

ff_67_7_noreset.zip

kgl2001 commented 6 years ago

The format runs slower, but I'm also getting a format error on both systems at around track 30.

Test-ff_67_7-noreset-B1.zip

keirf commented 6 years ago

That seals that then :)

keirf commented 6 years ago

Cleaned up version committed to master (see above). Here is a build from master:

ff_67_8.zip

kgl2001 commented 6 years ago

Testing now...

Looking good!

keirf commented 6 years ago

That's good, should behave as 67_7.

Anyway, will close this and #69 now. The fixes will all be rolled into 0.9.16a

kgl2001 commented 6 years ago

@keirf - Just a quick note to make you aware that I saw a couple of format / write errors last night using ff_67_8 after I posted the above. This was on my second Beeb system. Didn't get a chance to dig into it and it's very possible that something's wrong at my end. I'll do some more detailed checks this evening.

@PeterBBCUser - Does this latest build work for you?

keirf commented 6 years ago

@kgl2001 Interesting, 67_7 and 67_8 really should behave the same. Depending on how easy it is to repro on 67_8, you may need to revert to 67_7 and test for long enough to convince yourself the bug really isn't there too.

kgl2001 commented 6 years ago

I've still got the index-suppression = no line in my ff.cfg file. Is that still correct for these latest builds?

keirf commented 6 years ago

Yes it is.

One thing to mention, depending on how long you soaked 67_8 for, is that you may well be chasing FlashFloppy's "generic" bug tail. I changed up how writes work back in v0.9.7a and I don't have any good regression stress tests at the moment. Very few users do much in the way of writes. Your diagnostic tests are doubtless more severe than anything else the new code has been subjected to. I have more plans for improving the emulation data paths, and it's crucial I find a good stress test before doing so. I'm hoping something exists in open source that I can adapt and run on an old PC test rig to give high-density r/w a good shakeout. Perhaps just lots of file-system activity, file reads and writes and verifies, and some formatting, will do.

keirf commented 6 years ago

I have a branch fixing a read/write buffering issue, which may be worth a try too. This bug could have caused write corruptions:

ff_67_9.zip

PeterBBCUser commented 6 years ago

Just modified my test program (random write with random no. of sectors) to actually compare the data it reads back to see if it's what it wrote. Will give it a thorough test and see if any problems come up.
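The actual test program is not shown in this thread. A rough sketch of the kind of test described (a random start sector, a random number of sectors, then read back and compare) might look like the following. The 80-track, 10-sector geometry follows the tests discussed here, the 256-byte sector size is an assumption, and `FakeDisk` is a hypothetical in-memory stand-in for the drive under test.

```python
import random

TRACKS, SECTORS, SECTOR_BYTES = 80, 10, 256  # sector size is assumed


class FakeDisk:
    """In-memory stand-in for the drive under test (hypothetical interface)."""

    def __init__(self):
        self.data = bytearray(TRACKS * SECTORS * SECTOR_BYTES)

    def write(self, track, sector, payload):
        start = (track * SECTORS + sector) * SECTOR_BYTES
        self.data[start:start + len(payload)] = payload

    def read(self, track, sector, count):
        start = (track * SECTORS + sector) * SECTOR_BYTES
        return bytes(self.data[start:start + count * SECTOR_BYTES])


def random_write_verify(disk, passes, seed=0):
    """Write random runs of sectors and return the runs that read back wrong."""
    rng = random.Random(seed)  # seeded so a failing run can be repeated
    failures = []
    for _ in range(passes):
        track = rng.randrange(TRACKS)
        first = rng.randrange(SECTORS)
        count = rng.randint(1, SECTORS - first)  # stay within the track
        payload = bytes(rng.randrange(256) for _ in range(count * SECTOR_BYTES))
        disk.write(track, first, payload)
        if disk.read(track, first, count) != payload:
            failures.append((track, first, count))
    return failures


print(random_write_verify(FakeDisk(), passes=200))  # [] on the in-memory disk
```

Recording the track, first sector and run length of each failure gives the kind of detail reported later in the thread (for example, which track was being written and how many sectors the run covered).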

keirf commented 6 years ago

@PeterBBCUser Thanks, worth doing, though it should have been covered by the sector CRC checks. Are you testing 67_9?
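For context on the CRC remark, here is a minimal sketch of a 16-bit CRC with polynomial 0x1021 and initial value 0xFFFF, the CCITT variant commonly used for floppy sector CRCs. Exactly which bytes the on-disk CRC covers (address marks and so on) depends on the encoding and is not modelled here; the function name is illustrative.

```python
def crc16_ccitt(data, crc=0xFFFF):
    """Bitwise CRC-16, polynomial 0x1021, MSB first, no final XOR."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


sector = bytes(range(256))                 # example sector contents
stored_crc = crc16_ccitt(sector)           # what would be stored alongside the data
corrupted = bytes([sector[0] ^ 0x01]) + sector[1:]   # flip a single bit
print(crc16_ccitt(corrupted) != stored_crc)  # True: a single-bit error changes the CRC
```

A CRC only shows that a sector is self-consistent. A sector that still holds stale data with its old, valid CRC would pass the CRC check, which is one case where comparing the data read back against the data written can add information.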

PeterBBCUser commented 6 years ago

yes.

kgl2001 commented 6 years ago

ff_67_9, 2nd Beeb, DFS1.20, freshly reformatted SanDisk USB, and I got a write error at track 52 using Watford Diagnostics

Test.zip

Going to try my original Beeb again. If that also fails I'll go back to ff_67_7 and try again.

PeterBBCUser commented 6 years ago

Hmm, it failed after a bit. Look at track 69:

RND Write test_FailTrk69_ssd.zip

The test was writing a 10 sector track at the time.

kgl2001 commented 6 years ago

I'm not getting consistent results. Back on my original Beeb with ff_67_9, Watford Utilities ran through full suite of tests without issue first time round. Then failed during format the second time I ran the test. I'll switch back to ff_67_7 to see if that works any better.

Edit1: So ff_67_7 with Watford Utilities failed during write on my original Beeb.

Edit2: And now with different USB stick, and it's passed the tests again.

PeterBBCUser commented 6 years ago

I have not tried to run the diagnostics on this version; maybe I should try that.

keirf commented 6 years ago

It's hard to know if things are getting better. I guess as things improve, the testing gets longer and harder.

kgl2001 commented 6 years ago

I think we are making progress, but it's taking me a lot more work to try and eliminate potential issues with the setup of my 2 test systems

Edit1: Thought I had cracked it with a different USB stick, which worked first time, but failed the second time I ran the test.

keirf commented 6 years ago

@PeterBBCUser that track 69 is concerning, the data sync mark is totally missing. Like random write data got stuffed into the HFE, or the write got latched late and missed the sync mark.

I think you should hit 67_7 with the same workload.

EDIT: And of course it smells like a generic write bug rather than something Beeb-specific or specific to index-suppression=no. I mean, we might have fixed the Beeb issues and now you guys are beating the crap out of the write path, specifically the HFE write path which very few others touch, to a far greater degree than anyone else. So that's good really and I should make my own write tests too :)

PeterBBCUser commented 6 years ago

Both diagnostics ran to completion without a problem on 67_9. Running the random write/read tests on 67_7.

keirf commented 6 years ago

@PeterBBCUser if random r/w fails on 67_7 it is worth going back to latest releases 9.15 and even 9.14. Since reads/writes were reasonably okay for you back then, interesting to see if that has regressed or you are simply stressing things more and bugs were there already.

keirf commented 6 years ago

Ugh, found a small bug in 67_9. :( So here's a fixed 67_10.....

ff_67_10.zip

EDIT: Actually the bug does not affect HFE images, so this shouldn't improve things for you. I wouldn't bother specially testing it.

kgl2001 commented 6 years ago

with original beeb and ff_67_10 - format error, track 49 :( but can't see anything particularly wrong with the .hfe

Test2.zip

Edit: Hold on. That last test was with DFS1.21. Rerunning test with DFS1.20 - All tests passed with Watford Diagnostics.

Edit2: Ran Watford Diagnostics again, but failed with format error track 49 again. Again, nothing obvious in the .hfe...

Test3.zip

Edit3: Ran the Disk Utilities test, and it passed. I've also run through the Watford Diagnostics tests again without any errors.

Edit4: Back onto Beeb setup 2, with the Watford Diagnostic tests running. Format / verify phase completed successfully. Duh! Another failure during the write tests... However, I've also just run the Disk Utilities test on this 2nd Beeb and it passed without error.

Not sure what to make of this!

keirf commented 6 years ago

Well, I guess there are still issues somewhere. However it seems the issues are in 67_7 too, and that one has worked with fewer failures for everyone I believe, especially @drdpj on his Archimedes tests. And 67_10 includes another definite read/write fix.

So I will roll everything into 0.9.16a later. You can see how that works in general use. And then if you want another go at fixing the diagnostics programs we can do a fresh ticket and go back to digging in with the logic analyser, debug builds, and perhaps a broader range of tests where the host system can give more precise diagnostics (error codes etc).