keirf / flashfloppy

Floppy drive emulator for Gotek hardware
Other
1.35k stars 194 forks source link

Further Format issue on BBC 8271 with Disc Utilities #67

Closed kgl2001 closed 6 years ago

kgl2001 commented 6 years ago

BBC Micro, 8271 controller, DFS1.20, FF 0.9.14a, index-suppression = no

Following on from ticket https://github.com/keirf/FlashFloppy/issues/57, I have been able to generate a format error when running a disc test with Disc Utilities. The format part of the test starts at track 80 and works down to track 1. It failed at track 19:

20180305_210900

.hfe attached:

Test-A1.zip

keirf commented 6 years ago

By the way it may again be useful to use debug builds, but they'll obviously need to be faster than 115200 baud. I have no problem using 3Mbaud on my PL2303 USB-serial dongles, connecting to Ubuntu Linux host. I can't see why it wouldn't work on RPi too. I use miniterm.py which I think is in python-serial, it can't be too hard to log to file too.

kgl2001 commented 6 years ago

.hfe Image now attached to the original post. I ran through the Disc Utilities tests a second time and they passed on that occasion, so I'm not sure how representative the error I've posted is. I think @PeterBBCUser and @drdpj were seeing more problems than me with these utilities, so hopefully they'll be back with some more details.

I think I can get my serial dongle running at 921,600 baud on a windows system, so if you want to post a debug version at that speed, I should be able to get you some logs too. It'll take me a bit more work to get it running at speeds higher than that. I'd need to get Linux installed on one of my boxes, or one of my RPis (I'd also need to get a micro USB to USB adaptor for the RPi).

Edit - thinking about it, is it possible to connect direct to some of the GPIO pins on the RPi, removing the need for the USB / Serial dongle??? I think this is how we get debug data off the BBC PITubeDirect (via JP2 on the Kjell level shifter - https://easyeda.com/normal/PiTubeDirect_signal_converter-14c2bc8f849f48c1919784740e02a33a).

Edit 2 - I think I have Miniterm.py running on one of my CentOS boxes, but will need to test this a bit further tomorrow. Do you just run miniterm.py -b 3000000 to get max speed?

keirf commented 6 years ago

The format error is a strange one, it's definitely different than the write errors, which result in a corrupted track. Here, I can't actually see anything wrong with track 19 (presumably t19s0, as it looks like side 0 is getting formatted here). Perhaps a timeout reared its ugly head again, but t19s0 does look to have been correctly formatted despite the test failing there.

Yes you can just run miniterm like that, that's what I do.

Or I'm sure you could connect to RPi GPIOs, Gotek does RS232 protocol at 3.3V logic levels, just as the RPi likes.

kgl2001 commented 6 years ago

The disk was already formatted, so tracks 0..19 would have been part of the original blank disk that was converted from .dsd to .hfe. You're also correct about it being side 0 that was being formatted at the time I had the failure. The one thing I did notice is that the sectors for tracks 0..18 are brighter green, compared to those for 19..79 (which were formatted by the disk utility):

Tracks 0..18: image

Tracks 19..79: image

Not sure if that has any relevance?

keirf commented 6 years ago

Light green means the sector is full of zeroes, I believe :) So matches with the format going backwards on side 0.

keirf commented 6 years ago

Here's a debug build of latest master @ 3Mbaud, so you can test if you can get any of your remote hosts to work at that rate. If you can, a serial log of a failed format could be interesting. For example, perhaps the USB stick goes away for a while during writeback and the host times out on verify (does this format test verify as it goes)?

ff_67_1.zip

PeterBBCUser commented 6 years ago

It fails on the test but the disk is in-tacked as Keirf and usable so not sure where it's going wrong, and have been testing it with the Zen game which writes a lot to the disc and up till now have had no problems. will look again this evening.

keirf commented 6 years ago

Thanks! I think @drdpj was the only person to report the most interesting error (bad write went to wrong offset and corrupted track): https://github.com/keirf/FlashFloppy/issues/57#issuecomment-370177730

kgl2001 commented 6 years ago

I'm struggling to get this debug version working with a Linux terminal. I start the terminal first, then power up the gotek, and this is what I get:

pi@raspberrypi ~ $ miniterm.py /dev/ttyUSB0 -b 3000000
--- Miniterm on /dev/ttyUSB0: 3000000,8,N,1 ---
--- Quit: Ctrl+]  |  Menu: Ctrl+T | Help: Ctrl+T followed by Ctrl+H ---
> USBHU*▒E▒٥▒▒▒͍▒▒▒▒▒ѕ▒5
> USBH_USR_DeInit

This is using a serial / USB interface on a RPi. Am I doing something wrong? Is the -b 3000000 setting correct? I can get it working at correctly @ 115,200 baud using an older debug version from the other ticket. My other Linux box doesn't recognise the USB device at all. Sorry for getting a bit off topic!

Edit: Restarting the terminal session midway through a format, I got the following, before starting to get rubbish again:

pi@raspberrypi ~ $ miniterm.py /dev/ttyUSB0 -b 3000000
--- Miniterm on /dev/ttyUSB0: 3000000,8,N,1 ---
--- Quit: Ctrl+]  |  Menu: Ctrl+T | Help: Ctrl+T followed by Ctrl+H ---
)... 2405 us
WNѕ 13312-13823 (512
▒▒      ▒)... 2405 us
WNѕ 13312-13823 (512
▒▒      ▒)... 2405 us
WNѕ 13312-13823 (512
▒▒      ▒)... 2405 us
WNѕ 13312-13823 (512
▒▒      ▒)... us
Write 13824-14335 (512)... 2459 us
Write 14336-148473630 us
Write 13824-14335 (512)... 2459 us
Write 14336-148473630 us
Write 13824-14335 (512)... 2459 us
Write 14336-148473630 us
Write 13824-14335 (512)... 2459 us
Write 14336-148473630 us
Write 13824-14335 (512)... 2459 us
Write 14336-148473630 us

<SNIP>

Write 13824-14335 (512)... 2459 us
Write 14336-148473630 us
Write 13824-14335 (512)... 2459 us
Write 14336-14847
▒▒j▒▒▒▒▒B▒▒▒J   ▒2492 us
WrW▒▒▒▒▒j▒▒º▒B)▒Jrrr▒2418 #▒Write 15872-16383 (512)... 3693 us
WriteL▒¢j▒▒▒ʪB▒▒▒J      ▒2667 us
Write 16896-17407 (513693 us
WriteL▒¢j▒▒▒ʪB▒▒▒J      ▒2667 us
Write 16896-17407 (513693 us
WriteL▒¢j▒▒▒ʪB▒▒▒J      ▒2667 us
Write 16896-17407 (513693 us
WriteL▒¢j▒▒▒ʪB▒▒▒J      ▒2667 us
Write 16896-17407 (513693 us

I wonder if the RPi just can't keep up? I might try installing Linux on another desktop machine and see if it runs better on that...

keirf commented 6 years ago

Yes I guess something is not keeping up. I could try giving you a 2Mbaud build. By the way, you could probably build the firmware yourself on RPi: https://github.com/keirf/FlashFloppy/wiki/Building-From-Source

For debug build the last line in the instructions should be "debug=y make dist". The baud rate is set at the top of src/console.c

kgl2001 commented 6 years ago

I will have a look at building from source myself, but can you provide a 2Mbaud build in the interim? I want to check if I can get this interface working reliably at the higher speeds we need.

Thanks.

keirf commented 6 years ago

Here you go, 2Mbaud debug firmware: ff_2mbaud.zip

PeterBBCUser commented 6 years ago

took the basic program (format on Disc Utility disc I put up here the other day) so i could mod it to see what error code I am getting . On format i get Error 10 -(drive not ready). During a transfer the disc stopped rotating. Often caused by a badly inserted disc. But not on all tracks just random tracks.

keirf commented 6 years ago

That is the error when INDEX pulses aren't generated frequently enough. Which is a bit odd when index-suppression = no. I'm not sure I have any great theories on this. It is just about conceivable that an INDEX pulse can be missed when restarting read after seek or write, I think.

I could add some tracing for this, if @kgl2001 gets high-baud rate logging working. Or @drdpj could collect a logic analyser trace, if he can repro this issue.

drdpj commented 6 years ago

I'll see if I can. Not been keeping up with the posts today so if someone can give me exactly how to recreate, I'll give it a shot later if I get a chance.

d.

On 6 Mar 2018 17:06, "Keir Fraser" notifications@github.com wrote:

That is the error when INDEX pulses aren't generated frequently enough. Which is a bit odd when index-suppression = no. I'm not sure I have any great theories on this. It is just about conceivable that an INDEX pulse can be missed when restarting read after seek or write, I think.

I could add some tracing for this, if @kgl2001 https://github.com/kgl2001 gets high-baud rate logging working. Or @drdpj https://github.com/drdpj could collect a logic analyser trace, if he can repro this issue.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/keirf/FlashFloppy/issues/67#issuecomment-370853844, or mute the thread https://github.com/notifications/unsubscribe-auth/ACdNPGYIiGhnstlwgB1J7y5L4M-7ExA2ks5tbsIfgaJpZM4SdsLW .

PeterBBCUser commented 6 years ago

Even formatting a disk with the FORM80 program supplied on the BBC Welcome disc it gives 2 or 3 tracks with a problem, then on verifying it slows on some of those tracks but gets over reading it other times it gives a verify error. looking with the HxCFloppyEmulator software at those tracks I don't see a problem so it's something that is borderline.

keirf commented 6 years ago

@PeterBBCUser Interesting so you would say, even though the format looks superficially okay, there is something baked into the HFE that makes verify slow or fail?

Perhaps you can attach such an HFE and let me know which tracks are affected?

PeterBBCUser commented 6 years ago

WelcomeFormat_test_ssd.zip Okay here is an image formatted with the Welcome format program, it looks fine in the FloppyEmulator The format gave errors on Track's 2C(44), 36(54) and 3F(63) I ran a verify (ADT) twice and got errors on Track's 2C(44) and 36(54) and track 3F(63) did a slow step listening to the speaker on the gotek. Using the Verify from the welcome disc I hear it slow step on tracks 2C and 3F but did not show up any errors.

keirf commented 6 years ago

How weird. I can't see anything odd about those tracks at all, in the HFE.

A guess then, perhaps those tracks sit at a particular FAT cluster boundary and are slower to load in... clutching at straws rather. Need serial logs.

EDIT: That theory would imply it would generally be the same tracks giving problems every time. Is it random though, or the same tracks?

PeterBBCUser commented 6 years ago

Its the same tracks every-time I verify the disc. unless I reformat it.

keirf commented 6 years ago

And when you re-format the very same HFE image again on that same USB-stick filesystem, the tracks change?

PeterBBCUser commented 6 years ago

Yes it does change each time, and though I was using the welcome disc format program but it was the ADF format program as it has the same parameters. With the one from the welcome disc it formats and verify (also from the welcome disc) okay. So it seems to be different formatter's work differently, some better than others.

kgl2001 commented 6 years ago

Got serial interface running at 3MBaud . Had to switch from FTDI interface to PL2303 interface to get it working:

[root@xyz ~]# miniterm.py /dev/ttyUSB0 3000000
--- Miniterm on /dev/ttyUSB0: 3000000,8,N,1 ---
--- Quit: Ctrl+]  |  Menu: Ctrl+T | Help: Ctrl+T followed by Ctrl+H ---

** FlashFloppy v0.9.14a for Gotek
** Keir Fraser <keir.xen@gmail.com>
** https://github.com/keirf/FlashFloppy

Board: Standard
Config: Flash Slot 13 (ver 2, size 48)
Interface: Shugart
I2C: OLED found at 0x3c
Display: 1602 LCD

> USBH_USR_Init
> USBH_USR_ResetDevice
> USBH_USR_DeviceAttached
> USBH_USR_ResetDevice
> USBH_USR_DeviceSpeedDetected
> Device speed: Full
> USBH_USR_DeviceDescAvailable
 VID : 0781
 PID : 5571
> USBH_USR_DeviceAddressAssigned
> USBH_USR_ConfigurationDescAvailable
> Class connected: 08 (MSC)
 Manufacturer : SanDisk
 Product : Cruzer Fit
 Serial Number : 4C530001311128122025
> USBH_USR_EnumerationDone
> USBH_USR_UserInput
0:F: 'Test-A2.hfe'
Mode: Native
Current slot: 10/10
Name: 'Test-A2' Type: hfe
Attr: 20 Clus: 00000981 Size: 2008064
New max: read_us=1944
New max: read_us=2277
New max: read_us=2299
New max: read_us=2321
New max: read_us=4473

Now to collect some logs...

keirf commented 6 years ago

Ah yes PL2303 clones are all I use :)

kgl2001 commented 6 years ago

BBC Micro, 8271 controller, DFS1.20, FF 0.9.14a (debug), index-suppression = no

I'm using Watford Utils, and getting a consistent Verify Fail at the first sector. I don't think I was getting this before, so I'm not sure what's changed.

First step in the utility is to format all 80 tracks. It then moves onto the verify phase. This is where it fails. Not sure anything meaningful is getting picked up in the log.

Test-A2.zip

Edit 1: This time verify failed at track 8. Again, I can't actually see anything failed.

Test A3.zip

Note that the Disc Doctor Format & Verify works fine.

keirf commented 6 years ago

The HFE in Test-A2 is a complete mess, right? There are loads of (bogus) 12us bitcells, as though write pulses are getting missed every now and then. But I don't see that is Test.A3 at all, that on first impression is a totally valid HFE. How odd -- the two resulting HFEs don't look similar at all.

keirf commented 6 years ago

Inclined to deprioritise or even close this one. There are reliable methods for format, and this doesn't seem to be a corrupting bug (in fact the failures all seem to be spurious except kgl's weird Test-A2).

I plan to get SSD/DSD done very soon (before v1.0) and that will be a nice alternative to HFE here, too.

kgl2001 commented 6 years ago

Unfortunately I've not had much opportunity to test over these last few days but I've got a bit of free time now to see if I can reproduce the same or similar results.

Direct .ssd/ .dsd support would be awesome.

PeterBBCUser commented 6 years ago

I have to agree with you Keirf one can always format a floppy on the HxCFloppyEmulator and I have not experienced any data corruption writing to disc's, like the game Zen which reads/writes a lot to disc, it has not crashed on me once.

Yes indeed looking forward to .ssd/.dsd support saves having to convert everything to .hfe and don't know of anyway to convert them back if needed. Peter.

keirf commented 6 years ago

If HFE reliability is better on HxC Gotek then that is a bait I can rise to! It is an interesting data point in itself.

EDIT: Also FYI I have a command line tool which would convert HFE back to a sector image like DSD.

kgl2001 commented 6 years ago

Unfortunately I don't have a copy of the HcX firmware with which to bait you :(. I was offered a copy by Jeff (who writes that software - I think), but the Gotek thread on Stardot where this was being discussed was split up, and I think Jeff got a bit upset. I heard no more from him after that!

drdpj commented 6 years ago

It was made very clear that we were more than happy to have hxc firmware discussed there, but what we didn't want to see happening was hxc firmware being pushed hard every time the other got mentioned (eg jumping into threads). And vice versa.

On 9 Mar 2018 16:40, "Ken Lowe" notifications@github.com wrote:

Unfortunately I don't have a copy of the HcX firmware with which to bait you :(. I was offered a copy by Jeff (who writes that software - I think), but the Gotek thread on Stardot where this was being discussed was split up, and I think Jeff got a bit upset. I heard no more from him after that!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/keirf/FlashFloppy/issues/67#issuecomment-371866970, or mute the thread https://github.com/notifications/unsubscribe-auth/ACdNPDt8izR7Dk87VRtNxQjnrEGqXbniks5tcrCKgaJpZM4SdsLW .

kgl2001 commented 6 years ago

I've been doing a few tests. Nothing conclusive yet, but when using the Watford Utilities and the debug firmware (running at 3Mbaud) I'm definitely getting more format / verify errors than when I use the pre-release (non debug) version. Firmware seems quite timing sensitive!

PeterBBCUser commented 6 years ago

Indeed as I noticed the difference when we switched from the debug to the non debug version.

kgl2001 commented 6 years ago

With the pre-release (non debug) version I'm getting 100% success on format / verify, but when the Watford Utilities moves onto the write phase it fails more frequently than it passes (1 pass from about 4 attempts). If I then try to capture the write failure with the debug firmware, I can't get past the format / verify phase! A bit of a catch 22.

PeterBBCUser commented 6 years ago

I figured if the read/write works fine for data then why worry about the format as you can always format/create images with the HxCFloppyEmulator software.

kgl2001 commented 6 years ago

Sure. For me, formatting with Disc Doctor & Watford Utilities seems quite reliable with the non debug firmware. It appears to be during the stress test writing stage that Watford Utilities fails, but I can't get the logs (which would be useful), because when I switch to the debug version, I can't even get past the format stage.

keirf commented 6 years ago

Format will be highly timing sensitive as the write is from index to index. And if the index pulses are perturbed then the write will be too long or too short. It would probably be helped by the patch to stop a write from overwriting itself. I could port that onto latest code and you could give that a go with the debug build?

@kgl2001: Write fails are very interesting, can you send an HFE of failed write?

kgl2001 commented 6 years ago

Typical! It takes about 15 mins to run through all the tests. If you're not watching what's happening, then you just get an indication that the Disc test either passed or failed. Sat through the full 15 mins, and it passed this time! I'll try again...

Edit: And that's a second pass!!! Edit2: 3 from 3! Edit3: Finally! Write error track &40 (64)

Test A5.zip

kgl2001 commented 6 years ago

I'm happy to try testing a new debug version, patched to try and stop overwrites, to see if that helps me get past the format issue.

keirf commented 6 years ago

Thanks for these I will take a look over the weekend and get you another debug firmware made. I don't imagine it will be this evening though.

keirf commented 6 years ago

The Test.A5 HFE looks okay, would you agree? So the error is perhaps a timeout...

Anyway, here is the patched version of master with the overwrite avoidance. There are debug (3Mbaud) and normal versions in the zip for you to try.

ff_67_2.zip

kgl2001 commented 6 years ago

Normal (non debug) version - Write error track &2A (42). As before, there doesn't seem to be anything wrong with the image, but as with Test.A5 it looks like I get the fail at the final sector of track.

Test A6.zip

I'll try the debug version now, and see if I can at least get past the format stage.

keirf commented 6 years ago

Looks like this write test perhaps writes each sector in turn, then verifies it. In Test.A5 and Test.A6 you can see the the sector CRC changes on the last sector of the failing track: All prior sectors on all tracks have the same CRC x; All subsequent sectors including that final sector of failing track have the same but different CRC y.

So it is probably a timeout on writing that last sector. Either it has seen not enough index pulses, or has received index pulses with no read data (so unable to find the sector ID).

Yeah serial log would be interesting (and I can add more debug printing if necessary). It could also be worth writing a simple BASIC program to duplicate this test ie. start with a formatted disk and then write/verify same data to every sector in turn, then at least you could see what error is returned?

kgl2001 commented 6 years ago

Ok, seemed to get past the format stage without any issue with this debug version, but then got a drive error fairly late on in the write stage. I think the drive error came at track 20 and looking at the .hfe image something definitely happened at that time - again on the final sector. The final event log entries might also give a clue:

Write 2560-5119 (2560)... 4397 us
Read whole track 40... 39840 us
Write 5120-7679 (2560)... 4487 us
Read whole track 40... 39829 us
Write 7680-10239 (2560)... 5179 us
Read whole track 40... 39381 us
Write 10240-12799 (2560)... 4796 us
Read whole track 40... 39597 us
Write 12288-15359 (3072)... 5698 us
Read whole track 40... 39532 us
Write 14848-17407 (2560)... 4386 us
Read whole track 40... 39648 us
Write 17408-19967 (2560)... 4610 us
Trk 40: late 17us
Read whole track 40... 39111 us
Write 19968-22527 (2560)... 5845 us

Test-A7.zip

Edit1: I presumed the sector CRC differences were down to the different data that was saved in the sector.

keirf commented 6 years ago

Yes CRC differences are due to differing data. It's just an indication to us of progress of the write-sector-at-a-time test :)

Reading whole track for single-sector writes is perhaps a bit unfortunate. Adds a good 35ms latency to each write I'm sure.

I may also add debug printing to indicate time between index pulses.

Watch this space. ;) I will try to find time later...

kgl2001 commented 6 years ago

In response to your earlier comment about the write test being verified, I'm not entirely sure what the Watford Utility is doing. The basic steps that are reported are:

Format tracks 0..80 Verify tracks 0..80 Write tracks 0..80 (fixed value to all sectors) Read tracks 0..80 Write tracks 0..80 (different fixed value to all sectors) Read tracks 0..80 Write tracks 0..80 (bytes 0..255 in each sector) Read tracks 0..80

I could write a small program to get better control of what's actually happening. Would take me a little bit of time to understand how to carry out low level read / writes to individual sectors. It shouldn't be too complicated, though.

Similarly, I've got a few other things I need to do today, so I'll try and squeeze this in along with everything else I need to do.

keirf commented 6 years ago

Yes I'm also afk most of the day so we'll just see how things go.

PeterBBCUser commented 6 years ago

And I am still waiting for my RS232 to USB so cant help with testing any debug version still sigh

kgl2001 commented 6 years ago

Going back to the formatting issue, I've switched back to the Disk Utilities formatter and been able to do some more tests. With the latest debug build, I would get an almost immediate format error at track 79 or slightly below (similar to that in the very first post of this ticket). Had a quick look at the code, and made a change so that the formatting stepped up from 0..79 instead of down from 79..0. What I then noticed was that the formatting would run fine until approx track 60 and it would then fail again. So the issue seems to be at the higher tracks?

The BASIC formatting program is issuing a 10 sector, track format command (OSWORD &7F. command &63 - see page 236 of http://msknight.com/bbc/manuals/advanced-disk-user-guide.pdf) to the DFS and loops round for all 80 tracks. I added a small delay into this loop, so that there was a small pause between issuing each track format command, and all of a sudden the format is working reliably! This works with the formatting stepping up or down. @PeterBBCUser if you want to try this, add the following line into the disk test utility:

915FOR delay%=0TO1000:NEXT

I then started getting write errors, but I've not looked that far yet. I'm guessing that if I add a similar pause in that loop, I will get that working more reliably too.