ploopyco / headphones

A set of 3D-printed headphones, alongside a DAC/amp/EQ board powered by a Raspberry Pico.
833 stars 43 forks source link

Firmware PM8 Requires reflash with every unplug #15

Closed keeb-time closed 10 months ago

keeb-time commented 1 year ago

Flashed firmware pm8 and followed the readme. Flashing the firmware works and the amp immediately reboots and is usable. The amp contiues to work great until it gets unplugged. Upon plugging the amp back in it isnt recognized by the system until the firmware is flashed again, then it works great until unplugged and replugged again. I'm unsure why this is as pm7 worked fine. Computer is running windows 11.

ploopyco commented 1 year ago

Crap. Yes, this is confirmed. It's also an issue with all builds of the pm9 release.

@george-norton Do you have any ideas? I can debug and fix this issue, but I can only start on it by next Wednesday.

george-norton commented 1 year ago

Does this happen with other operating systems or just Windows 11 (how about windows 10?)? I have been running my own builds which should be equivalent to pm8 and pm9 and I haven't seen any issues, but I don't use Windows all that much.

george-norton commented 1 year ago

I tried the pm9 binary with my Windows 10 laptop and I wasn't able to reproduce the issue. Not sure what the cause may be, nothing jumps out when looking at the code again.

Could it be some effect due to the increased I2C clock speed?

ploopyco commented 1 year ago

Hm. It's possible. The DAC not working after it's been unplugged and plugged back in means that the flash memory isn't being successfully accessed. I know that increasing the SPI bus speed causes this problem, but that's because the MCU and the flash chip communicate via SPI. So, increasing the I2C clock speed shouldn't affect that, but it might be affecting something else.

If you can't reproduce the issue, I'll tackle this.

george-norton commented 1 year ago

It seems more likely that either the ARM core is taking a fault, or the usb stack is stalling the control endpoint due to some error. I don't have a Windows 11 system - are you able to reproduce the issue on a different OS?

keeb-time commented 1 year ago

I can see if it works on my work mac, testing in a few here.

keeb-time commented 1 year ago

Can confirm that the problem is the same on my mac, finder even reports that the disc doesnt eject properly. After a plug and replug, the system doesnt see it. This was attempted with pm9.

ploopyco commented 1 year ago

Okay...strangest thing. I can't reproduce the issue anymore. On two separate computers, across multiple boards.

@keeb-time, can you try deleting the USB descriptors from your Windows machine and trying again? If that doesn't help, I...honestly have no idea what we're going to do, actually.

george-norton commented 1 year ago

Maybe we could prepare a test build with the I2c clock reverted to 50khz to see if that fixes it for @keeb-time ?

ploopyco commented 1 year ago

Got a test build live here. @keeb-time, let me know if it makes a difference.

keeb-time commented 1 year ago

Thanks for the test build, just flashed it from my windows computer and it rebooted straight into being a sound output device. I think this is good :)

I'll test on my work mac tomorrow, but I'm betting it works there too

EDIT: I spoke too soon, unplugging the correct cable helps.... it didn't come back after the replug.

keeb-time commented 1 year ago

is there anything I can do to assist? is there a way to put the amplifier in a debug mode and capture the output?

george-norton commented 1 year ago

Unfortunately the DAC doesn't have an easily accessible serial port or debug interface, they are on test points rather than a through hole debug header so they are awkward to access without additional hardware. I don't know about logs on Windows, but if you connect to your Mac then run 'dmesg' you should see a few log messages from MacOS. But I suspect there won't be anything too helpful.

Other than that, if you can spare some time, perhaps we can start doing a binary chop on the changes to see what works and what doesn't. I.e. I could produce a firmware image that contains around half the changes that went into pm8, and we can narrow down the problem that way..

keeb-time commented 1 year ago

Presently Upon booting with the amp plugged in pre-flash: https://pastebin.com/MafjaPTQ

dmesg after reset and flash: [ 720.640404] usb 3-7: new full-speed USB device number 6 using xhci_hcd [ 720.768229] usb 3-7: New USB device found, idVendor=2e8a, idProduct=0003, bcdDevice= 1.00 [ 720.768243] usb 3-7: New USB device strings: Mfr=1, Product=2, SerialNumber=3 [ 720.768246] usb 3-7: Product: RP2 Boot [ 720.768249] usb 3-7: Manufacturer: Raspberry Pi [ 720.768252] usb 3-7: SerialNumber: E0C9125B0D9B [ 720.773601] usb-storage 3-7:1.0: USB Mass Storage device detected [ 720.774295] scsi host1: usb-storage 3-7:1.0 [ 721.778230] scsi 1:0:0:0: Direct-Access RPI RP2 3 PQ: 0 ANSI: 2 [ 721.778987] sd 1:0:0:0: Attached scsi generic sg1 type 0 [ 721.780196] sd 1:0:0:0: [sdb] 262144 512-byte logical blocks: (134 MB/128 MiB) [ 721.780762] sd 1:0:0:0: [sdb] Write Protect is off [ 721.780772] sd 1:0:0:0: [sdb] Mode Sense: 03 00 00 00 [ 721.781383] sd 1:0:0:0: [sdb] No Caching mode page found [ 721.781391] sd 1:0:0:0: [sdb] Assuming drive cache: write through [ 721.788388] sdb: sdb1 [ 721.788760] sd 1:0:0:0: [sdb] Attached SCSI removable disk [ 731.014677] usb 3-7: USB disconnect, device number 6 [ 731.015609] device offline error, dev sdb, sector 260 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 2 [ 731.015627] Buffer I/O error on dev sdb1, logical block 259, lost async page write [ 731.038610] FAT-fs (sdb1): unable to read boot sector to mark fs as dirty [ 732.169028] usb 3-7: new full-speed USB device number 7 using xhci_hcd [ 732.296451] usb 3-7: New USB device found, idVendor=2e8a, idProduct=fedd, bcdDevice= 2.00 [ 732.296463] usb 3-7: New USB device strings: Mfr=1, Product=2, SerialNumber=3 [ 732.296468] usb 3-7: Product: Ploopy Headphones [ 732.296471] usb 3-7: Manufacturer: Ploopy Corporation [ 732.296473] usb 3-7: SerialNumber: 000000000001 [ 732.351939] usbcore: registered new interface driver snd-usb-audio

Now, I will unplug and replug: [ 833.620774] usb 3-7: USB disconnect, device number 7

that is the only thing that changes, it never re-recognizes the device after the replug. This might be an issue with starting up the usb controller?

I did all this from my windows laptop running a fedora live image, let me know if there's any more information you need. I want this to work right not just for me but for anyone else that runs into this.

ploopyco commented 1 year ago

@keeb-time Out of curiosity, what happens if you try George's headphones toolbox build?

keeb-time commented 1 year ago

Sorry about the delay, work was super busy and I needed a day. I just attempted with that build and can say that it still needs to be reflashed each time I plug it back in in order to stay working. For the record I have an early unit and am flashing it by shorting the two reset pads and then dragging and dropping the file onto it when it shows up as a drive.

keeb-time commented 1 year ago

Just tried playing with the toolbox software and even using it to tell the amp to reboot into bootloader and then load the files it still doesnt persist through a reboot. Perhaps my unit has bad flash?

george-norton commented 1 year ago

@ploopyco I assume all shipped units have identical hardware? You haven't changed flash chip at some point?

@keeb-time I suspect it's not a flash issue, it seems to work from a warm boot, but not a cold boot. The raspberry pi foundation have a tool called picotool (https://github.com/raspberrypi/picotool) which might let us do some interesting experiments. But it is only distributed as source code, so it needs compiling first.. It lets you read back the contents of flash and verify that it matches the expected uf2 image. We could also start up in bootloader mode and the reboot, which might simulate the warm boot case.

keeb-time commented 1 year ago

This looks interesting, I'll set up a VM (my laptop has...issues...with linux lol) and see about getting it compiled

george-norton commented 1 year ago

Not sure how well it runs in a virtual machine, it will need raw access to the USB device so you would probably need to configure it in exclusive mode (or whatever it is called.) It can be built for Windows.

keeb-time commented 1 year ago

looks like hyper-v cant pass usb from the host in a traditionay way, thats unfortunate. not quite sure how to build anything on windows, the github page has a few things but im not very familliar with windows tools for these things. Am willing to run a binary though

keeb-time commented 1 year ago

So I managed to get it to build on linux haha...wasn't too bad. Got rid of the VM and am running it native. What would you like me to look at ?

george-norton commented 1 year ago

You can picotool save headphones.uf2 (or picotool verify) to read the flash and write it to a uf2 file. You can compare the file it reads back and make sure it matches the uf2 file you expect.

If it matches, try picotool reboot to see if it starts from a warm boot.

keeb-time commented 1 year ago

well, look at that...md5 hashing both the flash that picotool saved and the flash that I downloaded from git produces the same hash so its bit perfect. I tried the warm reboot next and I have a working amp! I don't know why that works but it does. I have to manually reset it still to get it go into BOOTSEL mode and then reboot it for it to work but I dont have to reflash it

george-norton commented 1 year ago

It's a bit odd.. During start up there are a bunch of sleep statements which wait for hardware to settle. Hardware can take longer to settle when the chip is cold. I sped up the I2C clock which would act like a reducing the length of some sleep statements since the I2C writes are blocking operations and now they complete faster, but the patched build with the clock speed reduced didn't fix it.. I will take another look at the change set.

keeb-time commented 1 year ago

Thanks George, I'm ready to test any builds you come up with. Also, well done on the toolbox!

george-norton commented 1 year ago

I have spotted something that seems a bit wrong in the PM8 changes. I have changed the order of some I2C commands in my latest Headphones toolbox firmware here: https://github.com/george-norton/headphones/releases/tag/headphones-toolbox-alpha-v3

The code is here: https://github.com/ploopyco/headphones/blame/47b94b8d73fd3a8a2918066c51f36deca82f00f3/firmware/code/run.c#L232

Basically we issue a PCM3060 reset on line 227 (MRST | ADPSV | DAPSV) Then we configure the data mode to 16bit. Then we issue a second reset on line 240 (MRST | SRST | ADPSV | DAPSV)

I don't understand why the 16bit data configuration actually works - the MRST bit in the second reset should reset all the registers to the default values..

ploopyco commented 1 year ago

@ploopyco I assume all shipped units have identical hardware? You haven't changed flash chip at some point?

All units have shipped with identical hardware so far. The flash chip is the same.

I don't understand why the 16bit data configuration actually works - the MRST bit in the second reset should reset all the registers to the default values..

For that particular revision of run.c, the write to Register 64 sets MRST to 1, which doesn't reset the registers to default values. So, it makes sense to me that the 16-bit data configuration persists.

keeb-time commented 1 year ago

I've flashed the new version that you made and I can tentatively say that it works. It's survived three unplugs so far. I'm tempted to put the case back on.

keeb-time commented 1 year ago

And now its back to not starting up properly...not sure why it seems intermittent

keeb-time commented 1 year ago

Sorry for the multiple comments here, I'd say that whatever you did is an improvement enough that the amp is usable so long as I plug and unplug it enough times. Something is still off but it's an improvement over how it was. I'm curious as to why the firmware's prior to this one work every time. I'm new to the dev world and this has been a good learning experience so far

Roosta078 commented 1 year ago

This seems like the same bug I have experienced and fixed in my PR from yesterday. This change completely fixed the issue for me. To summarize, the SPI flash speed was getting set out of spec for the flash chip. Perhaps the higher clock speed was fine on some silicon, but not on others. @keeb-time could you try out that version?

keeb-time commented 1 year ago

I should be able to flash that later tonight, looking forward to seeing how it goes!

george-norton commented 1 year ago

Thanks @Roosta078, I am on vacation this week, I will take a look next week. I noticed the Pico SDK introduced some new functionality for adjusting the peripheral clocks when the system clock changes. See 'PLL and Clocks' in the release notes here https://github.com/raspberrypi/pico-sdk/releases. We just take the current master of the Pico SDK repo, so it is possible something changed in the SDK between the pm7 and pm8 builds, not sure. Also it is possible enabling this new feature might help. I don't quite understand why this stuff might have changed between pm7 and pm8 as I didn't mess with the system clock of QSPI clock.

Roosta078 commented 1 year ago

I worry that PM7 and earlier were not built with a clean pico-sdk. I checked on the old images, and as described above pm7 worked with unplug/replug. pm8 did not. I then built from pm7 commit and replugging no longer worked. I then went and reverted my pico-sdk and pico-extras to the lastest commits as of pm7 and replugging still did not work. My theory is that there may have been local modifications to ploopyco's sdk and that at some point between pm7 and pm8 they pulled the latest version.

george-norton commented 1 year ago

If you use the download SDK environment variable, the SDK is downloaded once, but never updated when you rebuild. So potentially the pm7 image is built against a fairly old SDK checkout.

Probably we should build against officially tagged SDK releases rather than just using the master branch at the time you happened to do your first build, that would at least give us reproducible builds.

keeb-time commented 1 year ago

@Roosta078 I didn't see a build to download and try when I looked the other night, do you have one available?

Roosta078 commented 1 year ago

@Roosta078 I didn't see a build to download and try when I looked the other night, do you have one available?

Yes, and sorry for the delay, Github did not send a notification. I uploaded a build to the PR. Let me know how it goes.

keeb-time commented 1 year ago

Your build is a winner! I have not had to reflash since and the amp comes up everytime I plug it in! Thank you :)

george-norton commented 1 year ago

Sounds promising, we overclock the RP2040 so it makes sense that we might need to increase the clock divider on the QSPI interface to stop it from running out of spec. The SPI flash is eXecute In Place (XIP) so code runs directly from it without first being copied into RAM. So this change may have a performance impact. It might be an idea to run a few experiments to see if it is noticeable, and then maybe shift the main sample processing loops into RAM if required.

Roosta078 commented 1 year ago

I later noted in the PR that another change that resolved the issue was increasing the PICO_XOSC_STARTUP_DELAY_MULTIPLIER and leaving the SPI clock as is. This may be the preferred solution, but I figured I would open it up to discussion.

george-norton commented 1 year ago

It sounds like that would be preferable, but I would be interested to know what @ploopyco think.

ploopyco commented 12 months ago

Changing PICO_FLASH_SPI_CLKDIV is not preferable. While doing the original dev work, I'd found that reading from flash was unreliable at PICO_FLASH_SPI_CLKDIV values other than 2. I suspect the reason why is that the RP2040 is significantly overclocked, resulting in SPI speeds that are slightly different between actual chips.

I hadn't experimented with PICO_XOSC_STARTUP_DELAY_MULTIPLIER during dev. If it works, I'd go with that.

I'll also paste this comment into the PR.