Open tfabris opened 6 years ago
Got another repro. Conditions of the repro were:
The BGE was originally paired with my etekcity standalone unit in the house. Moved the BGE to the car.
After starting car, the car paired with my iPhone first.
Used the car touch screen and paired the BGE as a music device, and also re-paired the phone as a phone device.
After the pairing was done, the BGE output was crackling and chipmunk.
AVRCP commands such as pause play next etc worked.
AVRCP track titles worked.
Switching to a different input pauses the player correctl,y but does not solve problem.
Performing a "power off" of the car stereo headunit and back on, also pauses player correctly, but does not fix issue. Note that this doesn’t really power off the head unit; it just sleeps it, so this isn't really telling us anything.
As before, the problem is solved by any actions which reboot the BGE, including rebooting the empeg itself.
Suspicion: this is an I2S slave/master sync timing problem. Hm. Wonder if it would work to boot the BGE in analog mode and then switch it to I2S after boot. Give the sync a chance to get settled on the empeg DSP before kicking it into gear.
More thoughts:
Sometimes the AVRCP commands are fine and sometimes they’re messed up when this happens.
Unlikely to be an I2S sync issue then.
I have added an additional place where the code issues the command "A2DP STREAMING START", and there is a slim chance that it might fix this issue. Not sure. The version where I introduced this change was 1.0.96 with the commit number of 95bac912b5e978ff5a6ab6ba7a8c8fdee9439bb9 .
Wait to see if the additional "streaming start" commands have prevented this from recurring.
If this issue recurs, then consider trying your earlier idea of booting the BGE in analog audio mode and then switching it to digital mode at some point after bootup.
This issue reproduced just now. So the additional streaming start commands didn’t help.
Avrcp commands and track titles worked just like before.
I did not catch the issue in the debugger so I don’t have any more details about it.
To do: try starting in analog mode then switching to digital.
Version 1.0.99, commit 191aa5ed48fa99942c4f0290dd021c83fb46440b contains the following change:
Attempt to fix issue #70 (Broken up high pitched playback on cold boot) by booting the module into analog mode, then changing it to digital and doing another reset. This happens in two places: At boot, and at pairtime.
The issue is so rare that I need to just check this in and live with it a while to see if it fixes it.
Nope, version 1.0.99 induces a new bug. The command that switches the unit back to digital mode is not always interpreted and sometimes the player simply never switches back to digital. We get silence.
I implemented a workaround for the issue induced in 1.0.99; the digital command not being interpreted all the time was fixed by just adding a delay before every chip reset attempt. After that, I had the "boot into analog and then switch to digital" feature working, and things seemed to be working pretty well...
... then I got another repro of issue #70. :( :( :(
On the one hand, darn, it didn't fix it.
On the other hand, I lucked into a repro pretty quick, so I got my answer about whether or not my analog-then-digital trick had any effect on the problem.
I have reverted the changes from 1.0.99, but kept the "delay before every chip reset" thing, because that's super useful to have and I think it should probably have been there all along.
New version is 1.1.2 with commit 3c77c10f366e9cbec090a59cc23ac993992accd2.
TO DO: Next time you get a repro, try the following:
unpair the empeg from the stereo main screen and re pair. If that fixes it then the problem is upstream of the i2s connection.
press the RESET/PAIR BUTTON. This will reset the Bluetooth chip without dropping power. Does that fix it? If not it’s definitely an i2s sync issue perhaps at the hardware level.
I got another repro.
Deleting the empeg car's pairing from the car stereo touch screen (as suggested in the prior comment) made it forget its security key, and then I was unable to re-pair solely from the touch screen. If I delete the security key from one device, then I have to delete it from the other device, or they won't re-pair. So I couldn't do that first diagnostic test.
I was forced to move on to the second diagnostic test: pressing the reset/pair button. This allowed me to re-pair without a power cycle. The re-pair cured the crackling problem, so whatever the problem is, it can be solved in software without a hardware power cycle.
This indicates that the problem is less likely to be a hardware issue. So I don’t think it’s I2S Synch problems, unless the "reset" command on the BlueGigaEmpeg chip also resets the I2S input. My experimental code of "boot to analog, reset, switch to digital, reset" did not solve the problem, I still got a repro with that, so either the resets are unrelated, or, perhaps, I2S hardware synch might still have a chance to get messed up at any startup of the software (including resets).
I think it's more likely, though, that it's Bluetooth related. Some issue in the software somewhere.
Looking back over prior notes, I see that sometimes when this problem occurs, we get issues with the AVRCP connection, i.e., track controls are limited such as Prev/Next not being there on the car touch screen. I think this also points the problem even more towards a Bluetooth issue of some kind.
To do:
Unable to repro on the Etekcity receiver in the house. The Etekcity receiver handles multiple devices better than the Honda stereo. Etekcity receiver allows both the iPhone and the Empeg to be paired with it at the same time and both can stream audio to it at the same time. It plays whichever one was playing the most recently. Pause one and the other one picks up the audio stream. So the two devices don't fight over the connection. I haven't had a repro of the crackling issue yet in this configuration.
Got another repro. No debugger, but more info on conditions:
Physical coldness wasn’t a part of it. It was a fairly nice day and i had only shut off my car for an hour or two. I didn’t expect physical coldness was a part of it but this just confirms that.
Repro was after starting the car and heading out of the driveway.
I had actually reshuffled the player but I don’t think that’s related. I had the volume fairly well down until after the reshuffle so didn’t notice for sure if the repro existed before the reshuffle. I expect it was already reproing when I reshuffled.
AVRCP commands and track titles worked perfectly.
From the Honda stereo front panel, I disconnected and reconnected the Bluetooth device. This fixed the problem. This means that it does not require a reset of power, nor a reset of the Bluetooth module software to fix. Just a Bluetooth reconnect.
Beginning to suspect this is a Honda bug, but I need to catch this in the debugger to be sure.
I was attempting to get APT-X-LL codec working on my Honda (issue #30), and the Honda did not work with APTXLL, it fell back to SBC codec.
However, when I did this... I got two repros of crackling chipmunk playback in a very short time, almost back-to-back but not quite, on the Honda stereo, with the APT-X-LL turned on. I don't know if it's related, but I could imagine that the process of trying the unsupported codec and then dropping back to SBC might cause the handshaking process of the initial connection steps to be flakier and might partially "let the iPhone in the door", causing the problem.
Did my APTXLL attempt truly increase the repro frequency of this problem?
If so:
Still not certain if the APTXLL configuration increases the frequency, but maybe possibly it might?
Got two repros in a row with the following steps:
I got two repros in a row that way and they are attached to this comment, along with a baseline run.
Biggest difference I see between the baseline and the repro is that, with the repro, it's the car stereo ringing the empeg, not the other way round. Both are supposed to work fine though.
To do:
Issue 70 Chipmunk second in a row repro back to back.txt Issue 70 Crackling Chipmunk Repro Debug Output.txt Baseline Good Playback Debug Output.txt
Oh my God, that's it.
A2DP CODEC SBC JOINT_STEREO 44100 BITPOOL 2-53
vs,
A2DP CODEC SBC JOINT_STEREO 48000 BITPOOL 2-53
Somehow when the car rings up the empeg, it tries to do it at 48khz instead of 44.1 khz, and the WT32i lets it, even though its music data is at the wrong frequency.
To do:
I can't find any settings on the WT32i that control the sampling rate between 44.1khz and 48khz, that I'm not already using and setting correctly to 44.1khz.
My only guess is that there is something going on where the Honda stereo is conflating different memory locations and combining the iPhone connection (which might be supporting 48khz) with a ring to the incorrect address (the BGE) with the wrong audio sampling frequency set.
My only solution I think is to detect when this bug occurs (the 48000 message) and do a RESET.
I experimentally implemented some code (not checked in yet) which did the thing: Detected the 48000 message and did a RESET.
It took a bunch of retries to get a repro. When the repro finally occurred, the code worked as designed: It prevented the crackling playback. The Bluetooth module reconnected to the stereo after the reboot, and the playback was correct-sounding.
But it wasn't perfect. There was a side effect: When the Bluetooth module reconnected to the headunit that second time, there was an interesting problem with AVRCP. Play/pause/next/prev still worked and were indicated on the screen, but track titles were not functioning. The touch screen was a simplified version with fewer control buttons and there was a single "unknown track" message on the screen, indicating that the car stereo also knew that something was weird about the AVRCP handshake.
Unfortunately, I didn't save a debug log of the side effect, I didn't notice it until after shutting down the Serial Monitor software.
To do:
Got another repro (off the debugger). My experimental detection code correctly rebooted the unit, and then I also got the AVRCP problem side-effect again.
Attempted experimental fix for issue #70 is checked in, in version 1.1.4 with commit number 1b2843a409c439d7285559158d5e64ee645d7622.
This reboots the module if it detects a 48khz connection, and then reboots a second time to see if that can fix the AVRCP issues that happen immediately after that.
To do:
The "double reboot" fix in commit 1b2843a409c439d7285559158d5e64ee645d7622 indeed seems to have fixed the problem, at least it fixed it during the one repro I had after implementing it. The repro occurred during the pairing process, when I was freshly re-pairing the updated module to my car stereo. After the module rebooted itself twice, it reconnected and all the AVRCP stuff was working correctly as intended.
However it has an unpleasant side effect:
This is a very bad user experience. However if the original problem (the 48000 problem) is caused by a bug in the stereo head unit, then maybe hardly any users will see this. Since this product has only a small number of users overall, there's a good chance that I'm the only one who will ever experience the issue. So maybe that's not too bad, since I know what's going on in these cases.
To do:
Double check that the "Double Reboot" trick works to fix the AVRCP problems on a "regular" repro, i.e., a normal reconnect that is not in the middle of the pairing process.
Obtain a debug log of the root cause of the AVRCP problems with only a "Single Reboot" implemented in the work around. You still want to know what causes the AVRCP problems in this case.
Got more debug logs of the AVRCP problem side effect. I did this by using custom code that performs a single reboot when the 48k problem occurs. States are interesting and more research is needed.
By the way, even with only a single reboot in the code, the "bad AVRCP screen" problem side effect is only intermittent. So it's an intermittent state resulting from a bug that is intermittent to begin with. Sigh.
Here are all the possible sub-states. We start with the intermittent bug where the codec is set at 48k instead of at 44.1k. When that intermittent bug hits, then we have three possible states after that:
Good state: The 48k problem occurs and the code reboots the module, and after the reboot everything is fine.
Honda bug bad screen state: The 48k problem occurs and the code reboots the module, and after the reboot everything is fine except that the Honda screen doesn't show the track data, it shows a simplified control screen, no track titles, and just "empeg Car" as the name of the device. This is fixed simply by refreshing the Honda screen (pressing HOME followed by AUDIO again). This is definitely a Honda bug and I don't need to worry about this state.
Bad AVRCP data: The 48k problem occurs and the code reboots the module. After the reboot, the Honda screen shows a simplified set of controls, doesn't show track data, and this is not fixed by refreshing the Honda screen. This is the state I need to fix.
Attaching debug logs of each of these states.
Issue 70 with single 48k reboot and good correct avrcp functionality after the reboot.txt Issue 70 with single 48k Reboot with messed up AVRCP screen on Honda which was solved by refreshing honda screen.txt Issue 70 with single 48k reboot and bad AVRCP screen on honda which was NOT fixed with a refresh of the screen.txt
The logs show that there is no difference in the messages coming from the WT32i chip between the "good" and "bad" AVRCP screen states on the Honda.
To me, this says that I can't finesse this by detecting some specific state on the messages coming from the serial port. I can possibly try to live with the double reboot and see if that indeed fixes the issue, but that's about it.
To Do:
Reminder: Your current best repro steps (doesn't repro it every time) are:
I think the ultimate solution would be to find out how to prevent other folks from ringing us up (a previously paired host stereo doing a successful RING command against us). Then we would depend entirely on our autoReconnect to initiate the reconnect. However I can't figure out how to do this. The SET BT LAP command does not work for this. So continue to test workarounds for this.
To do:
Try to test your theory that this is caused by a them-rings-you-up problem by turning off auto reconnect temporarily. Force the other guy to ring you up and see if the repro frequency goes up.
With the Double Reboot code running into your testbed BGE unit, then try to get a large number of the 48k-problem repros. Check for the Honda AVRCP screen state problem after each repro.
If you don't see the bad AVRCP screen state problem on the Honda after a large number of repros, then consider it fixed to the best of your ability and close the bug.
If you do still see the bad AVRCP screen state problem on any of the repros, then see if switching inputs off of the bluetooth and back again fixes it.
Indeed changing AutoReconnectMode to 0 and turning off the iPhone's bluetooth connection makes the repro happen. It is not a frequent repro that way, but it tells us the following:
It's definitely due to the Honda ringing me up instead of the other way around.
The iPhone fighting for control of the Honda stereo is definitely not the cause. This is a big deal: You were highly suspicious that the Honda was conflating the iPhone and the Empeg, and it's not, at least, not in bluetooth.
Though I wonder if this was a "the iPhone was the last thing paired and so the stereo tries to go at 48k on the new thing paired". I only got the repro once with the iPhone off and it might have been after the iPhone had been previously the music player.
Though I did have the iphone plugged into the Honda USB (ie ipod mode) but that should't affect this. Should it? Not sure, anything's possible.
Easier repro, seems to be timing dependent.
I got several repros using the latest steps, the double reboot, and the AVRCP screen on the honda was correct every time.
I'm going to call this one Fixed (or at least worked around) for now, and move on to issue #67.
Turns out I don't need A2DP for the repro. I can get the repro very frequently with the following:
With that I got a repro almost every time.
Important:
This morning when freshly re-pairing I had a fairly bad UX where the 48k reboot happened several times in a row after pairing. This was with the unit configured for the SBC codec and a single reboot.
To do:
To do:
To do:
None of the set control audio or loopback commands had any effect. I also tried a whole bunch of different other things grasping at straws until I stumbled on this:
To do:
The intelligent code did not fix the infinite reboot loop.
The reboot loop now occurs any time that the stereo rings us as opposed to us ringing the stereo.
It's possible that the only reason I ever avoid the ring-to-me is because in all my prior testing I always had the iPhone connecting to the Honda and so the Honda didn't try to ring me right away every time.
Once I take the iPhone out of the picture, or the Honda gets into a mode where it thinks it doesn't need to ring the iPhone, then I get the 48k problem and it triggers infinite reboots.
TO DO:
Tested my recent theories. Nope, they were wrong.
Turns out that it's merely because the HONDA STEREO NEEDED A REBOOT.
The honda stereo sometimes gets in a state where it only ever wants to do 48k and no amount of reconnecting seems to help in that case. Power cycle the Honda and everything is fine after that. I have taken several debug logs showing the 48k workaround functioning fine in both 1.1.7 and 1.1.8, with and without the new intelligent streaming start behavior. The ones which functioned fine are always after a fresh reboot of the Honda stereo. This is fine, I don't even need to update the dox because they already say to reboot your stereo.
I have a tech support request into Silicon Labs but I find it likely that my current workaround will have to stand as-is. This is a Honda bug of some stripe.
To do:
I believe that the final state of the workaround for this issue is now checked in with version 1.1.8, commit aa4714e16631d2e00532451237b136d01a93c956.
To do:
(Note: the bug below has been converted into separate issue #78).
With one of the in-process fix builds for this issue being experimented with on the debugger, I got a case of where the car touch screen has NO NAME on the screen and simplified control set, i.e., bad AVRCP somehow. Refreshing the screen and even going from BT to radio and back to BT again didn't fix it. Pause and play and next track still work though. In any case I need to analyze this at some point in the near future. Attached is that log output. The repro for this seems to be:
Issue 70 in 1.1.9 - Got a state of bad AVRCP and simplified controls on the car touchscreen.txt
Version 1.1.9, commit 719835123e95d44247dc48d84e7d0a2d896f3862, contains a few minor tweaks to the behavior of the code which issues the Streaming Start commands. It's mostly the same behavior but there are a few additional bits of bullet proofing. This needs to be retested on all devices you have access to.
To Do:
After more testing I am now convinced that my new way of doing STREAMING START commands is inducing issue #70 as opposed to helping it. The Honda needs the reboot fix with the new method almost every time whereas the old method only repropduces issue #70 once in a blue moon.
Example of a perfectly good session that would have repro'd issue 70 with the new method.
0009786 RING 0 48:f0:7b:57:15:20 19 A2DP
0000002 Detected first A2DP channel connection. Waiting for second one.
0000101 RING 1 48:f0:7b:57:15:20 17 AVRCP
0000003 --^ A2DP STREAMING START THIS IS THE OLD METHOD HERE
0000176 RING 2 48:f0:7b:57:15:20 19 A2DP
0000002 Detected second A2DP channel connection. Starting audio stream now. THIS MESSAGE ONLY INFORMATION, NO ACTION TAKEN
0000002 --^ A2DP STREAMING START THIS IS THE OLD METHOD HERE
0000021 AVRCP 1 GET_CAPABILITIES 2
0000003 --^ AVRCP RSP 02FF
Version 1.1.10, commit number f2c327aff62d75fea7e1c0d55417f86586be0323, returns the "streaming start" behavior back to pre-1.1.8 behavior, which reduces the repro frequency significantly.
I still need to understand when the correct time to issue the "streaming start" is on all devices. There are catch-22s where I can't issue it at certain times with certain device or else I induce bugs on other devices.
Already tested on:
To do:
Retested 1.1.10 on everything but Onkyo so far and it's working fine.
Got response from tech support. I was missing the command "SET BT PAGEMODE" which controls whether or not iWrap answers RING commands and whether or not it is visible in the inquiry.
To do:
If that works, test on all devices and see if it fixes issues 67 and 70 by refusing all RINGs.
SET BT PAGEMODE turns out to be a non-starter. It interferes with the ability to reconnect in the Honda, it lets the iPhone in the door before the empeg has a chance to connect and then I don't get the iPhone+Empeg feature on my Honda that I need. This is even when I set it to mode "1". Only modes 3 and 4 work for this.
I have sent another email to tech support asking about the A2DP STREAMING START. Waiting for result.
The tech support conversation with Silicon Labs is not going well. I'm basically asking "When" and they're answering "yes", meaning they're not understanding my question.
To Do:
Compare the two. Was there a difference in the connection channel order or the timing of the Streaming Start command in those cases? If not, then you're stuck and this is just a Honda bug that you can't control. Close it.
Got a pretty good answer from tech support. Their answer was, essentially:
Though this is a good official answer, I am skeptical because I think this is what induced the 48k problem the most frequently. This is the behavior of the "intelligent" streaming start that I implemented earlier and my memory is telling me that this was the inducer of the 48k problem.
Need to do an experiment to be certain. But I think it's really like this in practice:
Issue streaming start after second and third RING or CONNECT (regardless of a2dp or avrcp):
RING 0 48:f0:7b:57:15:20 19 A2DP
RING 1 48:f0:7b:57:15:20 17 AVRCP
--^ A2DP STREAMING START
RING 2 48:f0:7b:57:15:20 19 A2DP
--^ A2DP STREAMING START
... This seems to be good and rarely if ever induces either issue #67 or issue #70. I think the only reason I ever got issue #67 is becase I was triggering off of "RING 1" or "RING 2" or somesuch, and if the messages came out of order with RING 1 coming first, then I wasn't issuing the command on the second ring receipt and it induced #67. Not sure.
This works well on some devices but I think that this might intermittently induce the 48k problem on the Honda stereo:
RING 0 48:f0:7b:57:15:20 19 A2DP
RING 1 48:f0:7b:57:15:20 17 AVRCP
RING 2 48:f0:7b:57:15:20 19 A2DP
--^ A2DP STREAMING START
To do:
I have 48k repros both with the SILabs method and also with my new 1.1.13 method.
Need to compare these in detail and see if there's a difference.
Issue 70 repro with 1.1.13 new behavior - proof that it's not always the SIlabs method.txt
Baseline good reconnect behavior with 1.1.13 and Honda.txt
Issue 70 repro with Silicon Labs suggested method - proof it is bad.txt
Baseline good reconnect behavior with Honda and 1.1.13.txt
Baseline good pairing behavior with Honda and 1.1.13 - both kinds of pairs.txt
This also might be micro timing, as I'm getting a lot of inducements of both this and issue #79 with the debug cable hooked up. Don't seem to hit either of them without the cable.
I'm increasingly convinced that this is due to bad SiLabs method (only send streaming on receipt #2) and that 1.1.13 new behavior fixes it.
I'm able to get it to happen with the serial cable disconnected on SiLabs method, but not very often if at all with the serial cable disconnected and using 1.1.13 new behavior.
Any cases where I see it happen in 1.1.13 is usually when I'm heisenberging the timing with the serial cable connected.
I think issue #70 occurs when you issue the streaming start simply TOO LATE or NOT ENOUGH.
To do:
I am hoping that version 1.1.13 (currently on a side branch at the moment, not yet checked into master) will fix this.
Keep in mind that micro timing of the "streaming start" command might be a factor on the Honda.
To do:
I have gotten an occasional repro on 1.1.13 with no debug cable, and the streaming start happening as fast as I can get it. I don't know if I can get the microtiming faster or not.
This occasionally repros on 1.1.13 but no more frequently than before.
The workaround successfully takes care of it, so close this issue after you merge 1.1.13 to Master.
On rare occasions, after getting in the car and starting the car, the stereo connects to the empeg and plays music, but the music is "broken up", in other words it alternates quickly between music playing and tiny breaks of silence. It sounds like "crackling".
In addition, the music that is playing is at a higher pitch than it should be (Alvin and the Chipmunks sound).
In addition, the available controls on the car stereo touch screen are (intermittently only) not the full set of controls. There are fewer controls on the screen, for example there is no "next track" or "previous track" control available. This indicates that the initial reconnection handshake process did not complete correctly and not all features are available to the head unit.
The problem is resolved by rebooting the Bluetooth module by sleeping and waking the empeg (long press on empeg top button).
My initial guess it that this seems to be a problem that occurs during the auto-reconnection process, where my iPhone and the empeg are fighting over control of the A2DP channel. However this is hard to prove because the repro condition is very rare and it is hard to catch in the debugger.
To Do:
Note: This might be related to issue #67 - Perhaps it is the same root cause with a slightly different manifestation.