sinara-hw / Kasli

Kasli is a powerful FPGA carrier, capable of controlling 12 Eurocard extension modules.

Inadequate IC6 decoupling blows up C299 #99

Open TopQuark12 opened 2 years ago

TopQuark12 commented 2 years ago

We recently had the 12 V supply input short on a Kasli v2.0.1 fresh off the production line, after it had been powered on for around 1 hour. C299 (TAJB336K016R) had smoked and failed short. Once the faulty capacitor was removed, board functionality was restored.

We suspected that the capacitor's ESR, together with high ripple on the P12V0_SMPS rail, caused excessive heating that led to the failure. To see what conditions C299 was operating under, the P12V0_SMPS rail was probed with C299 unpopulated. To reduce inductive pickup by the scope probe, a probe socket was soldered to the C299 footprint and the measurements were taken with the probe inserted into the socket.

DSC03594

before

As can be seen from the screenshot, there are high-frequency noise spikes of more than 1 V p-p on P12V0_SMPS. We suspected that the 10uF ceramic capacitors (C130-C133) were not adequately decoupling the switching current spikes generated by IC6.

sch

To try and reduce the switching noise of IC6 on P12V0_SMPS, 100nF 0603 ceramic capacitors were soldered on top of the 10uF capacitors decoupling IC6 input pins.

DSC03595

piggyback

As can be seen from the scope screenshot, the 100nF caps soldered on top of the 10uF capacitors had no effect on the switching noise.

Upon inspection of the PCB layout, it was discovered that the vias connecting the 12v rail to IC6 were placed between IC6 input pins and decoupling caps. This arrangement is known to reduce the effectiveness of the decoupling capacitors.

decoupling

To test this hypothesis, 100nF 0402 ceramic capacitors were soldered directly to the power input pins of IC6, with the ground connections of the caps made as short as possible.

WIN_20220627_18_42_57_Pro

proper

This decoupling arrangement drastically reduced the switching noise spikes on P12V0_SMPS, and should reduce the ripple current through C299, possibly extending its lifetime.

To work around the issue of C299 blowing up, C299 can be replaced with MLCC capacitors. If the ESR of the original C299 is wanted to damp the LC network formed by L38 with C299 and C130-C133, series resistance can be added to one of the MLCC caps replacing C299. This arrangement avoids the ugly outcome of smelly, blown-up tantalum caps.

Placing 100nF capacitors right at each of IC6's power input pins, increasing the capacitance or number of decoupling capacitors, and placing the capacitor pads between the vias and the IC6 input pins should improve decoupling performance, minimizing the ripple on P12V0_SMPS and the stress placed on C299.
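As a rough sketch of how such a series resistance might be sized: a common rule of thumb for damping an LC network is to choose a resistance near its characteristic impedance, sqrt(L/C). The inductance of L38 is not stated in this thread, so `L38_HENRY` below is a placeholder assumption, and the capacitance is the nominal 4 x 10uF of C130-C133 with no DC-bias derating applied.

```python
import math


def characteristic_impedance(inductance_h: float, capacitance_f: float) -> float:
    """Characteristic impedance Z0 = sqrt(L / C) of an ideal LC network, in ohms."""
    return math.sqrt(inductance_h / capacitance_f)


def resonant_frequency(inductance_h: float, capacitance_f: float) -> float:
    """Resonant frequency f0 = 1 / (2*pi*sqrt(L*C)) of an ideal LC network, in Hz."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


# ASSUMPTION: placeholder value, the real L38 inductance is not given in the thread.
L38_HENRY = 1e-6
# Nominal capacitance of C130-C133 (4 x 10 uF), ignoring DC-bias derating.
C_CERAMIC_FARAD = 4 * 10e-6

z0 = characteristic_impedance(L38_HENRY, C_CERAMIC_FARAD)
f0 = resonant_frequency(L38_HENRY, C_CERAMIC_FARAD)
print(f"Z0 = {z0:.3f} ohm, f0 = {f0 / 1e3:.1f} kHz")
```

Substituting the real L38 value and the derated capacitance gives a starting point only; the final resistor value would still need checking on the bench.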

gkasprow commented 2 years ago

Tantalum caps are known to spontaneously short circuit and catch fire.

gkasprow commented 2 years ago

The role of C299 is to damp oscillations of the LC circuit. The spikes you are measuring are mainly caused by the ground loop effect. Try measuring them on the negative terminal of the same cap with the GND clip kept in the same place. They won't differ much. I usually use special low-impedance probes to measure ripple because standard scope probes simply don't work.

TopQuark12 commented 2 years ago

> The role of C299 is to damp oscillations of the LC circuit.

I noted the role of C299 in my original comment, and suggested using a discrete resistor in series with an MLCC to provide the required damping and improve reliability.

> The spikes you are measuring are mainly caused by the ground loop effect. Try measuring them on the negative terminal of the same cap with the GND clip kept in the same place. They won't differ much. I usually use special low-impedance probes to measure ripple because standard scope probes simply don't work.

I have verified the integrity of my measurements with your suggestion, and did not observe ground loop noise pickup with my setup. You are welcome to reproduce my test with the equipment you have.

I also measured the 12v rail with a differential probe. While the signal level is close to the noise floor of the probe, a significant difference can be observed between having the 100nF caps soldered right at the power input pins and having them disconnected.

diff before

diff after

gkasprow commented 2 years ago

Yep, a sub-Ohm resistor in series with an MLCC is also a good solution. Of course, the 100nF should be present because it has a much higher resonance frequency. Via placement is a bug. But I'm not sure if mV of ripple could heat the tantalum capacitor and blow it.
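To illustrate the resonance-frequency point, here is a minimal sketch that models each capacitor as an ideal C in series with a parasitic inductance (ESL), giving a self-resonant frequency of 1 / (2*pi*sqrt(ESL*C)). The 1 nH ESL is an assumed, illustrative figure applied to both parts, not a datasheet value.

```python
import math


def self_resonant_frequency(capacitance_f: float, esl_h: float) -> float:
    """Self-resonant frequency of a series C-ESL model, in Hz."""
    return 1.0 / (2.0 * math.pi * math.sqrt(esl_h * capacitance_f))


# ASSUMPTION: same illustrative ESL for both capacitors; real values depend on
# package size, mounting, and via placement.
ASSUMED_ESL_H = 1e-9

srf_100nf = self_resonant_frequency(100e-9, ASSUMED_ESL_H)
srf_10uf = self_resonant_frequency(10e-6, ASSUMED_ESL_H)
print(f"100 nF: {srf_100nf / 1e6:.2f} MHz, 10 uF: {srf_10uf / 1e6:.2f} MHz")
```

With equal ESL, a 100x smaller capacitance moves the self-resonance up by a factor of 10, which is why the small capacitor stays effective at higher frequencies.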

jordens commented 2 years ago
gkasprow commented 2 years ago

I will reach out to production and ask whether they had to use a replacement for this tantalum capacitor. With tantalum caps, one must apply some derating because they tend to blow when operated close to Vmax.
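A quick sketch of the derating check, assuming the `016` field in the TAJB336K016R part number denotes a 16 V rating and that the capacitor sees the nominal 12 V rail. The 50% limit used here is a commonly quoted rule of thumb for solid tantalum capacitors and is an assumption, not a figure taken from this thread.

```python
def voltage_stress_ratio(applied_v: float, rated_v: float) -> float:
    """Fraction of the rated voltage that is actually applied to the capacitor."""
    if rated_v <= 0:
        raise ValueError("rated voltage must be positive")
    return applied_v / rated_v


RAIL_V = 12.0           # nominal P12V0_SMPS rail voltage
RATED_V = 16.0          # ASSUMPTION: read from the "016" field of the part number
ASSUMED_MAX_RATIO = 0.5  # ASSUMPTION: rule-of-thumb derating limit

ratio = voltage_stress_ratio(RAIL_V, RATED_V)
within_derating = ratio <= ASSUMED_MAX_RATIO
print(f"stress ratio = {ratio:.2f}, within assumed derating: {within_derating}")
```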

gkasprow commented 2 years ago

CTI installed TAJB336K016RNJ

sbourdeauducq commented 1 year ago

> Yep, a sub-Ohm resistor in series with MLCC is also a good solution.

@gkasprow Should we switch to that in the next Kasli revision?

gkasprow commented 1 year ago

Sure, we can, and it will also lower the cost, but I don't think this is an issue. Tantalum caps are known to spontaneously ignite.

sbourdeauducq commented 1 year ago

> Tantalum caps are known to spontaneously ignite.

I don't see this documented anywhere; in fact, https://en.wikipedia.org/wiki/Tantalum_capacitor#Reliability_and_life_time suggests they are reliable when they are used correctly.

@gkasprow Would you or your team be able to do the ripple measurements again to double-check? At 33µF it doesn't take much high-frequency ripple voltage to cause high currents to flow through the capacitor.
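A rough sketch of that arithmetic, treating C299 as an ideal capacitor with an optional series ESR and ignoring ESL. The ripple amplitude, frequency, and ESR below are placeholder assumptions, not values measured in this thread or taken from a datasheet.

```python
import math


def capacitor_ripple_current(v_ripple_rms: float, freq_hz: float,
                             capacitance_f: float, esr_ohm: float = 0.0) -> float:
    """RMS current through a series ESR-C branch driven by a sinusoidal ripple voltage."""
    reactance = 1.0 / (2.0 * math.pi * freq_hz * capacitance_f)
    impedance = math.hypot(esr_ohm, reactance)  # |Z| = sqrt(ESR^2 + Xc^2)
    return v_ripple_rms / impedance


C299_FARAD = 33e-6
V_RIPPLE_RMS = 0.1     # ASSUMPTION: illustrative ripple amplitude
FREQ_HZ = 1e6          # ASSUMPTION: illustrative ripple frequency
ASSUMED_ESR_OHM = 1.0  # ASSUMPTION: placeholder ESR, check the datasheet

i_ideal = capacitor_ripple_current(V_RIPPLE_RMS, FREQ_HZ, C299_FARAD)
i_with_esr = capacitor_ripple_current(V_RIPPLE_RMS, FREQ_HZ, C299_FARAD, ASSUMED_ESR_OHM)
p_esr_w = i_with_esr ** 2 * ASSUMED_ESR_OHM  # heat dissipated in the ESR

print(f"ideal: {i_ideal:.2f} A, with ESR: {i_with_esr:.3f} A, heating: {p_esr_w * 1e3:.1f} mW")
```

In this simplified model the result depends strongly on the ESR and on the ripple spectrum actually present at the capacitor, so measured values would be needed before drawing conclusions.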

gkasprow commented 1 year ago

They do fail. I used to work as a repairman 20 years ago and was replacing them quite often.
https://ntrs.nasa.gov/api/citations/20205008339/downloads/Breakdown%20and%20Self-healing%20DEI%20rev%20A1.pdf
https://iopscience.iop.org/article/10.1149/2162-8777/abf728

Anyway, it is placed in parallel with three ceramic caps with much lower impedance. With such low trace impedance, the peaks carry negligible energy because they are so short. Also, tantalum caps, and electrolytic caps in general, have high impedance for such high-frequency components. Let's simply replace it with a ceramic cap and a 0.2R resistor. The function will be the same. I don't have any v2 Kasli to test it.

dhslichter commented 1 year ago

We just had the same failure of C299 on a Kasli yesterday. It does appear to be related to the tantalum cap being operated close to its rating; if for some reason you happen to get a voltage spike, it may be enough to send the cap into thermal runaway. I replaced the dead cap with two 22 uF 25 V X7R caps in parallel (DC bias derates this to more like 18 uF total). I didn't add series resistance, but the board powered up OK. I can have a look with a scope and post a trace to see how large the spikes are currently.
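For reference, the derating arithmetic behind those numbers can be sketched as follows. The 18 uF effective figure is the estimate quoted above, not a value computed from a DC-bias curve.

```python
NOMINAL_EACH_F = 22e-6     # each replacement X7R capacitor
COUNT = 2                  # fitted in parallel
EFFECTIVE_TOTAL_F = 18e-6  # estimate quoted in this comment at 12 V bias
ORIGINAL_C299_F = 33e-6    # nominal value of the tantalum part

nominal_total_f = COUNT * NOMINAL_EACH_F
retained_fraction = EFFECTIVE_TOTAL_F / nominal_total_f  # share left after DC bias
vs_original = EFFECTIVE_TOTAL_F / ORIGINAL_C299_F        # relative to the old C299

print(f"nominal {nominal_total_f * 1e6:.0f} uF, retained {retained_fraction:.0%}, "
      f"{vs_original:.0%} of the original 33 uF")
```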

I support @gkasprow's proposal to replace with ceramic plus small damping resistance in future versions (also in Kasli SoC).