sbourdeauducq closed this issue 6 years ago
This bug is present on at least 2 channels. I don't have another board to test.
Before I manage to recreate the issue, could you please have a look at the supply rails? That is the +/-12V which supplies the DAC and buffers.
Where do I measure those? The PDF schematics aren't in https://github.com/sinara-hw/Zotino
Yep, the power supply is crapping out, here is the -12V:
schematics are in releases
OK, so it seems to be a power supply issue. Please check whether the power is already problematic before the regulators. What is the output load?
What is the output load?
Open, I only have the scope (1M) connected on one channel.
please measure V+ and V- with the scope
please also check if the DAC or the LDO above N12V0A is getting hot.
The +12V is working correctly. The -12V at the output of the LDO has the problem shown above, but its input is stable. It does get very hot. So I guess we are seeing the LDO thermal shutdown.
There is +/-15V at the inputs of the LDOs. Isn't that a bit high? Also, the -12V glitches only begin to occur after the board has been running for a few minutes, which is consistent with thermal shutdown.
From a cursory inspection (no thermal camera here yet), nothing else on the board other than the DAC chip gets hot. I wonder if it may be drawing more than its specified current.
I measured R4 and it is 36 ohm, so the TDK-Lambda module should produce +/-13V, not the +/-15V I have...
Contrary to what I thought, the DAC chip on my Zotino 1.1 is not burned, so I have added a heatsink to it and I am testing that board now. The switching regulator output is still +/-15V.
That 1.1 board doesn't have the problem. The -12V LDO is colder, too.
The TDK module is not very well stabilized - the output depends on load. The problem with thermal runaway observed on rev 1.0 was similar - the current consumption from the negative rail got very high. It happened once the chip temperature exceeded its limit (AFAIR it was 85 degrees). So it's quite possible that you briefly exceeded that temperature.
We recreated the same problem in the lab. The Zotino board was placed at the rightmost side of the crate, where cooling is not sufficient.
The problem is systematically happening on the v1.2 board when it is exposed to air at ambient temperature without hot objects nearby. What should I do, turn on the TEC cooler?
Even with the board outside the crate, the bug is present.
you can try to remove the Peltier module and attach the heatsink directly.
@gkasprow you think this is thermal runaway in the dac? That doesn't seem likely to me. On our boards, with no forced air, the dac is barely warm to touch. How hot is it on your board before this issue occurs?
This happens when the DAC exceeds its temperature limit even for a short moment. It gets damaged: it still works but consumes much more current, so it quickly enters this thermal trip. Let's add thermal protection in the next revision of Zotino which will switch off the power when the temperature reaches e.g. 55 degrees and indicate it with an LED.
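A minimal sketch of the proposed cutoff logic, only to make the idea concrete. The 55 degree trip point comes from the comment above; the 45 degree release point, the hysteresis itself, and all names (`ThermalCutoff`, `update`, `led_on`) are assumptions for illustration and not part of any Zotino design.

```python
from dataclasses import dataclass


@dataclass
class ThermalCutoff:
    """Hypothetical over-temperature cutoff with hysteresis."""
    trip_c: float = 55.0     # trip threshold suggested in the thread
    release_c: float = 45.0  # ASSUMED re-enable threshold, not from the thread
    tripped: bool = False

    def update(self, temp_c: float) -> bool:
        """Take one temperature reading; return True if power may stay on."""
        if self.tripped:
            # Stay off until the board has cooled below the release point.
            if temp_c <= self.release_c:
                self.tripped = False
        elif temp_c >= self.trip_c:
            self.tripped = True
        return not self.tripped

    @property
    def led_on(self) -> bool:
        """The fault LED simply mirrors the tripped state."""
        return self.tripped


cutoff = ThermalCutoff()
power_states = [cutoff.update(t) for t in (40.0, 54.9, 55.0, 50.0, 45.0, 40.0)]
```

The hysteresis is there so that the supply does not chatter on and off around a single threshold; whether a real protection circuit should re-enable automatically at all is a separate design question.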
@gkasprow did you test that by heating the dac with a hot air gun?
I just don't get this:
So I think there must be some other cause for all of this. What do you think?
Let's add thermal protection in the next revision of Zotino which will switch off the power when the temperature reaches e.g. 55 degrees and indicate it with an LED.
If that's really the problem, this won't be a solution: the thermal protection will just be activated all of the time in the field. Again, the bug is present on my board outside the chassis, and it's not the hot HK weather, I had AC in the lab.
If that's really the problem
As I said, I'm pretty sure it's not the problem, or the eval boards wouldn't work. With the thermal island, I could just about believe it, but without the island this doesn't seem plausible IMHO.
@gkasprow how about we settle it like this:
How does that sound?
Note to self, let's also check that this isn't something dumb like we've used some DAC variant that's only specified over a very narrow temperature range.
I also have AC in my lab and the failure happened today. The SMPS delivers max +/- 15V so even if the LDO gets unstable, it is still within safe operation area. I have the DAC devkit so will give it a try.
I attached the heat sink directly to the DAC after removing the Peltier. The current consumption is stable and the DAC is cold. The board is on the bench, with air flow from a 12V fan powered from 10V.
I removed the fan and the current consumption started growing. When it had grown by 20mA, I turned the fans on and it fell back to the original value.
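For anyone who wants to log this on the bench, here is a rough sketch of how such a drift could be flagged from a series of supply current readings. The 20mA limit is the figure mentioned above; the function names, the choice of averaging the first few samples as a baseline, and the sample values are all made up for illustration and are not measurements from this board.

```python
def current_drift_ma(samples_ma, baseline_n=3):
    """Drift of each sample relative to the mean of the first baseline_n samples."""
    baseline = sum(samples_ma[:baseline_n]) / baseline_n
    return [s - baseline for s in samples_ma]


def first_runaway_index(samples_ma, limit_ma=20.0, baseline_n=3):
    """Index of the first sample whose drift reaches limit_ma, or None."""
    for i, drift in enumerate(current_drift_ma(samples_ma, baseline_n)):
        if drift >= limit_ma:
            return i
    return None


# Hypothetical readings in mA, NOT data from the thread.
example_ma = [40.0, 40.0, 40.0, 45.0, 52.0, 61.0, 58.0]
idx = first_runaway_index(example_ma)
```

With these invented readings, `idx` points at the sample where the fan would have been switched back on.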
It might generally be wise to enable thermal shutdown in the AD5372 by default in ARTIQ.
I removed the fan and the current consumption started growing. When it had grown by 20mA, I turned the fans on and it fell back to the original value.
Do you know from which rail the current was drawn?
Were your observations consistent with the data sheet?
It might generally be wise to enable thermal shutdown in the AD5372 by default in ARTIQ.
We did consider that in the past (and it might not be a bad idea to implement it), but the thermal shut-down only kicks in at 130C die temperature. IIRC, @gkasprow felt that by that point the DAC had been destroyed. Also, it's not great protection, as it doesn't help if the user powers the board on for a while before programming. Better to ensure that Zotino doesn't spontaneously die under normal use.
I took the ADI devkit, powered it on and heated it to 97 degrees. I did not observe any change in current consumption. The -12V and +12V currents were both below 10mA (the resolution of the bench PSU current display).
Good!
So, either the DAC on Zotino has been damaged by something, which is leading to this behaviour, or we have some connection on Zotino that's different to the eval board. The question is which one? Maybe try disconnecting all the OpAmps from the DAC outputs?
My DAC on Zotino is already broken. What I can do is to measure the output current on the RC filter resistors. I will also check how the +/- 12V rails wake up.
I am pretty sure that the thermal shutdown is designed to prevent and identify (since it doesn't clear itself) destructive temperature excursions. And I'd bet that if the damage is thermal, the shutdown would trip if it were enabled. But yes, I'd also bet that the problem is not bad cooling.
the +/- 12V rails wake up monotonically. +12V wakes up faster than -12V.
I am pretty sure that the thermal shutdown is designed to prevent and identify (since it doesn't clear itself) destructive temperature excursions. And I'd bet that if the damage is thermal, the shutdown would trip. But yes, I'd also bet that the problem is not bad cooling.
Exactly.
I let the DAC trip. Then I switched on the fan and the current consumption dropped back to its initial value... So it does not seem to be a latch-up effect.
There is one difference between the eval kit and Zotino. On the eval kit, pin 13 is grounded. In the datasheet it is marked as NC.
I connected it to GND but it did not help
@gkasprow my guess would be that if you replaced the DAC with a fresh one this behaviour would disappear for a while until that DAC died too. If that's the case, the real question is "what's killing the DACs"?
The SMPS delivers max +/- 15V so even if the LDO gets unstable, it is still within safe operation area.
Well, we could still be supplying too much voltage to the reference or digital supplies.
the DAC accepts up to 5V on its digital supply. We have a 3.3V supply which also powers the LVDS drivers, and those drivers would be killed first.
I'm just curious if this un-connected NC pin could kill the DAC. The reference can survive up to 40V on its input. The DAC draws excessive current from its negative rail.
roughly 60mA from -12V and below 10mA from +12V
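As a back-of-the-envelope check of what these figures mean in terms of heat, the arithmetic can be sketched as below. It uses only numbers quoted in this thread (60mA on the negative rail, -12V at the LDO output, -15V at the LDO input); the function names are made up, LDO quiescent current is ignored, and the rail power is everything fed from -12V, not necessarily the DAC alone.

```python
def rail_power_w(voltage_v, current_a):
    """Power delivered by a supply rail, from voltage magnitude and current."""
    return abs(voltage_v) * current_a


def ldo_dissipation_w(v_in, v_out, current_a):
    """Heat dissipated in a linear regulator: voltage drop times load current."""
    return abs(abs(v_in) - abs(v_out)) * current_a


i_neg_a = 0.060  # "roughly 60mA from -12V"

# Power drawn from the -12V rail by its loads: about 0.72 W.
neg_rail_w = rail_power_w(-12.0, i_neg_a)

# Heat in the negative LDO with -15V in and -12V out: about 0.18 W.
ldo_neg_w = ldo_dissipation_w(-15.0, -12.0, i_neg_a)
```

These are steady-state figures at the quoted current; if the current rises further as the part heats up, both numbers scale with it.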
Sure, my point is just that characterizing the dead DAC carefully may not tell you anything about what killed it.
I have one more Zotino that works correctly. I shorted pin 13 to GND and that's the only thing I can do. I can heat it to 90 degrees and see how it behaves.
I can think of a few things:
I posted it here https://ez.analog.com/message/350275-failing-ad5372-dacs
Every ~15s my Zotino stops producing an output signal for a few seconds, and the blank period is followed by a burst of noise: This is with a kernel that produces a 1kHz tone:
Even with nothing running on the core device, the bursts of noise are still observed, which points to a hardware problem: