sinara-hw / Zotino

ARTIQ - compatible 32 channel DAC card in EEM standard

DAC death #2

Closed sbourdeauducq closed 6 years ago

sbourdeauducq commented 6 years ago

Every ~15s my Zotino stops producing an output signal for a few seconds, and the blank period is followed by a burst of noise: [image: zotino_fail2]

This is with a kernel that produces a 1kHz tone:

import struct

from artiq.experiment import *
from artiq.coredevice.ad53xx import *

class Tone(EnvExperiment):
    def build(self):
        self.setattr_device("core")
        self.setattr_device("zotino0")

    @kernel
    def run(self):
        self.core.reset()
        self.zotino0.init()
        delay(1*ms)
        while True:
            self.zotino0.write_dac(1, 0.1)
            self.zotino0.load()
            delay(0.5*ms)
            self.zotino0.write_dac(1, -0.1)
            self.zotino0.load()
            delay(0.5*ms)

Even with nothing running on the core device, the bursts of noise are still observed, which points to a hardware problem: [image: zotino_fail1]

sbourdeauducq commented 6 years ago

This bug is present on at least 2 channels. I don't have another board to test.

gkasprow commented 6 years ago

Before I manage to recreate the issue, could you please have a look at the supply rails? The +/-12V rails supply the DAC and buffers.

sbourdeauducq commented 6 years ago

Where do I measure those? The PDF schematics aren't in https://github.com/sinara-hw/Zotino

sbourdeauducq commented 6 years ago

Yep, the power supply is crapping out, here is the -12V: [image: zotino_fail3]

gkasprow commented 6 years ago

The schematics are in the releases.

OK, so it seems to be a power supply issue. Please check whether the power is already problematic before the regulators. What is the output load?

sbourdeauducq commented 6 years ago

What is the output load ?

Open, I only have the scope (1M) connected on one channel.

gkasprow commented 6 years ago

Please measure V+ and V- with the scope: [image]

gkasprow commented 6 years ago

please also check if the DAC or the LDO above N12V0A is getting hot.

sbourdeauducq commented 6 years ago

The +12V is working correctly. The -12V at the output of the LDO has the problem shown above, but its input is stable. It does get very hot. So I guess we are seeing the LDO thermal shutdown.

sbourdeauducq commented 6 years ago

There is +/-15V at the inputs of the LDOs. Isn't that a bit high? Also, the -12V glitches only begin to occur after the board has been running for a few minutes, which is consistent with thermal shutdown.

sbourdeauducq commented 6 years ago

From a cursory inspection (no thermal camera here yet), nothing else on the board other than the DAC chip gets hot. I wonder if it may be drawing more than its specified current.

sbourdeauducq commented 6 years ago

I measured R4 and it is 36 ohm, so the TDK-Lambda module should produce +/-13V, not the +/-15V I have...

sbourdeauducq commented 6 years ago

Contrary to what I thought, the DAC chip on my Zotino 1.1 is not burned, so I have added a heatsink to it and I am testing that board now. The switching regulator output is still +/-15V.

sbourdeauducq commented 6 years ago

That 1.1 board doesn't have the problem. The -12V LDO is colder, too.

gkasprow commented 6 years ago

The TDK module is not very well stabilized: its output depends on the load. The thermal runaway problem observed on rev 1.0 was similar: the current consumption from the negative rail got very high. It happened once the chip temperature exceeded the limit (AFAIR it was 85 degrees). So it's quite possible that you momentarily exceeded the temperature limit.

gkasprow commented 6 years ago

We reproduced the same problem in the lab. The Zotino board was placed at the far right side of the crate, where cooling is insufficient.

sbourdeauducq commented 6 years ago

The problem happens systematically on the v1.2 board when it is exposed to air at ambient temperature, without hot objects nearby. What should I do, turn on the TEC cooler?

sbourdeauducq commented 6 years ago

Even with the board outside the crate, the bug is present.

gkasprow commented 6 years ago

you can try to remove the Peltier module and attach the heatsink directly.

hartytp commented 6 years ago

@gkasprow you think this is thermal runaway in the dac? That doesn't seem likely to me. On our boards, with no forced air, the dac is barely warm to touch. How hot is it on your board before this issue occurs?

gkasprow commented 6 years ago

This happens when the DAC exceeds its temperature limit, even for a short moment. It gets damaged: it still works but consumes much more current, so it reaches this thermal trip quickly. Let's add thermal protection in the next revision of Zotino, which will switch off the power when the temperature reaches e.g. 55 degrees and indicate it with an LED.

hartytp commented 6 years ago

@gkasprow did you test that by heating the dac with a hot air gun?

I just don't get this:

So I think there must be some other cause for all of this. What do you think?

sbourdeauducq commented 6 years ago

Let's add thermal protection in the next revision of Zotino, which will switch off the power when the temperature reaches e.g. 55 degrees and indicate it with an LED.

If that's really the problem, this won't be a solution: the thermal protection will just be activated all of the time in the field. Again, the bug is present on my board outside the chassis, and it's not the hot HK weather, I had AC in the lab.

hartytp commented 6 years ago

If that's really the problem

As I said, I'm pretty sure it's not the problem, or the eval boards wouldn't work. With the thermal island, I could just about believe it, but without the island this doesn't seem plausible IMHO.

@gkasprow how about we settle it like this:

How does that sound?

hartytp commented 6 years ago

Note to self, let's also check that this isn't something dumb like we've used some DAC variant that's only specified over a very narrow temperature range.

gkasprow commented 6 years ago

I also have AC in my lab and the failure happened today. The SMPS delivers max +/- 15V so even if the LDO gets unstable, it is still within safe operation area. I have the DAC devkit so will give it a try.

gkasprow commented 6 years ago

I attached the heat sink directly to the DAC after removing the Peltier, the current consumption is stable and the DAC is cold. The board is on the bench and I have the air flow generated by 12V fan supplied from 10V.

gkasprow commented 6 years ago

I removed the fan and the current consumption started growing. When it had grown by 20mA, I turned the fan back on and it fell back to the original value.

jordens commented 6 years ago

It might generally be wise to enable thermal shutdown in the AD5372 by default in ARTIQ.
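
As a rough illustration of what that would involve at the register level, here is a minimal pure-Python sketch that builds the 24-bit SPI word for an AD5372 control-register write with the thermal shutdown bit set. The bit layout used here (special-function code 1 selecting the control register, bit 1 as thermal shutdown enable, bit 0 as soft power-down, bit 2 as the X1A/X1B select) is an assumption based on a reading of the datasheet and should be verified against it. The helper name `control_word` is hypothetical and is not part of ARTIQ.

```python
# Assumed AD5372 serial word layout (verify against the datasheet):
#   bits 23..22  mode bits, 0b00 selects a special function
#   bits 21..16  special-function code, 0b000001 = write control register
#   bits 15..0   data field; the control register uses only the low bits
SPECIAL_CONTROL = 1 << 16

CTRL_SOFT_POWER_DOWN = 1 << 0   # assumed: 1 = software power-down
CTRL_THERMAL_SHUTDOWN = 1 << 1  # assumed: 1 = enable thermal shutdown
CTRL_X1B_SELECT = 1 << 2        # assumed: 1 = use the X1B registers


def control_word(thermal_shutdown=True, power_down=False, use_x1b=False):
    """Build the 24-bit SPI word for a control-register write (hypothetical helper)."""
    data = 0
    if power_down:
        data |= CTRL_SOFT_POWER_DOWN
    if thermal_shutdown:
        data |= CTRL_THERMAL_SHUTDOWN
    if use_x1b:
        data |= CTRL_X1B_SELECT
    return SPECIAL_CONTROL | data


# Word that powers the DAC up with thermal shutdown enabled.
print(hex(control_word()))
```

Whether this protects the part in practice is a separate question; the sketch only shows how the enable bit would be set.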

hartytp commented 6 years ago

I removed the fan and the current consumption started growing. When it had grown by 20mA, I turned the fan back on and it fell back to the original value.

Do you know from which rail the current was drawn?

hartytp commented 6 years ago

Were your observations consistent with the data sheet?

[image: untitled]

hartytp commented 6 years ago

It might generally be wise to enable thermal shutdown in the AD5372 by default in ARTIQ.

We did consider that in the past (and it might not be a bad idea to implement it), but the thermal shutdown only kicks in at 130C die temperature. IIRC, @gkasprow felt that by that point the DAC had been destroyed. Also, it's not great protection, as it doesn't help if the user powers the board on for a while before programming. Better to ensure that Zotino doesn't spontaneously die under normal use.

gkasprow commented 6 years ago

I took the ADI devkit, powered it on and heated it to 97 degrees. I did not observe any change in current consumption. The current on both the -12V and +12V rails stayed below 10mA (the bench PSU current display resolution).

hartytp commented 6 years ago

Good!

So, either the DAC on Zotino has been damaged by something, which is leading to this behaviour, or we have some connection on Zotino that's different to the eval board. The question is which one? Maybe try disconnecting all the OpAmps from the DAC outputs?

gkasprow commented 6 years ago

My DAC on Zotino is already broken. What I can do is to measure the output current on the RC filter resistors. I will also check how the +/- 12V rails wake up.

jordens commented 6 years ago

I am pretty sure that the thermal shutdown is designed to prevent and identify (since it doesn't clear itself) destructive temperature excursions. And I'd bet that if the damage is thermal, the shutdown would trip if it were enabled. But yes, I'd also bet that the problem is not bad cooling.

gkasprow commented 6 years ago

The +/-12V rails come up monotonically. The +12V rail comes up faster than the -12V rail.

hartytp commented 6 years ago

I am pretty sure that the thermal shutdown is designed to prevent and identify (since it doesn't clear itself) destructive temperature excursions. And I'd bet that if the damage is thermal, the shutdown would trip. But yes, I'd also bet that the problem is not bad cooling.

Exactly.

gkasprow commented 6 years ago

I let the DAC trip. Then I switched on the fan and the current consumption dropped back to its initial value... So it does not seem to be a latch-up effect.

gkasprow commented 6 years ago

There is one difference between the eval kit and Zotino. On the eval kit, pin 13 is grounded. In the datasheet it is marked as NC.

gkasprow commented 6 years ago

I connected it to GND but it did not help

hartytp commented 6 years ago

@gkasprow my guess would be that if you replaced the DAC with a fresh one this behaviour would disappear for a while until that DAC died too. If that's the case, the real question is "what's killing the DACs"?

hartytp commented 6 years ago

The SMPS delivers max +/- 15V so even if the LDO gets unstable, it is still within safe operation area.

Well, we could still be supplying too much voltage to the reference or digital supplies.

gkasprow commented 6 years ago

the DAC accepts up to 5V on digital supply. We have 3.3V supply which also supplies the LVDS drivers. And these drivers would be killed first.

gkasprow commented 6 years ago

I'm just curious if this un-connected NC pin could kill the DAC. The reference can survive up to 40V on its input. The DAC draws excessive current from its negative rail.

gkasprow commented 6 years ago

roughly 60mA from -12V and below 10mA from +12V
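
For a sense of scale, the figures reported in this thread can be turned into rough power numbers. This is only back-of-the-envelope arithmetic, assuming the 15V LDO input, the 12V rail and the 60mA negative-rail draw quoted in the comments, and ignoring everything else the LDO supplies (such as the output buffers). The function names are made up for illustration.

```python
def ldo_dissipation_w(v_in, v_out, i_load):
    """Power burned in a linear regulator: voltage drop times load current."""
    return (abs(v_in) - abs(v_out)) * i_load


def rail_power_w(v_rail, i_load):
    """Power delivered to a load from one supply rail."""
    return abs(v_rail) * i_load


I_NEG = 0.060  # A, approximate draw from -12V reported for the faulty DAC

ldo_w = ldo_dissipation_w(-15.0, -12.0, I_NEG)  # share dropped in the -12V LDO
dac_w = rail_power_w(-12.0, I_NEG)              # dissipated from the negative rail

print(f"LDO share: {ldo_w:.2f} W, DAC negative rail: {dac_w:.2f} W")
```

With these inputs, the DAC's draw accounts for about 0.18 W in the LDO and about 0.72 W from the negative rail alone.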

hartytp commented 6 years ago

Sure, my point is just that characterizing the dead DAC carefully may not tell you anything about what killed it.

gkasprow commented 6 years ago

I have one more Zotino that works correctly. I shorted pin 13 to GND and that's the only thing I can do. I can heat it to 90 degrees and see how it behaves.

hartytp commented 6 years ago

I can think of a few things:

gkasprow commented 6 years ago

I posted it here https://ez.analog.com/message/350275-failing-ad5372-dacs