sinara-hw / Kasli

Kasli is a powerful FPGA carrier, capable of controlling 12 Eurocard extension modules.

I2C SDA 0.8V when pulled low by Si5324 #78

Closed hartytp closed 3 years ago

hartytp commented 4 years ago

https://chat.m-labs.hk/m-labs/pl/9anbfbwbw3yr8j7xctdsx6isxr

hartytp commented 4 years ago

@gkasprow @marmeladapk any ideas?

hartytp commented 4 years ago

(This currently seems to stop some boards booting with ARTIQ).

hartytp commented 4 years ago

The Si5324 datasheet specifies a maximum output low level of 0.4 V at 3V3 (3 mA sink).

image

hartytp commented 4 years ago

@gkasprow any thoughts about this? I can send you a board showing these symptoms if that will help you debug?

marmeladapk commented 4 years ago

@hartytp I'll take a look at it next week.

hartytp commented 4 years ago

Thanks @marmeladapk !

I have one more Kasli v2.0 at the moment. It also displays this issue. Do you need me to post it to you, or do you have other v2.0 with this problem?

It's possible that there has been some hardware damage to the two boards I have, but it seems unlikely since they both displayed this straight out of the box and I've never seen it before...

marmeladapk commented 4 years ago

I'll first check our Kaslis, I suspect that they may have a similar problem, just not as bad.

dnadlinger commented 4 years ago

I have one more Kasli v2.0 at the moment.

sb0 now has two of our boards, so I'm not sure how many we have left. (Thinking about it, perhaps you meant to ship only one – apologies.)

gkasprow commented 4 years ago

@hartytp It looks like the pull-up current is 6 mA or more. If you enable all I2C ports, the pull-up currents add up. With 16x10k we get 3.3 V / 625 Ω = 5.3 mA. We also have 2k2, which adds another 1.5 mA. So it looks like all I2C ports are enabled, which does not make much sense because the I2C addresses overlap.
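The pull-up arithmetic here can be checked with a short sketch. It assumes a 3.3 V rail and treats the low level as 0 V, so it gives an upper bound on the current that the device pulling SDA low has to sink. The function name is made up for illustration.

```python
def pullup_current_ma(resistors_ohm, v_supply=3.3):
    """Total current (mA) through parallel pull-ups when the line is held at 0 V."""
    return sum(v_supply / r for r in resistors_ohm) * 1e3

# 16 switch outputs with a 10k pull-up each, all enabled at once
all_ports = pullup_current_ma([10e3] * 16)
# the additional 2k2 pull-up
extra = pullup_current_ma([2.2e3])

print(f"16 x 10k: {all_ports:.2f} mA")
print(f"2k2:      {extra:.2f} mA")
print(f"total:    {all_ports + extra:.2f} mA")
```

With every port enabled the total comes out well above the 3 mA at which the Si5324 low level is specified.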

DonaldKellett commented 4 years ago

Following https://github.com/m-labs/artiq/pull/1480 , the voltage at the SDA pin in the shared I2C bus remains at 0.85V, which causes an "Si5324 failed to ack register" error following a command to write to address 0x68. Please find attached a sigrok trace trace.txt related to this issue. To reproduce the trace, connect wires D0, D1 and GND of your logic analyzer to SDA, SCL (shared I2C bus) and GND respectively on the Kasli 2.0 board.

gkasprow commented 4 years ago

@DonaldKellett are you sure that multiple ports are NOT enabled on the I2C switches?

hartytp commented 4 years ago

@gkasprow how would that cause this? The only pull-up connected to shared SDA is R189 isn't it?

hartytp commented 4 years ago

Never mind, turns out I didn't remotely understand how the I2C switches work...

image

I take your point: if you enable lots of channels at once then you get a much stronger pull-up, which would completely explain why this happens (and why it didn't happen with the CTI tests)...

gkasprow commented 4 years ago

@hartytp there are two types of I2C switches. The ones that let you choose 1 of n outputs (I2C multiplexer). You simply enter the output number to the register and that's done. The ones we use are in fact I2C switches, not muxes.

gkasprow commented 4 years ago

To avoid such issues in the future, we can increase the pull-ups at the switch outputs, say 10-fold.

sbourdeauducq commented 4 years ago

@gkasprow As you can see from the I2C logic analyzer trace (open it with sigrok), just before the Si5324 failure, the firmware writes the 0x00 0x08 control bytes to the two respective switches (note that the LA registers the 0.8V as 0). This is supposed to select only one channel - correct? In https://github.com/m-labs/artiq/pull/1480 we tried repeating the control byte, but no effect.
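For reading the control bytes in the trace, a minimal decoding sketch may help. It assumes the one-bit-per-channel control register of a TCA9548A-style switch, where bit n enables channel n; the helper name is hypothetical.

```python
def enabled_channels(control_byte):
    """Return the channel numbers whose bit is set in a switch control byte."""
    return [ch for ch in range(8) if control_byte & (1 << ch)]

# 0x00 and 0x08 are the bytes seen in the trace; 0xFF is shown for contrast
for byte in (0x00, 0x08, 0xFF):
    print(f"0x{byte:02X} -> channels {enabled_channels(byte)}")
```

Under that assumption 0x00 disables every channel on one switch and 0x08 enables a single channel on the other, so these writes alone would not leave several pull-ups in parallel.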

hartytp commented 4 years ago

@gkasprow if your hypothesis is that the switch is selecting multiple channels at once, can we quickly test that by just shorting a few of the other outputs to ground and seeing if that affects the output? If it doesn’t then that can’t be the issue...

gkasprow commented 4 years ago

The question is what state the registers are in before the firmware writes the 0x00 and 0x08...

DonaldKellett commented 4 years ago

I've just looked into the I2C_SW test points on my Kasli 2.0 board and the voltage readings of the pins are as follows:

Please find attached the sigrok trace and annotations for these test points. As far as I can tell, this trace is more or less the same as the previous one, but I managed to get some interesting annotations from PulseView by selecting "24xx EEPROM" as the stack decoder on top of the I2C decoder.

To reproduce the trace, connect D0, D1 and GND of the logic analyzer to SDA, SCL (in I2C_SW) and GND of the Kasli 2.0 board respectively.

gkasprow commented 4 years ago

@DonaldKellett this still does not answer the question I asked. Please check whether the other ports are enabled! Just probe the switch outputs while trying to talk to the Silabs chip.

DonaldKellett commented 4 years ago

I looked into this again and, from my understanding of the above discussion, I would need to probe the outputs of the SCx/SDx pins on my board to determine whether more than one of them is active at any given time. A quick look at the Kasli schematics indicates that these pins are located on IC14, which is a tiny IC at the back of my board, so it is difficult to solder wires directly to it. I then tried looking at the PCB layout on page 20 of the Kasli schematics for appropriate soldering points away from the IC, but the PDF does not display the PCB layout very well and I couldn't open the PCB_Kasli.PCBDOC file online in the Altium 365 viewer.

@gkasprow do you happen to know what is the best place to solder on another port on this pcb? Update: I think I figured it out.

marmeladapk commented 4 years ago

@DonaldKellett You could probe I2C signals on EEM connectors.

marmeladapk commented 4 years ago

I probed Kasli v2.0 and compared it with v1.1. Measurements:

v2.0, measured on shared bus (so closest to Si5324): tek00020

Low level from Si5324 during reads is around 830 mV (I measured it with scope).


v2.0, measured on 3V3_SW bus (between switches and voltage translator): tek00022

Low level from Si5324 during reads is around 1.1 V.


v2.0, measured on FPGA I2C bus (between FPGA and voltage translator): tek00023

Low level from Si5324 during reads is around 1.1 V.


v1.1, measured on Si5324 bus: tek00025

Low level from Si5324 during reads is around 0 V.


v1.1, measured on 3V3_SW bus (between switches and voltage translator): tek00026

Low level from Si5324 during reads is around 530 mV.


v1.1, measured on FPGA I2C bus (between FPGA and voltage translator): tek00027

Low level from Si5324 during reads is around 0 V.

| Place | Si5324 bus | Between switches and v. translator | FPGA bus |
| --- | --- | --- | --- |
| v2.0 | 830 mV | 1.1 V | 1.1 V |
| v1.1 | 0 V | 530 mV | 0 V |
| v1.0 | 120 mV | 140 mV | 0 V |

The Artix datasheet specifies the maximum input low-level voltage for 2V5 CMOS as 0.7 V, so it's only a coincidence that it works on my boards.
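As a rough sanity check, the FPGA-side low levels from the table above can be compared with that 0.7 V limit. This is only a sketch: the dictionary repeats the measured values, and the variable names are illustrative.

```python
V_IL_MAX = 0.7  # V, input low-level limit for 2V5 CMOS quoted above

# Low levels measured on the FPGA side of the translator, per board revision
fpga_bus_low = {"v2.0": 1.1, "v1.1": 0.0, "v1.0": 0.0}

for rev, level in fpga_bus_low.items():
    margin = V_IL_MAX - level
    verdict = "ok" if margin >= 0 else "above the limit"
    print(f"{rev}: low = {level:.2f} V, margin = {margin:+.2f} V ({verdict})")
```

Only v2.0 ends up with a negative margin, which fits the observation that some of those boards fail while others happen to work.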

Four things changed between v1.1 and v2.0:

  1. voltage translator was changed from bus repeater with voltage translation (TCA9517) to voltage translator (PCA9306)
  2. pullup resistors on FPGA bus and Si5324 bus were changed from 2k2 to 10k
  3. Si5324 now shares bus with 2 GPIO extenders and SFP3
  4. Si5324 doesn't have TCA9517 bus repeater on its bus now

See #46 for rationale of those changes.

The initial 830 mV level surprised me. I know that the switches are set to enable only this bus. So the Si5324 has to drive 10k || 2k2 || 10k || 10k to ground, which is around 1k3. However, in v1.0 it had to drive 2k2 || 2k2 to ground, which is a slightly stronger pull-up. Either way, a 3 mA sink should be enough.
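The parallel-resistance figures can be reproduced with a few lines. The sketch assumes a 3.3 V rail and a 0 V low level, so the currents are upper bounds; `parallel` is a helper written for this example.

```python
def parallel(*resistors_ohm):
    """Equivalent resistance of resistors connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistors_ohm)

V_SUPPLY = 3.3  # V, assumed pull-up rail

r_v2 = parallel(10e3, 2.2e3, 10e3, 10e3)  # v2.0: 10k || 2k2 || 10k || 10k
r_v1 = parallel(2.2e3, 2.2e3)             # v1.0: 2k2 || 2k2

for name, r in (("v2.0", r_v2), ("v1.0", r_v1)):
    print(f"{name}: {r:.0f} ohm, {V_SUPPLY / r * 1e3:.2f} mA")
```

Both cases stay at or below about 3 mA, so the pull-ups on their own do not account for an 830 mV low level.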

In v1.1 Si5324 was "shielded" from rest of the I2C by a repeater. To check if 4. changes anything I'll measure v1.0 later this week, but AFAIR there were no issues with Si5324 there.

Sharing bus with other devices probably doesn't matter. After measuring v1.0 I'll check if changing resistors between switches and v. translator from 2k2 to 10k will help.

jordens commented 4 years ago

Argh.

hartytp commented 4 years ago

@marmeladapk One thing I don't understand from your measurements is that there appears to be a 270 mV drop across the TCA9548ARGER (830 mV low level on SHARED_SDA and 1V1 on I2C_3V3_SW_SDA). The max resistance is specified as 30 Ω (see below). Unless I'm missing something, that would suggest there is 9 mA flowing through it, which in turn would suggest that the pull-up is really only a couple of hundred ohms. Or am I misunderstanding things?

image
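A small sketch of that estimate, under the assumptions that the switch is at its 30 Ω worst case, that all of the current comes from a pull-up to 3.3 V on the I2C_3V3_SW side, and that the two measured low levels are exact:

```python
v_shared = 0.83   # V, low level on SHARED_SDA
v_3v3_sw = 1.1    # V, low level on I2C_3V3_SW_SDA
r_on_max = 30.0   # ohm, worst-case switch resistance
v_rail = 3.3      # V, assumed pull-up rail

v_drop = v_3v3_sw - v_shared
i_min = v_drop / r_on_max                  # a lower on-resistance means more current
r_pullup_max = (v_rail - v_3v3_sw) / i_min  # effective pull-up implied by that current

print(f"drop across switch: {v_drop * 1e3:.0f} mV")
print(f"current (lower bound): {i_min * 1e3:.1f} mA")
print(f"implied pull-up (upper bound): {r_pullup_max:.0f} ohm")
```

Because 30 Ω is the maximum on-resistance, the real current could only be higher and the effective pull-up only lower than these numbers.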

jordens commented 4 years ago

And the other strange thing is that the FPGA drives below 0.1 V on v1.1 but only to about 0.3 V on v2.0.

marmeladapk commented 4 years ago

v1.0, Si5324 bus: image

v1.0, between switches and repeater: image

v1.0, FPGA bus: image

So it seems that lack of repeater (4.) directly before Si5324 is not a problem. Perhaps having two PCA9306 translating voltage to the same bus is problematic? (USB)

I'll continue on Friday.

gkasprow commented 4 years ago

@marmeladapk did you check if the switches have multiple outputs enabled?

hartytp commented 4 years ago

@marmeladapk did you check if the switches have multiple outputs enabled?

@gkasprow I don't think that can be the issue here (although, I agree it's worth sanity checking).

Among other things, my calculation above suggests that 9mA must be flowing through the I2C switch (and, that's based on worst-case assumptions about switch resistance). With 10k pull-ups, that would require 27 channels, which is more than we have....
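The channel count follows from dividing the estimated switch current by the current through a single 10k pull-up. The sketch assumes a 3.3 V rail and a 0 V low level, and takes 16 switch outputs from the "16x10k" figure earlier in the thread.

```python
i_switch = 9e-3        # A, estimated current through the switch
v_rail = 3.3           # V, assumed pull-up rail
r_pullup = 10e3        # ohm, pull-up on each switch output
outputs_available = 16  # from the "16x10k" figure earlier in the thread

i_per_pullup = v_rail / r_pullup
pullups_needed = i_switch / i_per_pullup

print(f"current per 10k pull-up: {i_per_pullup * 1e3:.2f} mA")
print(f"pull-ups needed for 9 mA: {pullups_needed:.1f}")
print(f"more than available: {pullups_needed > outputs_available}")
```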

gkasprow commented 4 years ago

This could be assembly error - somebody assembled resistors that are 10 or 100x lower value.

hartytp commented 4 years ago

Also worth checking. Otherwise, I agree with @marmeladapk that the PCA9306 is the obvious next thing to investigate.

gkasprow commented 4 years ago

PCA9306 adds pure series resistance to the chain. It's FET based level translator, not re-driver.

hartytp commented 4 years ago

To satisfy my curiosity, do you have a rough equivalent circuit for how it works?

gkasprow commented 4 years ago

It's just a FET with its gate connected to EN pin and substrate connected to GND. The resistance depends on the supply voltage. So, it does not regenerate the signal. It only passes the logic low level and clips the logic high level to the lower supply potential.
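A very idealised model of that behaviour, only to illustrate why a poor low level is not cleaned up by the translator. It ignores the threshold voltage and how the on-resistance varies with supply; the function name is made up, the 2.5 V low-side supply is taken from the 2V5 CMOS mention above, and the current and resistance in the last call are hypothetical.

```python
def pass_fet_out(v_in, v_low_supply, i_sink=0.0, r_on=0.0):
    """Level seen on the low-voltage side of an idealised pass-FET translator.

    Low levels pass straight through, plus the I*R drop across the channel.
    High levels are clipped to the lower supply.
    """
    return min(v_in + i_sink * r_on, v_low_supply)

print(pass_fet_out(3.3, 2.5))               # high level is clipped to 2.5 V
print(pass_fet_out(0.83, 2.5))              # a 0.83 V low passes through unchanged
print(pass_fet_out(0.83, 2.5, 1e-3, 5.0))   # series drop only makes the low level worse
```

A repeater would instead regenerate the low level on its output, which is the difference between the two parts being discussed.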

sbourdeauducq commented 4 years ago

Not sure if this is related to the original problem, but @hartytp's board that @DonaldKellett was using has developed what looks like a hardware failure with I2C_SW SCL and SDA now permanently stuck low.

hartytp commented 4 years ago

Is that just on the FPGA side (i.e. failure of IC13), or is I2C_3V3_SW stuck low as well?

jordens commented 4 years ago

Elaborating on https://github.com/sinara-hw/Kasli/issues/78#issuecomment-655572273, the > 300 mV when the FPGA is driving low (see "v2.0, measured on FPGA I2C bus (between FPGA and voltage translator):") means that it's sinking more than 18 mA (12 mA slow drive from IBIS).

hartytp commented 4 years ago

It's somewhat hard to see where all that current could be coming from. Broken level translator? FTDI chip?

marmeladapk commented 4 years ago

OK, I found the culprit. As I thought, the USB I2C bus is the problem. The FTDI seems to drive all its pins actively high after reset. This means that SDA, SCL and the enable pin of the voltage translator (which has a pull-down) are driven high. Since the bus is driven actively, the FTDI sources current when other devices try to pull the bus low. It wasn't a problem before because we previously used an active repeater, which didn't have a direct connection to the shared bus.

Shared bus has 0.7 V when USB is plugged in and power is disabled.

The 4 channels of the FT4232H reset to 4 asynchronous serial UART interfaces.

Solutions for now:

Long-term solutions:

hartytp commented 4 years ago

Insert inverter on enable pin, so that by default IC22 is disabled (preferred by me)

I don't have the layout in front of me, but is that something we can easily hack into the existing boards with a scalpel and some glue?

jordens commented 4 years ago

We're already programming the FTDI EEPROM. I bet there is an option to solve this properly. https://github.com/quartiq/kasli-i2c/blob/master/kasli-ft4232h.conf.in Try adding suspend_pull_downs=true.

jordens commented 4 years ago

I don't have the layout in front of me, but is that something we can easily hack into the existing boards with a scalpel and some glue?

Why not just look into programming the EEPROM properly?

hartytp commented 4 years ago

Why not just look into programming the EEPROM properly?

If that works reliably then it's fine by me.

marmeladapk commented 4 years ago

@jordens It may not work if you're connected to the terminal (so the FTDI is not in USB suspend mode). But I'll check.

sbourdeauducq commented 4 years ago

I still think the enable pin should be inverted in v2.1 to make the hardware friendlier.

jordens commented 4 years ago

@jordens It may not work if you're connected to the terminal (so the FTDI is not in USB suspend mode).

I meant look for the actual option that sets the state of the FTDI interfaces that are not being used. Note that the four ports are four different and independent USB interfaces. Connecting to one doesn't mean much for the others.

hartytp commented 4 years ago

@jordens I hadn't seen https://github.com/quartiq/kasli-i2c/blob/master/kasli-ft4232h.conf.in before (I've been using ftprog). That's a nice util.

jordens commented 4 years ago

If that works reliably then it's fine by me.

I can't see how properly setting up the EEPROM could be nearly as unreliable as "a scalpel and some glue".

jordens commented 4 years ago

I still think the enable pin should be inverted in v2.1 to make the hardware friendlier.

Because of the implied incompatibility this is a bad and shortsighted idea. Why not just configure it properly?

sbourdeauducq commented 4 years ago

Removing IC22 is easier than programming FTDI chips, especially since FTDI chips are prone to all sorts of bugs (example: https://twitter.com/marcan42/status/695292366639378433).