sinara-hw / Kasli

Kasli is a powerful FPGA carrier, capable of controlling 12 Eurocard extension modules.

hot plug warning #37

Closed: hartytp closed this issue 4 years ago

hartytp commented 5 years ago

Write "DO NOT HOTPLUG" next to the EEM connectors on the schematic and PCB.

jordens commented 5 years ago

Is this really the right thing to do? Hotplugging them is a bad idea. But so are many other things that grad students tend to do and have done in the past, e.g. ESD, wrong power supply voltage, putting the board on a conductive surface, pulling/inserting connectors without paying attention to minimizing board bending, obstructing airflow. IMO proper handling needs to be described in a manual, not encoded as a list of don'ts printed on the schematic and the PCB. Anyway: from reading the schematic it is brutally clear that you can't hot plug the EEM connectors.

marmeladapk commented 5 years ago

It won't hurt and, as I've seen, it's one of the most common mistakes when leaving someone new alone with Kasli for a minute.

jordens commented 5 years ago

I think it does in fact hurt because it gives a false sense of safety and an excuse. It's incomplete. But if you insist: please don't overdo it with font size, all-caps and exclamation marks. If the text doesn't get the message across, all-caps won't help. Keep it the same size as the "EEM0" markings. From my observation, handling devices without ESD protection while wearing wool/plastic sweaters or rubber shoes, or while dragging plastic chairs, is significantly more common. And since ESD damage is typically not immediately visible or associated with improper handling, it is costlier in the long run and much harder to learn from.

dhslichter commented 5 years ago

I think it doesn't hurt to add it on the schematic and PCB. However I agree with @jordens that what is really needed is a manual detailing these sorts of things. I will say that the schematic may be brutally clear to those who know how to read it (or who even bother to), but the most likely case is a new grad student who doesn't fully understand the electronics or ramifications and probably didn't even look at the schematic. A manual with a few key bullet points on the front page would go a long way towards addressing this (large, never-going-to-go-away) group of users.

gkasprow commented 5 years ago

True, it was the first thing one of my students did with Kasli :D

gkasprow commented 5 years ago

That's funny. I commented 3 mins ago and that comment was placed before your comments which were placed 5h ago...

dhslichter commented 5 years ago

@gkasprow Somebody's clock is off, we are getting comments from the future!


jbqubit commented 5 years ago

Within the Sinara platform several interfaces are hot pluggable including Ethernet, SFP, uTCA AMC, uTCA RTM, USB. So it's not unreasonable for users to assume that EEM is too.

Agreed that important precautions should be written down. Users ought not have to refer to the PCB schematics to be advised on best practices.

An interim solution is the wiki which I've just updated.

jordens commented 5 years ago

Flawed reasoning.

There is no reason to extrapolate from the connectors you mention. Why would there be? The EEM connector is actually an internal connector that is not (easily) accessible when in a crate. It is significantly different.

If you survive poking those connectors with a screwdriver, naively extrapolating to a wall outlet can be deadly. If you have crossed a road 100 times without being hit by a car, that doesn't mean there are no cars on roads. If you have survived being stung/bitten/pricked by a mosquito, a bee, a wasp, and a rose, that doesn't allow extrapolation to other animals or plants. Extrapolation is unsound without detailed reasoning and always risky.

dhslichter commented 5 years ago

@jordens I think the point is that many first-time users may be idiots with flawed reasoning :) Basically, we want the documentation to warn users of any issues where the product of (danger to board if error is made) x (likelihood a new user might try this if not warned) is "large", in whatever sense you like.

jordens commented 5 years ago

If many first time users are idiots with flawed reasoning and if this is really considered a likely behavior that needs to be addressed this way, then I don't understand why nobody wanted to put similar warnings on the mezzanine connectors on Sayma AMC, RTM and Metlino (and probably the VHDCI connectors as well since shield connection is not reliable during mating), on the TEC connector on Zotino, or on the connectors on Stabilizer and Humpback. Now you have a system that warns in one place but omits warnings in others. That's IMO worse than before as it stimulates the risky and flawed reasoning it's supposed to prevent.

dhslichter commented 5 years ago

The solution to this is to put warnings in all necessary places, not to put them nowhere in the name of consistency. I agree that it would be good to be consistent with all of this if we can. "Never hot plug Sinara peripherals to Sinara boards" might be a reasonable blanket statement to make? Better to be overkill on the safe side than the other way around I think.

jordens commented 5 years ago

This is getting a bit petty, but I am worried by three things which apply to other areas as well.

hartytp commented 5 years ago

@jordens I don't feel strongly about this issue, but a couple of comments:

jordens commented 5 years ago

@hartytp Please note that the idiot label does not come from me. It's not my view and contradicts my assumptions. I would rather assume the opposite and give users the information they need to understand the problem than add confusion with incomplete and inconsistent warnings.

Why does it matter whether connector mating is a common operation? If the risk is the same, the same rules apply. The argument that "it doesn't hurt" is a pretty low bar and doesn't carry much weight. The DIO issue seems fundamentally different: it's handled consistently and arose from a design flaw that was not meant to be there. Neither applies here. We don't know whether it is even worth the board space.

In practice (at least for myself) it would not have made any difference. I have hot-plugged them as well. Observing myself and going into the details of the thought process while handling: hot-plugging happens when you are convinced you know what you are doing because you have done it several times before, but you forget to check whether the prerequisites are met (that the power supply is off). In that situation you would never have processed the warning on Kasli, since you have already seen it several times. You already know about the risks and you are fully aware that electronics is sensitive. After it happens you immediately know what went wrong. Trying to get into that thought process at that instant and break it with a warning on the board is utterly pointless from an ergonomics perspective.

Was the student you are referring to able to correctly power down the boards before? Then maybe the cause was a similar lapse and not "lack of warnings"? By "most common" you are referring to a single incident (plus the one time it happened in Greg's lab, AFAIK). We've had ESD damage more than twice before and don't bother putting an ESD warning pictogram on the PCBs. This warning seems like a non sequitur and disproportionate to me.

dtcallcock commented 5 years ago

Perhaps we can ship every board with a pack of stickers so people can customise according to taste/philosophy. As well as boring safety warnings, some of them could have bad puns or cool graphics on them.

Talking of which, we only have stickers with the old 'ice skaters' ARTIQ logo on. Can we get a sticker upgrade @sbourdeauducq ?

dhslichter commented 5 years ago

My use of the phrase "idiots with flawed reasoning" was followed by a smiley face in an attempt to indicate a certain level of satirical excess -- I don't think Kasli users are actual idiots, but was trying to point out that most will not have thought deeply about these sorts of issues prior to trying to get a system up and running for the first time.

I think that a blanket statement of "Never hot plug peripherals to Sinara boards", placed next to each such connector on all boards (stickers are fine if we already have fabbed boards, and can be applied or not depending on the taste of the user), would be nice. Of course one is always going to have the failure mode of forgetting to turn off the power supply before connecting, and we can't protect against that. But we can use writing on the boards themselves (whenever possible) just to provide a constant reminder for this failure mode, which seems to occur relatively often, and is an expensive error to make. It won't prevent all issues but might help prevent some.

@dtcallcock the ARTIQ logo is open source, make your own stickers ;)

jbqubit commented 5 years ago

Physics students in particular don't typically have an electronics background. They're not typically thinking critically about what's on the PCBs when putting a system together for the first time. If we were designing a system with high costs or serious consequences of failure (e.g. an airplane or a medical device), this level of analysis about the consequences of imperfect warning-sign coverage might be appropriate. However, Sinara is a system for pragmatic people who are trying to string together complex lab setups, often in teaching environments. I like @hartytp's perspective here:

"A consistent policy here is that we can't warn against everything silly people can do, but picking on the ones that are most common and most destructive in the typical lab usage scenarios we've encountered so far doesn't seem daft."

"People Aren’t Dumb. The World Is Hard." -Richard Thaler

http://freakonomics.com/podcast/richard-thaler/ https://science.sciencemag.org/content/339/6124/1152

gkasprow commented 5 years ago

What about making the next revision of EEMs hot-plug survivable? This requires adding a MOSFET power switch on the 12 V rail on either Kasli or each EEM extension. The main power would be applied a few hundred ms after the EEM connector is plugged in. We can use one of the GND lines as a sensing line. With AC decoupling this would not break the signal integrity (SI) of the LVDS lines. From a cost perspective, we need to add 12 MOSFETs at $0.20 each, plus some discrete RC components.
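To make the proposed behaviour concrete, here is a minimal Python model of one port's delayed 12 V switch. It is a sketch under stated assumptions, not the circuit: the `DelayedPowerSwitch` name, the 300 ms delay and the idea of sampling the GND sense line in fixed time steps are all hypothetical. Only the "apply power a few hundred ms after the GND sense line mates" behaviour comes from the proposal.

```python
from dataclasses import dataclass, field


@dataclass
class DelayedPowerSwitch:
    """Hypothetical model of the proposed per-EEM 12 V MOSFET switch.

    The switch closes only after the GND sense line has been mated
    continuously for `delay_ms`; losing the sense contact opens it at once.
    """

    delay_ms: int = 300  # assumed value for "a few hundred ms"
    enabled: bool = field(default=False, init=False)
    _mated_for_ms: int = field(default=0, init=False)

    def step(self, sense_mated: bool, dt_ms: int) -> bool:
        """Advance the model by dt_ms and return whether 12 V is applied."""
        if not sense_mated:
            # Sense contact lost (unplugged or bouncing): cut power, restart the delay.
            self._mated_for_ms = 0
            self.enabled = False
        else:
            self._mated_for_ms += dt_ms
            if self._mated_for_ms >= self.delay_ms:
                self.enabled = True
        return self.enabled


if __name__ == "__main__":
    sw = DelayedPowerSwitch()
    # Sense line sampled every 100 ms: contact, contact, bounce, then stable contact.
    trace = [sw.step(mated, 100) for mated in [True, True, False, True, True, True]]
    print(trace)  # [False, False, False, False, False, True]
```

The bounce in the trace restarts the timer, so 12 V only appears after three consecutive good samples. At the quoted $0.20 per MOSFET, twelve switches add $2.40 per Kasli before the RC parts.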

gkasprow commented 5 years ago

Yet another idea is to link such hot-insertion with power-cycling of the FPGA.

dtcallcock commented 5 years ago

I think this is a great idea if it's that simple. Would a 100 ms delay work, though? I don't have a connector to hand, but I feel like if you get it wrong and go in at a slight angle you might make about half the connections and then get stuck.

gkasprow commented 5 years ago

We can detect if the EEM is fully inserted and then deliver the power.
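One way to read "detect if the EEM is fully inserted" is to sense the two pin positions at opposite ends of the connector. The sketch below is a hypothetical geometric model, not the Kasli design: it assumes 15 pin positions along the connector's long axis, an assumed contact depth, and that an angled insertion makes the insertion depth vary linearly along that axis. Under that assumption, if both end pins make contact, every pin in between does too, so two sense pins are enough to reject the half-mated case raised above.

```python
def pin_depths(n_positions: int, center_depth_mm: float, tilt_mm_per_pos: float) -> list[float]:
    """Insertion depth of each pin position for a connector tilted along its long axis."""
    center = (n_positions - 1) / 2
    return [center_depth_mm + tilt_mm_per_pos * (i - center) for i in range(n_positions)]


def contacts(depths: list[float], contact_depth_mm: float) -> list[bool]:
    """A pin mates once it is inserted at least contact_depth_mm (assumed threshold)."""
    return [d >= contact_depth_mm for d in depths]


def fully_inserted(made: list[bool]) -> bool:
    """Hypothetical detector: only the first and last positions are sensed."""
    return made[0] and made[-1]


if __name__ == "__main__":
    # Assumed numbers: 15 positions, 1.55 mm of travel needed for contact.
    tilted = contacts(pin_depths(15, 2.0, 0.1), 1.55)
    print(sum(tilted), fully_inserted(tilted))  # 12 False
    straight = contacts(pin_depths(15, 2.0, 0.0), 1.55)
    print(sum(straight), fully_inserted(straight))  # 15 True
```

The model ignores tilt across the connector's short axis and any mechanical play, so a real detector would need more care than two end pins.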

hartytp commented 5 years ago

I’m wary of trying to make EEMs hot-pluggable. The worst-case scenario is that we bill them as hot-pluggable but there are still corner cases where hot plugging breaks boards. E.g. someone is hot plugging a unit in a rack where the cable is at an awkward angle and they offset the connector by a few pins. Now 3V3 shorts to ground and the unit starts up and breaks. What about EEMs with two connectors, where the 12 V is already live when the second one is plugged in? What about hot plugging when there is still charge left on the capacitors (again, if the cable is accidentally offset by a few pins)?

Hot plugging isn’t that useful and is hard to make really robust; it just encourages bad practice. Much better to just document that EEMs/AFEs/etc. should never be hot plugged.

gkasprow commented 5 years ago

@hartytp this is a protection mechanism only. They won't be hot-pluggable, and we won't even mention such a feature. This is only to add some robustness, especially if we go for the backplane. With 2-EEM boards, when one connector is already inserted, the threat is minimal since GND and the power rails are already there. The main risk is when somebody plugs in 12V first, then LVDS and then GND. This breaks the FPGA.
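The failure order named here can be checked mechanically. The sketch below uses a simplified, hypothetical rule, not a validated model of the EEM interface: it treats the connector as three pin groups (12V, LVDS, GND) and flags any mating order in which 12V and LVDS are both connected while GND is not. Enumerating all orders shows that mating GND first avoids the flagged state.

```python
from itertools import permutations

GROUPS = ("12V", "LVDS", "GND")  # simplified pin groups (assumption)


def is_hazardous(order: tuple[str, ...]) -> bool:
    """Flag an order if, at any step, 12V and LVDS are mated while GND is not."""
    mated: set[str] = set()
    for group in order:
        mated.add(group)
        if {"12V", "LVDS"} <= mated and "GND" not in mated:
            return True
    return False


if __name__ == "__main__":
    for order in permutations(GROUPS):
        print(" -> ".join(order), "HAZARD" if is_hazardous(order) else "ok")
```

Under this rule the order from the comment (12V, LVDS, GND) and its variant (LVDS, 12V, GND) are the only flagged sequences, and every order that starts with GND passes.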

hartytp commented 5 years ago

@gkasprow if the connector is partially inserted but displaced by a few pins, then I believe it’s possible to break the FPGA as well (IIRC one can end up connecting LVDS to 12V or something like that).

Personally I wouldn’t bother adding complexity for a feature that will still leave many damage mechanisms open.

dhslichter commented 5 years ago

Aren't the EEM connectors shrouded to help prevent pin misalignments? I don't like the idea of trying to make EEMs truly hot-pluggable, but putting in some protection for common failure modes (like this MOSFET idea) seems like a reasonable thing to do as long as it doesn't degrade performance or cost a lot.

gkasprow commented 5 years ago

They are shrouded, but they can be partially inserted, leaving some pins without contact.

hartytp commented 4 years ago

I'm not fussed about this and it seems controversial so closing.

marmeladapk commented 4 years ago

It's already in the schematics, and I'll place the text under the EEM connectors.