sinara-hw / Kasli

Kasli is a powerful FPGA carrier, capable of controlling 12 Eurocard extension modules.

FPGA temperature control #44

Closed. jordens closed this issue 4 years ago.

jordens commented 4 years ago

If a bigger/better heat sink cannot keep the die temperature low enough under typical airflow conditions (I don't want to rely on forced airflow for these crates; most people don't seem to bother with it), then we need to go for a small fan and a fan controller. This depends on a realistic power budget and a thermal simulation.
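As a first-order way to frame that power budget, the steady-state die temperature can be estimated by chaining the thermal resistances from die to air: T_j = T_a + P(θ_JC + θ_CS + θ_SA). The sketch below assumes that simple series model; the ambient temperature, dissipation, resistance values and the 85 °C limit are hypothetical placeholders, not Kasli measurements, and a thermal simulation would replace them.

```python
# First-order steady-state thermal estimate (series thermal resistances):
#   T_j = T_a + P * (theta_jc + theta_cs + theta_sa)
# All example numbers are hypothetical placeholders, not measured values.


def junction_temp(t_ambient_c, power_w, theta_jc, theta_cs, theta_sa):
    """Estimated die temperature in degrees C.

    theta_* are thermal resistances in K/W: junction-to-case,
    case-to-sink (interface material) and sink-to-air.
    """
    return t_ambient_c + power_w * (theta_jc + theta_cs + theta_sa)


def max_sink_resistance(t_j_max_c, t_ambient_c, power_w, theta_jc, theta_cs):
    """Largest sink-to-air resistance (K/W) that keeps T_j <= t_j_max_c."""
    return (t_j_max_c - t_ambient_c) / power_w - theta_jc - theta_cs


if __name__ == "__main__":
    # Hypothetical: 45 C inside the crate, 4 W FPGA dissipation,
    # 0.2 K/W junction-to-case, 0.3 K/W interface, 8 K/W passive sink.
    print(junction_temp(45.0, 4.0, 0.2, 0.3, 8.0))
    # Sink needed to stay below an assumed 85 C (commercial-grade) limit.
    print(max_sink_resistance(85.0, 45.0, 4.0, 0.2, 0.3))
```

If the required θ_SA comes out smaller than any passive heat sink that fits the package can deliver in still air, that is the point where the fan becomes necessary.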

E.g. the MAX6639, which is also used in other projects such as DIOT. The DXP/DXN diode connections on the FPGA die are there.

This would probably be required for Kasli-ZYNQ anyway which makes this the perfect place to test it on.

sbourdeauducq commented 4 years ago

Do we need a fan controller or can the fan be simply left on at all times?

jordens commented 4 years ago

The controller would increase the lifetime of the fan, lower power usage (by up to 1 W), allow monitoring over I2C, and provide protection against fan failure and overtemperature. Since the thermal shutdown of the FPGA is AFAICT above its maximum junction temperature (certainly for the commercial temperature range), this seems required.

sbourdeauducq commented 4 years ago

The absolute maximum temperature rating is the same for all industrial/expanded/military grades. So, if we are above the recommended temperature for the commercial grade but below the thermal shutdown threshold, the FPGA may not operate correctly (likely because the silicon slows down and timing paths fail) but should not be permanently damaged.

sbourdeauducq commented 4 years ago

That being said, if the fan controller has good reset behavior (i.e. by default and without having to use SMBus, the fan is on and the FANFAIL pin is working), then it's fine to use it and connect FANFAIL to e.g. the power supply control, for extra safety. I have little trust in most silicon vendors when it comes to this sort of thing (especially since this particular MAX chip comes from the PC market, which has a track record of awful standards such as ACPI, EFI and USB3), so this should probably be tested beforehand.

jordens commented 4 years ago

Just connect FANFAIL. No requirement to juggle SMBus. There is no ACPI, EFI or USB3 involved here.

gkasprow commented 4 years ago

We can place an LM75-like sensor (in SOT23) close to the FPGA. It does not need any initialization and would shut down the power supply at ~80 °C. The MAX6649 should also work that way, but I did not test it. We can choose the 85, 95 or 125 °C version.

sbourdeauducq commented 4 years ago

@gkasprow I think @jordens means the MAX6639, which contains a fan controller.

gkasprow commented 4 years ago

Yes, but I'm not sure it will work without initialization. The MAX6639 needs multi-pole or high-speed fans that generate high-frequency tachometer pulses. It produces an odd "hunting" noise with fans whose tachometer output is slower than 400 Hz. Small fans usually meet this condition.
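Whether a particular fan clears the 400 Hz figure is simple arithmetic: the tach frequency is rpm / 60 times the number of pulses per revolution. The sketch assumes 2 pulses per revolution (common for PC-style fans, but check the specific fan's datasheet); the rpm values are hypothetical and the 400 Hz threshold is taken from the comment above, not re-checked against the MAX6639 datasheet.

```python
# Tach frequency check: f_tach = rpm / 60 * pulses_per_rev.
MIN_TACH_HZ = 400.0  # figure quoted in the thread; confirm against the datasheet


def tach_frequency_hz(rpm, pulses_per_rev=2):
    """Tachometer output frequency in Hz (2 pulses/rev assumed by default)."""
    return rpm / 60.0 * pulses_per_rev


def min_rpm_for(min_hz=MIN_TACH_HZ, pulses_per_rev=2):
    """Slowest fan speed whose tach output still reaches min_hz."""
    return min_hz * 60.0 / pulses_per_rev


if __name__ == "__main__":
    for rpm in (6000, 12000, 15000):  # hypothetical fan speeds
        f = tach_frequency_hz(rpm)
        verdict = "ok" if f >= MIN_TACH_HZ else "too slow"
        print(f"{rpm} rpm -> {f:.0f} Hz ({verdict})")
    print(f"needs >= {min_rpm_for():.0f} rpm at 2 pulses/rev")
```

With 2 pulses per revolution, 400 Hz corresponds to 12 000 rpm, so the answer depends on the specific fan's speed and pulse count.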

sbourdeauducq commented 4 years ago

Should I order one and test it? IMO, poor reset behavior (i.e. the fan is off before configuration) is pretty bad because:

gkasprow commented 4 years ago

@sbourdeauducq It depends on how you connect the fan. You can use a PMOS or an NMOS. One of these configurations will turn the fan on before configuration. In Booster we have another one.

sbourdeauducq commented 4 years ago

Oh, I see. Yes, we can try something like that, though I think this still needs testing. There's also a solution for the noise proposed in the datasheet (Figure 9).

marmeladapk commented 4 years ago

I found this fansink, which fits our package and does not require mounting holes; however, it does not have a tachometer output.

jordens commented 4 years ago

Looks like the KC705 thing. I am fine with this and either a fan controller or an LM75 if you (@marmeladapk and @gkasprow) think it's easier, less noisy, and serves the purpose. 30 khours are just 3.5 years, but there are very few fans that offer significantly more (e.g. 70 kh for some).

hartytp commented 4 years ago

30 khours are just 3.5 years but there are very few fans that offer significantly more (e.g. 70kh for some).

Indeed. The benefit of a fan controller is that it pushes that up by a factor of a few if used in a decent thermal environment...
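The lifetime figures above can be checked with a few lines of arithmetic: 30 kh of continuous running is about 3.4 years (the thread rounds to 3.5). The "factor of a few" from a controller is sketched here under a crude, assumed model in which rated hours are consumed only while the fan spins; it ignores start/stop wear and the effect of running at reduced speed, and the on-time fraction is hypothetical.

```python
# Rough fan-lifetime arithmetic. Crude assumption: rated hours are used up
# only while the fan is spinning (start/stop wear and speed effects ignored).

HOURS_PER_YEAR = 24 * 365  # 8760 h, leap years ignored


def years_continuous(rated_hours):
    """Calendar years until the rated hours are used up at 100% on-time."""
    return rated_hours / HOURS_PER_YEAR


def years_with_on_fraction(rated_hours, on_fraction):
    """Calendar years if the fan only runs for on_fraction of the time."""
    if not 0.0 < on_fraction <= 1.0:
        raise ValueError("on_fraction must be in (0, 1]")
    return years_continuous(rated_hours) / on_fraction


if __name__ == "__main__":
    for rated in (30_000, 70_000):
        print(f"{rated} h continuous: {years_continuous(rated):.1f} years")
    # Hypothetical: a controller keeps the fan off two thirds of the time.
    print(f"30000 h at 1/3 on-time: {years_with_on_fraction(30_000, 1 / 3):.1f} years")
```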

marmeladapk commented 4 years ago

If we don't have a tachometer, then I can just put a connection to the transistor base. This way the FPGA cannot shut the fan down completely in case of an FPGA fault. Radian wrote to me that they couldn't find a fan of this size with a tachometer, so it's unlikely that I'll find anything.

marmeladapk commented 4 years ago

Do we want the MAX6646 to shut down all supply rails, with a red front-panel LED to indicate overtemperature? I'm only worried about LVDS inputs from EEMs and GTP inputs damaging the FPGA while it's disabled (these are the only signals that could reach the FPGA). @gkasprow is this something to worry about? This would also disable 3V3MP on the EEMs while keeping 12V on. I will place another MIC5219-3.3 (the same as used for the FTDI supply) to make P3V3_MP for the overtemperature LED and the MAX6646. It consumes at most 400 uA, so no worries there.

jordens commented 4 years ago

Won't that circuit lead to weird behavior and gain reversal if the minimum duty cycle is 0 and the temperature is low enough?

GTP inputs should be AC coupled and the SFP modules should be turned off with the FPGA, right? AC signals coming in on direct-attach cables would generally be a problem independent of over temp or fan failure shutdown. I'd be interested to hear what EEM LVDS signals would do if the FPGA is unpowered. This is generally tricky for "remote EEMs" that are powered independently.

But if we don't control the fan to gain lifetime, then for over temperature shutdown, we should just use the XADC and lower the threshold accordingly. The MAX664[679] don't seem useful then. Maybe we can just wire the XADC to control the fan threshold as well (binary on/off, no PWM, large hysteresis to not oscillate, i.e. use ALM[0] with tuned OT upper=100C and e.g. Temp lower=70C and Temp upper=90C). That would avoid the failsafe circuitry (because the 100 C or 125 C shutdown is always there), the fan noise (because it's on/off) and the additional hardware. If we find a big enough heat sink maybe the fan won't go on at all.
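The XADC scheme above can be sketched as two pieces: converting the proposed 70/90/100 °C thresholds to XADC codes, and modelling the ALM[0] hysteresis (fan on once the temperature exceeds Temp upper, off again only after it drops below Temp lower). This assumes the 7-series XADC temperature transfer function from Xilinx UG480, T = code × 503.975 / 4096 - 273.15 for the 12-bit code, and that alarm threshold registers hold that code in bits 15..4; verify both against the user guide. `HysteresisFan` is a hypothetical model for reasoning about oscillation, not gateware.

```python
# Assumed 7-series XADC on-chip sensor transfer function (UG480):
#   T[degC] = code * 503.975 / 4096 - 273.15, code = 12-bit ADC result
LSB_C = 503.975 / 4096  # ~0.123 degC per code


def temp_to_code(t_c):
    """Nearest 12-bit XADC code for a temperature in degC."""
    return round((t_c + 273.15) / LSB_C)


def code_to_temp(code):
    return code * LSB_C - 273.15


def threshold_register(t_c):
    """16-bit threshold value, assuming the 12-bit code sits in bits 15..4."""
    return temp_to_code(t_c) << 4


class HysteresisFan:
    """ALM[0]-style on/off: on above upper_c, off only below lower_c."""

    def __init__(self, lower_c=70.0, upper_c=90.0):
        if lower_c >= upper_c:
            raise ValueError("lower threshold must be below upper threshold")
        self.lower_c = lower_c
        self.upper_c = upper_c
        self.on = False  # fan starts off in this model

    def update(self, t_c):
        if not self.on and t_c > self.upper_c:
            self.on = True
        elif self.on and t_c < self.lower_c:
            self.on = False
        return self.on


if __name__ == "__main__":
    for name, t in (("Temp lower", 70.0), ("Temp upper", 90.0), ("OT upper", 100.0)):
        print(f"{name}: {t} C -> code {temp_to_code(t)}, reg 0x{threshold_register(t):04X}")
    fan = HysteresisFan()
    print([fan.update(t) for t in (60, 85, 91, 85, 75, 69, 80)])
```

The 20 K gap between the two thresholds is what keeps the fan from toggling on small temperature fluctuations; the fan only switches off after the die has cooled well below the turn-on point.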

What about other boards like the Digilent Zybo Z7? They have smaller packages, screw the fan in between the fins and are using the long life Sunon fans.

The Genesys 2 has a tacho controlled fan for the FFG900 package, maybe we can copy from that.

Also, there should probably be some additional filtering between P12V0 and the fan. Might make sense to connect it to the upstream side of the big power supply EMI filter.

hartytp commented 4 years ago

Not critical, but it would be fun if we could control the fan speed to stabilize the FPGA temperature e.g. to get ultra-stable clock recovery.
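One generic way to do that, offered only as a sketch: a PI loop that maps the die temperature error to fan duty, with a clamped integrator so the integral cannot wind up while the fan is pinned at 0 % or 100 %. The `FanPI` class, its gains, and the first-order thermal model in `simulate()` are all hypothetical; a real loop would read the die temperature (e.g. from the XADC) and drive the fan PWM, and PWM noise would still need to be addressed.

```python
from dataclasses import dataclass


def clamp(x, lo, hi):
    return max(lo, min(hi, x))


@dataclass
class FanPI:
    """PI loop: more fan duty when the die is warmer than the setpoint."""

    setpoint_c: float
    kp: float = 0.1     # duty per K (hypothetical tuning)
    ki: float = 0.002   # duty per K*s (hypothetical tuning)
    i_term: float = 0.0

    def update(self, t_c, dt):
        err = t_c - self.setpoint_c  # positive when too hot
        # Clamped integrator (simple anti-windup): the integral term alone
        # never leaves the actuator range [0, 1].
        self.i_term = clamp(self.i_term + self.ki * err * dt, 0.0, 1.0)
        return clamp(self.kp * err + self.i_term, 0.0, 1.0)


def simulate(seconds=5000.0, dt=1.0):
    """Toy plant, all constants hypothetical:
    C dT/dt = P - (T - Ta) * (g0 + g1 * duty)
    """
    c, p, ta, g0, g1 = 20.0, 4.0, 25.0, 0.1, 0.2
    t = ta
    ctl = FanPI(setpoint_c=50.0)
    duty = 0.0
    for _ in range(int(seconds / dt)):
        duty = ctl.update(t, dt)
        t += dt * (p - (t - ta) * (g0 + g1 * duty)) / c
    return t, duty


if __name__ == "__main__":
    temp, duty = simulate()
    print(f"settled at {temp:.2f} C with duty {duty:.3f}")
```

In this toy model the fan-off equilibrium would be 65 °C, so the loop settles at the 50 °C setpoint with a duty of about 0.3; in hardware, the useful stability would be limited by the sensor resolution (about 0.12 °C per XADC code) and the airflow disturbances.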

marmeladapk commented 4 years ago

@jordens

Won't that circuit lead to weird behavior and gain reversal if the minimum duty cycle is 0 and the temperature is low enough?

That was the plan: if the FAN_PWM line is shorted to DC, the fan won't turn off.

AC signals coming in on direct-attach cables would generally be a problem independent of over temp or fan failure shutdown.

Yes, it's also a problem with external media converters.

What about other boards like the Digilent Zybo Z7? They have smaller packages, screw the fan in between the fins and are using the long life Sunon fans.

Thanks! This one is available from TME, but I'm not sure we'll be able to screw it into the heatsink. There's also another fansink from this company, perhaps with a tacho. However, I'm not sure about the pricing or availability of those. I'm also not sure whether ordering them from Malico directly rather than through a distributor is a good idea. There's also a UK-based company that offers the same fansink; I sent them an email. This fan, however, requires a 5V supply.

@jordens So if we don't have a tacho then you don't want any additional temp. sensor?

jordens commented 4 years ago

I'm probably not the most experienced person designing FPGA cooling solutions. But if we don't have a tacho and if PWM is too noisy then I'd just go for the XADC doing OT shutdown and fan on/off via ALM[0], i.e. no additional sensor and no fan controller.

marmeladapk commented 4 years ago

So the final version will be: