Open doegox opened 4 years ago
Not a full answer as will have to confirm some stuff in firmware...
But the safety checks do a bunch of stuff, including probe continuity, high voltage status, temperature, etc. The absent_temp only allows you to push out the temp checks, as physically the temp doesn't change as fast. Things like probe interlock have much faster reactions.
Will have to look in more detail at where timeout for trigger_safe comes from.
The overall flow: the device continually performs safety checks at a fairly fast interval (a few hundred ms, IIRC). But the checks are invalid if they run too close to a discharge event. When triggering rapidly, it's very likely the device never gets a chance to run the test.
trigger_safe is a way to tell the device when it can safely run a check. The idea is to call it frequently.
The device could be faulting because it didn't realize a discharge occurred and used the results of a safety check anyway. The 5s timeout applies if the device was unable to run a test, but if it thinks a test failed it should fault immediately.
Will have to confirm this in hw later!
BTW what fault is it throwing?
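To answer the question of which fault is being thrown, the current and latched fault lists can be dumped together. This is a minimal sketch, assuming `cs` is a connected ChipSHOUTER object exposing the `faults_current` and `faults_latched` attributes that appear later in this thread; `describe_faults` is a hypothetical helper name, not part of the API.

```python
def describe_faults(cs):
    """Return a one-line summary of current and latched faults.

    Assumption: `cs` exposes list-like `faults_current` and
    `faults_latched` attributes. `describe_faults` is a hypothetical
    helper, not a ChipSHOUTER API call.
    """
    current = list(cs.faults_current)
    latched = list(cs.faults_latched)
    return "Faults current: {}, latched: {}".format(current, latched)
```

Calling it right after the device stops responding to triggers should show whether the fault is still active or only latched.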
On February 28, 2020 23:10:51 Philippe Teuwen notifications@github.com wrote:
I'm a bit puzzled... Using an external hardware trigger.
The user manual recommends calling get triggersafe, otherwise the device will enter fault mode if safety checks cannot be performed for a certain length of time. The Python API says that reading trigger_safe will allow triggering for 5 seconds without checking the temperature.
The user manual presents absent_temp as the configurable time (1..60, unit unknown, seconds?) during which reading the temperature sensors can be skipped. The Python API specifies that the unit is seconds but doesn't specify the allowed range.
Now, from my tests:
If I don't read trigger_safe at all, the CS will enter fault mode only after quite a while.
If I read trigger_safe before repeated triggers, the CS will enter fault mode systematically 5s later (as documented in the Python API), no matter the value of absent_temp.
I thought I could use absent_temp to configure the maximum delay between regular reads of trigger_safe, but that's not the case? What is absent_temp for, then? And why does calling trigger_safe drastically reduce the time before entering fault mode compared with not calling trigger_safe?
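The two observations above can be made repeatable by timing how long a trigger series runs before a fault shows up. This is a rough sketch, not a tested measurement script: `time_to_fault` and the `fire` callback (whatever pulses the external hardware trigger) are hypothetical, and `cs` is assumed to expose `trigger_safe`, `faults_current` and `faults_latched`.

```python
import time


def time_to_fault(cs, fire, read_trigger_safe, limit_s=60.0, period_s=0.1,
                  clock=time.monotonic, sleep=time.sleep):
    """Fire repeatedly; return seconds until a fault appears, or None.

    Assumptions: `fire` is a user-supplied callable that pulses the
    hardware trigger, and `cs` exposes `trigger_safe`, `faults_current`
    and `faults_latched`. `clock` and `sleep` are injectable for testing.
    """
    start = clock()
    if read_trigger_safe:
        _ = cs.trigger_safe  # single read before the trigger series
    while clock() - start < limit_s:
        fire()
        if cs.faults_current or cs.faults_latched:
            return clock() - start
        sleep(period_s)
    return None  # no fault seen within limit_s
```

Running it once with `read_trigger_safe=True` and once with `False`, for a few values of `cs.absent_temp`, would show whether absent_temp has any effect on the delay.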
It seems I actually get a fault_temp_sensor very often, right after calling cs.trigger_safe.
If I power cycle the CS before a series of tests, it runs for just 2-5s, then I get fault_temp_sensor errors.
Changing absent_temp to 1 or 60 doesn't change this behavior.
If I skip calls to cs.trigger_safe, it runs fine for quite a while (hours).
I have a similar problem, but not after 5s: it happens after 15-60 min of running.
ChipSHOUTER xformer: 51, MOSFET: 44, diode: 35 - temperature
ChipSHOUTER voltage measured: 457, set: 470
ChipSHOUTER absent time after trigger: 1
ChipSHOUTER glitched 73 times, runs: 00:17:25
Faults current: [], latched: ['fault_temp_sensor'] ['fault_temp_sensor']
Also, after that, when you read the temperature values they are all -1 or 65535. Only a device reset clears them.
ChipSHOUTER xformer: 65535, MOSFET: 65535, diode: 65535 - temperature
I think this is a bug. These values should never overflow like that. When the device reads such a value it goes into the fault state. It is also possible to arm with these failure values: if trigger_safe is not used you can still arm... with these bullshit values :)
Then, when you use trigger_safe and arm with xformer: 65535, MOSFET: 65535, diode: 65535, the ChipSHOUTER goes to the fault state. This should prove that it is a software bug.
Of course, you can see these -1 values in the normal UART menu too, so it is not a Python API problem.
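Until the firmware side is sorted out, a script can at least refuse to arm when the temperature readings are obviously bogus. This is a small sketch that treats -1 and 65535 as invalid, based only on the values reported in this thread; `temps_valid` is a hypothetical helper. Note that 65535 is what -1 looks like when read as an unsigned 16-bit value, which suggests (but does not prove) that both are the same error marker.

```python
# Readings seen in this thread once the sensor fault has latched.
# Assumption: these two values are never legitimate temperatures.
INVALID_READINGS = (-1, 65535)


def temps_valid(xformer, mosfet, diode):
    """Return True only if none of the three readings is a known bad value."""
    return all(t not in INVALID_READINGS for t in (xformer, mosfet, diode))
```

For example, `temps_valid(51, 44, 35)` passes, while the all-65535 readout from the log above does not.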
Finally had a chance to look at this. It seems the logic for trigger_safe was backwards - normally temp faults were ignored and were only actually latched when trigger_safe was read. New FW builds should have this fixed. Also, the temp sensor is now not read when trigger_safe is read, which should fix https://github.com/newaetech/ChipSHOUTER/issues/7 as well.
The timeout is currently fixed at 5s (at least that's what the code says; I'm not sure the actual timer works out to that), but it could be made adjustable pretty easily, I think.
Hi, thanks for the answer. What is the expected release date of the new FW build?
thanks for the fix! oh so now we can reflash the chipshouter? I didn't see any upgrade instructions, did I miss sth?
From what I know they sign firmware, and I agree no procedure is provided how to update
Fw is built on a per board basis currently as @h0rac said. We can do a build for you pretty quickly, though be warned that the new firmware doesn't allow ignoring trigger_safe like old one did as you'll very quickly run into temperature faults. I'll need your board ID to build the new firmware, which we can do via DMs on the forums or Discord.
The update process is at https://forum.newae.com/t/chipshouter-error-33u51a/2135. I'll be working on an easier/more convenient firmware flash method next, so future updates should be less of a hassle.
I will PM you on the forum with my CS serial number and Major Malfunction's.
Quick update on this: I changed the behaviour so it follows what's in the CS manual:
Temp sensor reads are periodic (as is the case now). If trigger_safe is called:
The temp sensor must be read periodically, set by absent_temp, otherwise the CS will fault. Additionally, if the CS is disarmed, it will begin reading the temperature periodically again.
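One way to reason about this behaviour is a toy model. The sketch below is NOT the firmware; it is just one reading of the description above, under the assumption that once trigger_safe has been read it has to be read again within absent_temp seconds, and that disarming returns the device to periodic temperature reads. The class and method names are hypothetical.

```python
class TriggerSafeModel:
    """Toy model of the described behaviour; not the actual firmware.

    Assumption: after the first trigger_safe read, another read must
    arrive within `absent_temp` seconds or the device faults. Times are
    passed in explicitly (seconds) so the model is deterministic.
    """

    def __init__(self, absent_temp):
        self.absent_temp = absent_temp
        self.last_safe_read = None  # None: periodic temperature reads active
        self.faulted = False

    def tick(self, now):
        """Check the timeout at time `now`."""
        if (self.last_safe_read is not None
                and now - self.last_safe_read > self.absent_temp):
            self.faulted = True

    def read_trigger_safe(self, now):
        """Model a trigger_safe read: restart the timeout unless faulted."""
        self.tick(now)
        if not self.faulted:
            self.last_safe_read = now
        return not self.faulted

    def disarm(self):
        """Disarming resumes periodic temperature reads (no timeout)."""
        self.last_safe_read = None
```

With `absent_temp = 5`, reads at t=0 and t=4 keep the model happy, but going quiet until t=10 trips the fault.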
@alex-dewar do you have an example? I still have the issue:
cs.absent_temp = 5
Then, in a loop before each HW trigger (I use my own FPGA as the pulse generator), I check trigger_safe and only allow the trigger when it is True... after a few glitch attempts it goes to the fault state.
When I do not use trigger_safe I get the issue @doegox described (CS goes to fault) ;/
trigger_safe doesn't seem to work correctly. When I extend absent_temp to 60 it fails after a longer time, but it looks like the CS is not checking temp_sensor correctly and raises a fault for that.
The CS is unusable right now ;/
How often are you checking trigger_safe? You need to call it repeatedly.
I'll be able to build new firmware tomorrow morning.
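"Call it repeatedly" could look roughly like the loop below, which reads trigger_safe before every trigger and keeps reading it while idle between triggers. This is only a sketch for firmware where trigger_safe behaves as intended: `fire_series`, the `fire` callback (whatever drives the external trigger) and the `poll_s` default are all assumptions, not values from the manual.

```python
import time


def fire_series(cs, fire, count, gap_s, poll_s=0.5,
                clock=time.monotonic, sleep=time.sleep):
    """Fire `count` triggers `gap_s` apart; return how many were fired.

    Assumptions: `fire` is a user-supplied callable that pulses the
    hardware trigger, and `cs.trigger_safe` is falsy when triggering is
    not allowed. trigger_safe is read before each trigger and at least
    every `poll_s` seconds while waiting, so the device sees regular reads.
    """
    fired = 0
    while fired < count:
        if not cs.trigger_safe:
            return fired  # stop early; caller can inspect the faults
        fire()
        fired += 1
        deadline = clock() + gap_s
        while True:
            remaining = deadline - clock()
            if remaining <= 0:
                break
            sleep(min(poll_s, remaining))
            if not cs.trigger_safe:  # keep-alive read while idle
                return fired
    return fired
```

Keeping `poll_s` well below `cs.absent_temp` seems like the safe choice if the timeout really follows that setting.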
Hi, new FW updates that make the CS more robust against crashing would be great! I made a workaround in Python, which is one way of tackling this until the new FWs are out. This works great for me. I call this before each HW trigger input.
This code basically skips the cs.trigger_safe issues if the CS is already armed.
import sys
from time import sleep

# ChipShouterInit(cs, logger) is the poster's own (re)initialisation helper; it is not shown in this thread.

def waitUntilCSReady(cs, logger):
    attempts = 0
    state = ''
    try:
        state = cs.state
    except Exception:
        logger.error("Could not read state info from ChipShouter")
    # Already armed: skip trigger_safe entirely.
    if state == 'armed':
        return
    while not cs.trigger_safe:
        try:
            if cs.state == 'armed':
                break
            else:
                cs.clr_armed = True  # clear faults and arm
        except Exception:
            logger.error("Attempt to clear faults and arm Chipshouter failed")
            ChipShouterInit(cs, logger)
        sleep(0.3)
        attempts += 1
        faults = cs.faults_current + cs.faults_latched
        logger.debug("Faults in ChipShouter: ")
        logger.debug(faults)
        sleep(1)
        # After a few failed attempts, reinitialise before retrying.
        if attempts > 3:
            ChipShouterInit(cs, logger)
            cs.clr_armed = True
    if not cs.trigger_safe:
        logger.error("Cannot ARM Chipshouter")
        sys.exit(1)
How often are you checking trigger_safe? You need to call it repeatedly.
I'll be able to build new firmware tomorrow morning.
It's in loop before each hw trigger signal send when pulse generator is armed
Alright, I believe I fixed the issue, which was caused by a timeout not being reset when calling trigger_safe. I've sent you a new build on the forums. Let me know if you'd like to give this fixed version a shot as well.
Hi, new FW updates that make the CS more robust against crashing would be great! I made a workaround in Python, which is one way of tackling this until the new FWs are out. This works great for me. I call this before each HW trigger input.
Thanks for the feedback! Honestly, I think your best bet on the old firmware might be to completely ignore trigger_safe as I don't think there's really a way to properly use it.
The latest fix seems to work correctly. Thanks for the quick reaction.