newaetech / ChipSHOUTER-python

Python API for ChipSHOUTER
MIT License

How to use trigger_safe properly? #8

Open doegox opened 4 years ago

doegox commented 4 years ago

I'm a bit puzzled... Using an external hardware trigger.

The user manual recommends calling get triggersafe; otherwise the device will enter fault mode if the safety checks cannot be performed for a certain length of time. The Python API says that reading trigger_safe allows triggering for 5 seconds without checking the temperature.

The user manual presents absent_temp as the configurable time (1..60, unit unknown, seconds?) during which reading the temperature sensors can be skipped. The Python API specifies that the unit is seconds but doesn't specify the allowed range.

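Now, from my tests:

- If I don't read trigger_safe at all, the CS enters fault mode only after quite a while.
- If I read trigger_safe before repeated triggers, the CS systematically enters fault mode 5s later (as documented in the Python API), no matter the value of absent_temp.

I thought I could use absent_temp to configure the maximum delay between regular reads of trigger_safe, but that's not the case? What is absent_temp for, then? And why does calling trigger_safe drastically reduce the time before entering fault mode, compared with not calling it?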

colinoflynn commented 4 years ago

Not a full answer as will have to confirm some stuff in firmware...

But the safety checks do a bunch of stuff, including probe continuity, high voltage status, temperature, etc. The absent_temp only allows you to push out the temp checks, as physically the temp doesn't change as fast. Things like probe interlock have much faster reactions.

Will have to look in more detail at where timeout for trigger_safe comes from.

The overall flow: the device continually performs safety checks on a fairly fast interval (a few hundred ms, IIRC). But the checks are invalid if they run too close to a discharge event. When triggering rapidly, it's very likely it never gets a chance to run the test.

trigger_safe is a way to tell the device when it can safely run a check. The idea is to call it frequently.

The device could be faulting because it didn't realize a discharge occurred and used the results of a safety check anyway. The 5s timeout applies if the device was unable to run a test, but if it thinks a test failed it should fault immediately.
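
The "call it frequently" advice above can be sketched as a host-side loop that reads trigger_safe before every external trigger. This is only a sketch under assumptions: `cs` is any object exposing a boolean `trigger_safe` attribute (as the Python API does), while `fire_trigger`, `retry_delay` and `max_retries` are hypothetical names introduced here for illustration.

```python
import time


def run_glitch_campaign(cs, fire_trigger, attempts, retry_delay=0.1, max_retries=50):
    """Fire `attempts` external triggers, reading cs.trigger_safe before each one.

    cs           -- object exposing a boolean `trigger_safe` attribute
    fire_trigger -- hypothetical callable that pulses the external hardware trigger
    Returns the number of triggers actually fired.
    """
    fired = 0
    for _ in range(attempts):
        retries = 0
        # Each read doubles as the hint that the device may run its checks now.
        while not cs.trigger_safe:
            retries += 1
            if retries > max_retries:
                return fired  # give up; the caller can inspect the faults
            time.sleep(retry_delay)
        fire_trigger()
        fired += 1
    return fired
```

Returning the count instead of raising leaves it to the caller to decide whether to clear faults and re-arm or to stop the campaign.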

Will have to confirm this in hw later!

BTW what fault is it throwing?

On February 28, 2020 23:10:51 Philippe Teuwen wrote:

Now, from my tests:

- If I don't read trigger_safe at all, the CS enters fault mode only after quite a while.
- If I read trigger_safe before repeated triggers, the CS systematically enters fault mode 5s later (as documented in the Python API), no matter the value of absent_temp.

doegox commented 4 years ago

It seems actually I get a fault_temp_sensor very often, right after calling cs.trigger_safe. If I power cycle the CS before a series of tests, it runs for just 2-5s then I get fault_temp_sensor errors. Changing absent_temp to 1 or 60 doesn't change this behavior.

If I skip calls to cs.trigger_safe, it runs fine for quite a while (hours)

h0rac commented 4 years ago

I have a similar problem, but not after 5s: it happens after 15-60 min of running.

ChipSHOUTER xformer: 51, MOSFET: 44, diode: 35  - temperature 
ChipSHOUTER voltage measured: 457, set: 470
ChipSHOUTER absent time after trigger: 1
ChipSHOUTER glitched 73 times, runs: 00:17:25
Faults current:  [], latched:  ['fault_temp_sensor'] ['fault_temp_sensor']

Also, after that, the temperature values all read -1 or 65535. Only a device reset clears them.

ChipSHOUTER xformer: 65535, MOSFET: 65535, diode: 65535  - temperature 

This is a bug, I think. These values should never overflow like that. When this value is read, the device goes into the fault state. It is also possible to arm with these invalid values: if trigger_safe is not set, you can still arm with these bogus values :)

Then, when you enable trigger_safe and arm with xformer: 65535, MOSFET: 65535, diode: 65535, the ChipSHOUTER goes into the fault state. This should prove that this is a software bug.

Of course, you can also see these -1 values in the normal UART menu, so it is not a Python API problem.
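
A host script can at least detect the readings described above before arming. The helper below is a minimal sketch: it takes the temperatures as a plain dict, however they were obtained, and flags the two sentinel values reported in this thread (-1 and 65535). The function name and the dict layout are assumptions made for illustration and are not part of the ChipSHOUTER API.

```python
# Values reported in this thread once the temperature sensors stop responding.
INVALID_READINGS = {-1, 65535}


def invalid_temperatures(readings):
    """Return the sorted names of sensors whose reading is a sentinel value.

    readings -- dict mapping a sensor name to its reported value, e.g.
                {"xformer": 51, "mosfet": 44, "diode": 35}
    """
    return sorted(name for name, value in readings.items() if value in INVALID_READINGS)
```

An empty list means none of the readings matched a sentinel; it says nothing about whether the temperatures themselves are within safe limits.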

alex-dewar commented 3 years ago

Finally had a chance to look at this. It seems the logic for trigger_safe was backwards - normally temp faults were ignored and were only actually latched when trigger_safe was read. New FW builds should have this fixed. Also, the temp sensor is now not read when trigger_safe is read, which should fix https://github.com/newaetech/ChipSHOUTER/issues/7 as well.

The timeout is currently fixed to 5s (at least that's what the code says, not sure if that actual timer works out to that), but could be made adjustable pretty easily I think

h0rac commented 3 years ago

Hi, thanks for the answer. What is the expected release date of the new FW build?

doegox commented 3 years ago

thanks for the fix! oh so now we can reflash the chipshouter? I didn't see any upgrade instructions, did I miss sth?

h0rac commented 3 years ago

thanks for the fix! oh so now we can reflash the chipshouter? I didn't see any upgrade instructions, did I miss sth?

From what I know, they sign the firmware, and I agree that no update procedure is provided.

alex-dewar commented 3 years ago

Fw is built on a per board basis currently as @h0rac said. We can do a build for you pretty quickly, though be warned that the new firmware doesn't allow ignoring trigger_safe like old one did as you'll very quickly run into temperature faults. I'll need your board ID to build the new firmware, which we can do via DMs on the forums or Discord.

The update process is at https://forum.newae.com/t/chipshouter-error-33u51a/2135. I'll be working on an easier/more convenient firmware flash method next, so future updates should be less of a hassle.

h0rac commented 3 years ago

I will PM you on the forum with my and Major Malfunction's CS serial numbers.

alex-dewar commented 3 years ago

Quick update on this: I changed the behaviour so it follows what's in the CS manual:

Temp sensor reads are periodic (as is the case now). If trigger_safe is called:

The temp sensor must be read periodically, set by absent_temp, otherwise the CS will fault. Additionally, if the CS is disarmed, it will begin reading the temperature periodically again.
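
One way to stay inside such a window from the host side is to track when trigger_safe was last read and to read it again during idle periods (for example while the target reboots). The sketch below assumes that the value given to absent_temp, in seconds, bounds how long the host may go without reading trigger_safe; that is an interpretation of the comment above, not something confirmed by the firmware. The class name, the `margin` parameter and the injectable `clock` are hypothetical.

```python
import time


class TriggerSafeKeepAlive:
    """Track reads of cs.trigger_safe so idle periods do not exceed the window."""

    def __init__(self, cs, absent_temp_s, margin=0.5, clock=time.monotonic):
        if not 0 < margin < 1:
            raise ValueError("margin must be between 0 and 1")
        self.cs = cs
        self.interval = absent_temp_s * margin  # read well before the window ends
        self.clock = clock
        self.last_read = None
        self.last_value = False

    def read(self):
        """Always read trigger_safe and remember when that happened."""
        self.last_value = bool(self.cs.trigger_safe)
        self.last_read = self.clock()
        return self.last_value

    def overdue(self):
        """True if no read happened yet or the interval has elapsed."""
        return self.last_read is None or self.clock() - self.last_read >= self.interval

    def keep_alive(self):
        """Call from idle loops; reads only when due. Returns True if it read."""
        if self.overdue():
            self.read()
            return True
        return False
```

The glitch loop would still call `read()` before each trigger; `keep_alive()` only covers the gaps between triggers.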

h0rac commented 3 years ago

@alex-dewar do you have an example? I still have an issue:

cs.absent_temp = 5

and then, in a loop before each HW trigger (I use my own FPGA as pulse generator), I check trigger_safe and allow the trigger only when it is True... After a few glitch attempts it goes into the fault state.

When I do not use trigger_safe, I get the issue @doegox described (CS goes to fault) ;/

trigger_safe doesn't seem to work correctly. When I extend absent_temp to 60 it fails after a longer time, but it looks like the CS is not checking temp_sensor correctly and raises a fault for that.

CS is unusable right now ;/

alex-dewar commented 3 years ago

How often are you checking trigger_safe? You need to call it repeatedly.

I'll be able to build new firmware tomorrow morning.

prestegaard commented 3 years ago

Hi, new FW updates that make the CS more robust against crashing would be great! I made a workaround in Python, which is one way of tackling this until the new FW is out. This works great for me. I call it before each HW trigger input.

This code basically skips the cs.trigger_safe issues if the CS is already armed.

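import sys
from time import sleep


def waitUntilCSReady(cs, logger):
    """Return once the ChipSHOUTER is armed or reports trigger_safe."""
    # ChipShouterInit() is the poster's own re-initialisation helper (not shown).
    attempts = 0
    state = ''
    try:
        state = cs.state
    except Exception:
        logger.error("Could not read state info from ChipShouter")
    # Already armed: skip the trigger_safe check entirely.
    if state == 'armed':
        return
    while not cs.trigger_safe:
        try:
            if cs.state == 'armed':
                break
            # Clear latched faults and re-arm.
            cs.clr_armed = True
        except Exception:
            logger.error("Attempt to clear faults and arm Chipshouter failed")
            ChipShouterInit(cs, logger)
        sleep(0.3)

        attempts += 1
        faults = cs.faults_current + cs.faults_latched
        logger.debug("Faults in ChipShouter: ")
        logger.debug(faults)

        sleep(1)
        # After several failed attempts, re-initialise and try arming again.
        if attempts > 3:
            ChipShouterInit(cs, logger)
            cs.clr_armed = True

    if not cs.trigger_safe:
        logger.error("Cannot ARM Chipshouter")
        sys.exit(1)
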
h0rac commented 3 years ago

How often are you checking trigger_safe? You need to call it repeatedly.

I'll be able to build new firmware tomorrow morning.

It's in a loop before each HW trigger signal is sent, when the pulse generator is armed.

alex-dewar commented 3 years ago

How often are you checking trigger_safe? You need to call it repeatedly. I'll be able to build new firmware tomorrow morning.

It's in a loop before each HW trigger signal is sent, when the pulse generator is armed.

Alright, I believe I fixed the issue, which was caused by a timeout not being reset when calling trigger_safe. I've sent you an old build on the forums. Let me know if you'd like to give this fixed version a shot as well.
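
The kind of bug described here, a timeout that is not restarted when trigger_safe is read, can be illustrated with a toy model. This is purely an illustration written for this thread and is not the ChipSHOUTER firmware: the class, its parameters and the 1 s read period are all assumptions. With the reset disabled, the model faults after the fixed 5 s no matter how often trigger_safe is read, which matches the behaviour reported at the top of the thread.

```python
class SafetyTimeoutModel:
    """Toy model of a timeout that should restart whenever trigger_safe is read."""

    def __init__(self, timeout_s=5.0, reset_on_read=True):
        self.timeout_s = timeout_s
        self.reset_on_read = reset_on_read
        self.started_at = 0.0

    def read_trigger_safe(self, now):
        if self.reset_on_read:
            self.started_at = now  # fixed behaviour: each read restarts the timeout

    def faulted(self, now):
        return now - self.started_at >= self.timeout_s


def seconds_until_fault(model, read_period=1.0, horizon=30.0):
    """Read trigger_safe every `read_period` s; return the fault time or None."""
    now = 0.0
    while now <= horizon:
        if model.faulted(now):
            return now
        model.read_trigger_safe(now)
        now += read_period
    return None
```

Calling `seconds_until_fault(SafetyTimeoutModel(reset_on_read=False))` models the buggy case, and the default `reset_on_read=True` models the fixed one.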

Hi, new FW updates that make the CS more robust against crashing would be great! I made a workaround in Python, which is one way of tackling this until the new FW is out. This works great for me. I call it before each HW trigger input.

Thanks for the feedback! Honestly, I think your best bet on the old firmware might be to completely ignore trigger_safe as I don't think there's really a way to properly use it.

h0rac commented 3 years ago

The latest fix seems to work correctly. Thanks for the quick reaction.