slaclab / pysmurf

SerialGradientDescent failures #722

Open jlashner opened 1 year ago

jlashner commented 1 year ago

Describe the bug

I'm playing around with tuning serial gradient descent, and fairly often it seems to fail with this error:

ERROR:pyrogue.Device.SerialGradientDescent.AMCc.FpgaTopLevel.AppTop.AppCore.SysgenCryo.Base[0].CryoChannels.SerialGradientDescent:int too big to convert
Traceback (most recent call last):
  File "/usr/local/src/rogue/python/pyrogue/_Process.py", line 100, in _run
    self._process()
  File "/tmp/fw/rogue_MicrowaveMuxBpEthGen2_v1.1.0.zip/python/CryoDet/DspCoreLib/CryoDetCmbHcd/_SerialGradientDescent.py", line 97, in _process
    dx     = calcGrad( centerFreqVar, freqErrorVar, etaPhaseVar, freq[channel] + currDf, initialStep, numAverages ) # center difference
  File "/tmp/fw/rogue_MicrowaveMuxBpEthGen2_v1.1.0.zip/python/CryoDet/DspCoreLib/CryoDetCmbHcd/_SerialGradientDescent.py", line 28, in calcGrad
    centerFreqVar.set( centerFrequencyMHz + df )
  File "/usr/local/src/rogue/python/pyrogue/_Variable.py", line 839, in set
    varFuncHelper(self._linkedSet,pargs,self._log,self.path)
  File "/usr/local/src/rogue/python/pyrogue/_Variable.py", line 868, in varFuncHelper
    return func(**args)
  File "/tmp/fw/rogue_MicrowaveMuxBpEthGen2_v1.1.0.zip/python/CryoDet/DspCoreLib/CryoDetCmbHcd/_CryoChannel.py", line 94, in <lambda>
    linkedSet    = lambda var, value, write: var.dependencies[0].set(int(round((value*2**23./freqSpanMHz))), write=write),
  File "/usr/local/src/rogue/python/pyrogue/_Variable.py", line 334, in set
    raise e
  File "/usr/local/src/rogue/python/pyrogue/_Variable.py", line 324, in set
    self._block.set(self, value)
  File "/usr/local/src/rogue/python/pyrogue/_Block.py", line 362, in set
    ba = var._base.toBytes(value)
  File "/usr/local/src/rogue/python/pyrogue/_Model.py", line 128, in toBytes
    ba = value.to_bytes(byteCount(self.bitSize), self.endianness, signed=True)
OverflowError: int too big to convert

This is probably caused by parameters that make the LMS operation run away, which is reasonable in itself. However, the error only shows up in the Rogue output, and that output is saturated with warnings like this, so it is very easy to miss.

The failure also isn't handled at all in the serialGradientDescent command, so it leaves the system in a bad state: the amplitude scale arrays are zeroed out and the etaScanInProgress variable is left at 1, so no further eta scan operations will run.
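The overflow comes from the unit conversion visible in the traceback. The `linkedSet` lambda in `_CryoChannel.py` scales the requested frequency by `2**23/freqSpanMHz` and rounds it to an int. pyrogue's `toBytes` then calls `int.to_bytes(..., signed=True)`, which raises `OverflowError` once the value no longer fits the register. The sketch below reproduces that chain in plain Python and adds a range check that could run before `centerFreqVar.set()`, so a diverging descent fails with a readable message. It assumes a 24-bit signed register, little-endian byte order, and `freqSpanMHz = 614.4`. None of these values come from the issue, and `mhz_to_raw`, `fits_signed`, and `encode_checked` are hypothetical helpers, not pysmurf or pyrogue API.

```python
# Sketch of the MHz -> raw register conversion seen in the traceback.
# ASSUMPTIONS (not taken from the issue): register width, byte order, and span.
FREQ_SPAN_MHZ = 614.4   # assumed value of freqSpanMHz
BIT_SIZE = 24           # assumed width of the center-frequency register
ENDIANNESS = "little"   # assumed byte order


def byte_count(bits):
    """Whole bytes needed to hold `bits` bits."""
    return (bits + 7) // 8


def mhz_to_raw(value_mhz):
    """Same scaling as the linkedSet lambda: value * 2**23 / freqSpanMHz, rounded."""
    return int(round(value_mhz * 2**23 / FREQ_SPAN_MHZ))


def fits_signed(raw, bits=BIT_SIZE):
    """True if `raw` fits in a two's-complement field of `bits` bits."""
    return -(1 << (bits - 1)) <= raw <= (1 << (bits - 1)) - 1


def encode(value_mhz):
    """Unchecked path: int.to_bytes raises OverflowError when out of range."""
    raw = mhz_to_raw(value_mhz)
    return raw.to_bytes(byte_count(BIT_SIZE), ENDIANNESS, signed=True)


def encode_checked(value_mhz):
    """Hypothetical guard: reject out-of-range values with a descriptive error."""
    raw = mhz_to_raw(value_mhz)
    if not fits_signed(raw):
        raise ValueError(
            f"requested {value_mhz} MHz (raw {raw}) exceeds the {BIT_SIZE}-bit "
            "register; the gradient descent has probably diverged"
        )
    return raw.to_bytes(byte_count(BIT_SIZE), ENDIANNESS, signed=True)
```

Under these assumptions, `encode(700.0)` raises the same `OverflowError: int too big to convert` as in the traceback. In contrast, `encode_checked(700.0)` raises a `ValueError` that explains the likely cause.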
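On the cleanup side, one way to stop a failure from leaving the system in a bad state is to wrap the procedure in `try`/`finally`. The saved amplitude scale arrays and `etaScanInProgress` are then restored whether or not the descent succeeds. The exception is re-raised so the caller sees it, instead of it appearing only in the Rogue log. The sketch below is a hypothetical model: `FakeCryoChannels`, `eta_scan_guard`, `diverging_descent`, and their attributes stand in for the real `CryoChannels` variables and are not the pysmurf or firmware API. It also assumes the normal procedure restores the original amplitudes when it finishes.

```python
from contextlib import contextmanager


class FakeCryoChannels:
    """Hypothetical stand-in modelling only the state the issue mentions."""

    def __init__(self, n_chan=4, amplitude=12):
        self.amplitude_scale_array = [amplitude] * n_chan
        self.eta_scan_in_progress = 0


@contextmanager
def eta_scan_guard(dev):
    """Snapshot state, mark the scan as running, and always restore on exit."""
    saved_amps = list(dev.amplitude_scale_array)
    dev.eta_scan_in_progress = 1
    try:
        yield dev
    finally:
        # Runs on success and on any exception; the exception still propagates.
        dev.amplitude_scale_array = saved_amps
        dev.eta_scan_in_progress = 0


def diverging_descent(dev):
    """Hypothetical failing procedure: zero the amplitudes, then blow up."""
    dev.amplitude_scale_array = [0] * len(dev.amplitude_scale_array)
    raise OverflowError("int too big to convert")


if __name__ == "__main__":
    dev = FakeCryoChannels()
    try:
        with eta_scan_guard(dev):
            diverging_descent(dev)
    except OverflowError as err:
        print(f"serial gradient descent failed: {err}")
    print(dev.eta_scan_in_progress, dev.amplitude_scale_array)
```

Run as a script, this prints `serial gradient descent failed: int too big to convert` followed by `0 [12, 12, 12, 12]`. The error is reported, and the state is back to what it was before the scan started.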