legend-exp / pygama

Python package for data processing and analysis
https://pygama.readthedocs.io
GNU General Public License v3.0
18 stars 56 forks source link

FlashCam (via ORCA) timestamps decoded incorrectly #370

Closed jasondet closed 1 year ago

jasondet commented 2 years ago

We need to fix how FC timestamps are decoded. Bits are being lost.

jasondet commented 2 years ago

Also need to get file keys / runtime / and timestamps in same time zone (UTC)

gipert commented 1 year ago

@jasondet you said that the bug seems to be in fcutils as well?

CC @ssailer

ssailer commented 1 year ago

Hi!

@jasondet Could you clarify what you meant by "bits are being lost"?

Unfortunately I did not follow the timestamp problem discussion in depth - some more clarification might be needed on my part. In any case, I did look at all relevant code parts and can give some feedback which might help clear up some problems.

Here are the 4 relevant code sections: Orca FlashCamADC Decoder pygama orca_flashcam_decoder pyfcutils fcio_decoder

readoutserver code: file fc250b-3.4/server/Libs-daq/fc250b/fc250b.c, function getstatustime, line 1180ff I will quote the code here. The array f->gpstime is written to the fcio stream named fcio_event->timeoffset, the offsets into the array are the same.

////////////////////////////////////////////////////////////////
static int getstatustime(FC250b *f)
////////////////////////////////////////////////////////////////
// fetch the status and compare unix times with status time 
// done with the read out master module only 
////////////////////////////////////////////////////////////////
{   
   const char *s="FC250b/getstatustime"; 
   struct timeval t1,t2;
   Efbp x=f->efbpid;
   int i=0,retry=100; 
   double deltatime=0;
   for(i=0; i<retry; i++) 
   {
     gettimeofday(&t1,0);
     int rc=EfbpCommand0(x,f->readoutmasteradr,FC250bStatusCommand,sizeof(FC250bStatusData),f->readoutmasterstatus); 
     if(rc<(int)sizeof(FC250bStatusData)) 
     {
       if(debug) fprintf(stderr,"%s/ERROR: can not get status time from read out master adr 0x%x\n",s,f->readoutmasteradr);
       f->readoutmasterstatus->status=0;
       f->errors++; 
       return 0;  
     }
     gettimeofday(&t2,0);
     deltatime=(t2.tv_sec-t1.tv_sec)+(t2.tv_usec-t1.tv_usec)*1e-6; 
     if(debug >4) fprintf(stderr,"%s: unix delta time %g \n",s,deltatime);
     if(deltatime<0.050) break; 
   }   
   unsigned int maxticks=f->readoutmasterstatus->maxticks+1; // ticks/maxticks must be a ratio in the half-open interval [0,1); 0 is the next second/pps
   unsigned int pps=f->readoutmasterstatus->pps;
   unsigned int ticks=f->readoutmasterstatus->ticks;
   f->statustime[0]=pps;
   f->statustime[1]=1e6*ticks/maxticks;
   f->statustime[2]=t2.tv_sec;
   f->statustime[3]=t2.tv_usec;
   int unixsec=(t2.tv_sec);
   int unixusec=(t2.tv_usec);
   int ticksusec=(1e6*ticks)/maxticks; 
   double timeoffset=unixsec-pps+(unixusec-ticksusec)/1e6;
   f->gpstime[0]=floor(timeoffset);     // sec offset 
   double delta=(timeoffset-floor(timeoffset))*1e6;
   f->gpstime[1]=delta;                 // usec offset
   int gpsoffset=floor(timeoffset+0.25); 
   f->gpstime[2]=gpsoffset;             
   int gpslock=(delta>500000)?delta-1000000:delta;  
   f->gpstime[3]=gpslock;               // lock in usec
   int gpsdiff=abs(gpslock); 
   f->gpstime[4]=gpsdiff;               

   if(debug>3) fprintf(stderr,"%s: retries %d gps unix time %d.%06d master time %d.%06d offset %.6f %.6f %d delta %g limit %g \n",
         s,i,f->statustime[2],f->statustime[3], f->statustime[0], f->statustime[1], 
         timeoffset,delta/1e6,gpsoffset,gpsdiff/1e6,f->gps/1e6); 
   if(f->gps && (gpsdiff > f->gps)) // unix to master delta > gps usec 
     if(debug>1)  fprintf(stderr,"%s/WARNING: gps out of limit, unix time %d.%06d master time %d.%06d offset %.6f %.6f %d delta %g > %g\n",
         s,f->statustime[2],f->statustime[3], f->statustime[0], f->statustime[1], 
         timeoffset,delta/1e6,gpsoffset,gpsdiff/1e6,f->gps/1e6);  
   return 1; 
}

I will state the observations here and explain the intent of the code in fc250b.c below:

An overview of the clock requirements of FC can be found here: https://legend-exp.atlassian.net/wiki/download/attachments/457572382/legend-fc250b-2021-clockmodes.pdf?api=v2 which might help understand why only the timeoffset[2] field is added.

The clock mode used on site (at the moment) is what TK calls the "loosely coupled" mode. This means only the PPS (pulse per second) signal from the time box of LNGS is used, as the 10MHz is not reliably 10M peaks / second. The new box in dev should allow switching to the "strongly coupled" mode in the future, which uses the PPS and 10MHz. In both cases it's important to note, that the internal clock of the FC boards has a much higher stability than the server clock, as it's running at 250MHz after several steps of jitter cleaning (+self correction to external 10MHz) and the boards are in a temperature-controlled environment, while the server room is not actively cooled. In the loosely coupled mode, TK states, that the error on maxticks is at most 1e-7 (~20 ticks / 25000000) and better in strict mode. The numbers for CPUs are more on the order of a few 1e-6 or worse. The server itself is synchronising it's clock via NTP and the kernel code regulating the clock chip is not always smooth/monotonic (i.e. might exhibit jumps) - this depends ofc on the implementation.

In the best case scenario, the PPS fed to the boards from GPS is aligned with the PPS from NTP for the server, and all that is needed to calculate the offset from runtime to eventtime is stored in the timeoffset[2] field. If both clocks (server and boards) start drifting away from each other too much, it's up to the DAQ and Analysis to decide how to react. The readout-fc250b program is given a -gps <maxdeltatime_us>,<gps_mode> flag, which Orca does set. If the offset from a status readout is getting larger than the <maxdeltatime_us> value, the readout program will start printing warnings, but continuing to read out. This delta is stored in timeoffset[3] in us relative to the nearest PPS of the server clock, and stored as absolute value in timeoffset[4] and should probably be used to flag data quality in the analysis.

Unfortunately I do not now what the <maxdeltatime_us> value on site is set to at the moment.

When examining the fc250b.c code, one can see that the gpsoffset value is calculated with floor(timeoffset+0.25) which might seem odd. It's implicitly assumed that timeoffset[4] is checked and that <maxdeltatime_us> is less than 250ms. This allows for a drift of +-250ms of the server time around the FC PPS, and that any drift is less than 500ms / getstatustime() - with status being checked once per second. Otherwise the timeoffset[2] field might show jumps. This can be crosschecked by recovering the server offset with timeoffset[0] + timeoffset[1]*1e-6.

Sorry for the verbose comment, I thought being detailed might be more helpful, even though some might be aware of this already.