slaclab / pysmurf

SO production tests #752

Closed swh76 closed 1 year ago

swh76 commented 1 year ago

Description

SO SMuRF system verification is nearly complete, with only a few stragglers remaining, so this PR checks in the validation scripts that we used to check some critical aspects of the carriers and AMCs. Specifically, the scripts validate JESD on the carriers before the FPGA heatsink is permanently affixed, validate the AMC response versus up- and down-converter attenuation settings, and verify carrier thermal performance (after the FPGA heatsink has been permanently affixed).

Currently, technicians at SLAC run the scripts from smurf-srv11 using these bash aliases on that server:

pysmurf_prod_test_version=/home/cryo/docker/pysmurf/dev/v7.1.0_prodtest
alias THERMALtest='shawnhammer -t -c '${pysmurf_prod_test_version}'/pysmurf/cfg_files/rflab/smurf_startup_thermal_test.cfg'
alias THERMALtest2='shawnhammer -t -c '${pysmurf_prod_test_version}'/pysmurf/cfg_files/rflab/smurf_startup_thermal_test2.cfg'
alias AMCLOOPBACKtest='shawnhammer -c '${pysmurf_prod_test_version}'/pysmurf/cfg_files/rflab/rflab_smurf_startup_stephen_test.cfg'
alias AMCLOOPBACKtest2='shawnhammer -c '${pysmurf_prod_test_version}'/pysmurf/cfg_files/rflab/rflab_smurf_startup_stephen_test2.cfg'
alias JESDtest='shawnhammer -c '${pysmurf_prod_test_version}'/pysmurf/cfg_files/rflab/smurf_startup_low_power_test.cfg'
alias JESDtest2='shawnhammer -c '${pysmurf_prod_test_version}'/pysmurf/cfg_files/rflab/smurf_startup_low_power_test2.cfg'

These scripts assume some extra things are installed on your server, including gnuplot, ipython3, and scripts to log the tmux sessions run by shawnhammer. For tmux logging, put this in a text file named ~/.tmux.conf:

set -g status-interval 2
set -g status-left "#S #[fg=green,bg=black]#(tmux-mem-cpu-load --colors --interval 2)#[default]"
set -g status-left-length 60

run-shell /home/cryo/tmux-logging/logging.tmux

and unpack tmux-logging.tar.gz in your /home/cryo directory.

You will also need the SMuRF atca-monitor and utils dockers set up such that their run.sh scripts are at ~/docker/atca_monitor/run.sh and ~/docker/utils/run.sh, respectively.
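The prerequisites listed above can be checked before running any of the aliases. The following is only a sketch: the function name `missing_prerequisites` is hypothetical, it assumes the files live under the home directory of the user running the tests (the description uses /home/cryo), and it assumes `tmux` itself must be on the PATH since shawnhammer runs tmux sessions.

```python
import shutil
from pathlib import Path

# Executables named in the description (tmux is an assumption, see lead-in).
REQUIRED_TOOLS = ("gnuplot", "ipython3", "tmux")

# Files named in the description, relative to the home directory.
REQUIRED_FILES = (
    ".tmux.conf",
    "tmux-logging/logging.tmux",
    "docker/atca_monitor/run.sh",
    "docker/utils/run.sh",
)


def missing_prerequisites(home=None, which=shutil.which):
    """Return a list of strings describing each missing prerequisite."""
    home = Path(home) if home is not None else Path.home()
    missing = [
        f"executable not on PATH: {tool}"
        for tool in REQUIRED_TOOLS
        if which(tool) is None
    ]
    missing += [
        f"missing file: {home / rel}"
        for rel in REQUIRED_FILES
        if not (home / rel).is_file()
    ]
    return missing


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

An empty result means everything named above was found; it does not verify that the dockers or the tmux logging plugin actually work.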

JESDtest

E.g. JESDtest2, as aliased above, running on slot 2 boots up the system and runs scratch/shawn/test_new_carrier.py. Here's typical pysmurf output (excluding setup and truncating the full_band_response output):

In [2]: S.set_stream_enable(0)                                                  

In [3]: exec(open("scratch/shawn/test_new_carrier.py").read())                  
-> Waiting 1 min after setup.
-> Checking JESD status
    Bay 0 JESD Rx Valid True
    Bay 0 JESD Tx Valid True
    Bay 0 JESD Rx Valid Count 0
    Bay 0 JESD Tx Valid Count 0
    Bay 1 JESD Rx Valid True
    Bay 1 JESD Tx Valid True
    Bay 1 JESD Rx Valid Count 0
    Bay 1 JESD Tx Valid Count 0
-> Make sure JESDs are all valid and counts are 0 (press enter)...

-> Checking full band response to confirm RF is properly configured on slot 2.
Inside full_band_response.py. Starting.

Band 0

[ 2022-12-04 00:16:33 ]  caput smurf_server_s2:AMCc:FpgaTopLevel:AppTop:DaqMuxV2[0]:DataBufferSize 4096
[ 2022-12-04 00:16:33 ]  caput smurf_server_s2:AMCc:FpgaTopLevel:AppTop:AppCore:DebugSelect[0] 0
[ 2022-12-04 00:16:33 ]  ADC0 max count: 20009
[ 2022-12-04 00:16:33 ]  ADC0 min count: -20480
[ 2022-12-04 00:16:33 ]  ADC0 not saturated
...truncating full_band_response.py output...

Saving plot to /data/smurf_data/20221204/1670112700/plots/1670112993_full_band_resp_all.png
Saving data to /data/smurf_data/20221204/1670112700/outputs/1670112993_full_band_resp_all.npy
Done running full_band_response.py.
-> Done running full_band_response.py.
-> Visually check the measured full band response on slot 2 before continuing (press enter)...

Waiting 15 minutes until next check
-> Checking JESD status
    Bay 0 JESD Rx Valid True
    Bay 0 JESD Tx Valid True
    Bay 0 JESD Rx Valid Count 72
    Bay 0 JESD Tx Valid Count 0
    Bay 1 JESD Rx Valid True
    Bay 1 JESD Tx Valid True
    Bay 1 JESD Rx Valid Count 0
    Bay 1 JESD Tx Valid Count 0
-> Make sure JESDs are all valid and counts are 0 (press enter)...
Test complete.  Copy the logfile and loopback test image to the hardware database:
/data/smurf_data/tmux_logs/tmux-smurf-2-1-20221203T161137.log
/data/smurf_data/20221204/1670112700/plots/1670112993_full_band_resp_all.png

The user is asked to confirm the JESD values twice (once 1 min after setup and once 15 min after setup) and to confirm that the full band response looks good. Here's a typical good-looking full band response plot:

Screenshot 2022-12-03 at 4 18 53 PM
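The JESD pass criterion (all links valid, all counts 0) is checked by eye in the test, but it can also be applied to the saved tmux log. This is a sketch only, not part of test_new_carrier.py: the function name `jesd_problems` is hypothetical, and the regular expression assumes the status lines are formatted exactly as in the output above.

```python
import re

# Matches lines such as "Bay 0 JESD Rx Valid True" and
# "Bay 0 JESD Rx Valid Count 72", as printed in the output above.
STATUS_RE = re.compile(r"Bay (\d+) JESD (Rx|Tx) Valid( Count)? (\S+)")


def jesd_problems(log_text):
    """Return one message per status line that is not valid / not zero."""
    problems = []
    for line in log_text.splitlines():
        match = STATUS_RE.search(line)
        if match is None:
            continue  # not a JESD status line
        bay, direction, is_count, value = match.groups()
        if is_count:
            if value != "0":
                problems.append(
                    f"Bay {bay} JESD {direction} Valid Count is {value}, expected 0"
                )
        elif value != "True":
            problems.append(
                f"Bay {bay} JESD {direction} Valid is {value}, expected True"
            )
    return problems


# The second status block from the example output above.
SECOND_CHECK = """\
    Bay 0 JESD Rx Valid True
    Bay 0 JESD Tx Valid True
    Bay 0 JESD Rx Valid Count 72
    Bay 0 JESD Tx Valid Count 0
    Bay 1 JESD Rx Valid True
    Bay 1 JESD Tx Valid True
    Bay 1 JESD Rx Valid Count 0
    Bay 1 JESD Tx Valid Count 0
"""

if __name__ == "__main__":
    for problem in jesd_problems(SECOND_CHECK):
        print(problem)  # prints: Bay 0 JESD Rx Valid Count is 72, expected 0
```

Fed the 15-minute status block from the example, the sketch flags the Bay 0 Rx count of 72, which is the kind of value the prompt asks the user to look for.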

AMCLOOPBACKtest

E.g. AMCLOOPBACKtest2, as aliased above, running on slot 2 boots up the system and runs scratch/stephen/full_band_response_AMCatten.py. Here's typical pysmurf output (excluding setup and truncating the full_band_response outputs):

In [3]: exec(open("scratch/stephen/full_band_response_AMCatten.py").read())
Beginning Attenuation Test
Beginning Attenuation Test
software: 7.1.0
firmware: v4.11.10
fpga: 16842752, MicrowaveMuxBpEthGen2: Vivado v2020.2, rdsrv317 (x86_64), Built Tue 28 Sep 2021 12:39:41 PM PDT by ruckman
Start time: 1670191353
Bay 0 Asset Tag: C03-A01-109
Bay 1 Asset Tag: C03-A01-066
Getting board types...
Bay 0 board type: low
Bay 1 board type: low
Initilizing Boards
STARTING TEST

Up Converters, Attenuation 0, Band 0

[ 2022-12-04 22:02:35 ]  caput smurf_server_s2:AMCc:FpgaTopLevel:AppTop:DaqMuxV2[0]:DataBufferSize 4096
[ 2022-12-04 22:02:35 ]  caput smurf_server_s2:AMCc:FpgaTopLevel:AppTop:AppCore:DebugSelect[0] 0
[ 2022-12-04 22:02:36 ]  ADC0 max count: 19979
[ 2022-12-04 22:02:36 ]  ADC0 min count: -20715
[ 2022-12-04 22:02:36 ]  ADC0 not saturated
[ 2022-12-04 22:02:36 ]  caput smurf_server_s2:AMCc:FpgaTopLevel:AppTop:DaqMuxV2[0]:DataBufferSize 524288
[ 2022-12-04 22:02:36 ]  caput smurf_server_s2:AMCc:FpgaTopLevel:AppTop:AppCore:DebugSelect[0] 0
[ 2022-12-04 22:02:36 ]  caput smurf_server_s2:AMCc:FpgaTopLevel:AppTop:DaqMuxV2[0]:DataBufferSize 524288
[ 2022-12-04 22:02:36 ]  caput smurf_server_s2:AMCc:FpgaTopLevel:AppTop:AppCore:DebugSelect[0] 0

Up Converters, Attenuation 0, Band 1
...truncating output...

Down Converters, Attenuation 31, Band 7

[ 2022-12-04 22:09:59 ]  caput smurf_server_s2:AMCc:FpgaTopLevel:AppTop:DaqMuxV2[1]:DataBufferSize 4096
[ 2022-12-04 22:09:59 ]  caput smurf_server_s2:AMCc:FpgaTopLevel:AppTop:AppCore:DebugSelect[1] 3
[ 2022-12-04 22:09:59 ]  ADC7 max count: 1288
[ 2022-12-04 22:09:59 ]  ADC7 min count: -1442
[ 2022-12-04 22:09:59 ]  ADC7 not saturated
[ 2022-12-04 22:09:59 ]  caput smurf_server_s2:AMCc:FpgaTopLevel:AppTop:DaqMuxV2[1]:DataBufferSize 524288
[ 2022-12-04 22:09:59 ]  caput smurf_server_s2:AMCc:FpgaTopLevel:AppTop:AppCore:DebugSelect[1] 3
[ 2022-12-04 22:10:01 ]  caput smurf_server_s2:AMCc:FpgaTopLevel:AppTop:DaqMuxV2[1]:DataBufferSize 524288
[ 2022-12-04 22:10:01 ]  caput smurf_server_s2:AMCc:FpgaTopLevel:AppTop:AppCore:DebugSelect[1] 3
Saving plot to /data/smurf_data/20221204/1670191247/plots/1670191353_full_band_resp_atten.png
Test started: 1670191353
Test started: 1670191808
RF Test Complete

This test typically takes ~8 min and produces full band response plots versus attenuation for both installed AMCs. Here are typical plots:

Screenshot 2022-12-04 at 3 21 27 PM Screenshot 2022-12-04 at 3 21 46 PM
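The headings in the output above ("Up Converters, Attenuation 0, Band 0" through "Down Converters, Attenuation 31, Band 7") suggest a nested sweep over converter type, attenuation setting, and band. The sketch below only enumerates such a sweep; it is not taken from full_band_response_AMCatten.py. The function name `sweep_headings`, the loop nesting, and the attenuation values passed in are all assumptions, since the truncated output shows only attenuation 0 and 31.

```python
from itertools import product


def sweep_headings(attenuations, bands=range(8), converters=("Up", "Down")):
    """List sweep steps, labeled in the style of the log headings.

    The nesting (converter outermost, band innermost) is an assumption
    that is consistent with the first and last headings in the log.
    """
    return [
        f"{conv} Converters, Attenuation {att}, Band {band}"
        for conv, att, band in product(converters, attenuations, bands)
    ]


if __name__ == "__main__":
    # (0, 31) are the only attenuation settings visible in the example log.
    headings = sweep_headings(attenuations=(0, 31))
    print(headings[0])   # Up Converters, Attenuation 0, Band 0
    print(headings[-1])  # Down Converters, Attenuation 31, Band 7
```

Each step in the real script takes a full band response, so the total run time scales with the number of attenuation settings swept.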

THERMALtest

E.g. THERMALtest2, as aliased above, running on slot 2 boots up the system and runs scratch/shawn/thermal_test.py. (Slot 2 is just an example here: 7-slot crates usually have lower cooling efficiency on slot 2, so thermal performance there will be measurably worse.) Be aware that the directory where some of the thermal test script outputs are saved is hard coded in thermal_test.py, and the test will fail if that directory doesn't exist on disk.
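One way to avoid the failure on a missing output directory is to create the directory before starting the test. This is a minimal sketch, not something thermal_test.py does itself; the helper name `ensure_output_dir` is hypothetical, and the directory to pass in is whatever path is hard coded in your copy of thermal_test.py.

```python
import os


def ensure_output_dir(path):
    """Create the directory (and any parents) if needed; return its path."""
    os.makedirs(path, exist_ok=True)  # no error if it already exists
    if not os.access(path, os.W_OK):
        raise PermissionError(f"output directory is not writable: {path}")
    return path
```

Calling `ensure_output_dir(...)` with the hard-coded path once per server, before the first THERMALtest run, should be enough, since `exist_ok=True` makes repeated calls harmless.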

The thermal test script is run in an ipython session running outside of the dockers. Here's typical output from that:

cd /home/cryo/docker/pysmurf/dev/v7.1.0_prodtest
ipython3 -i pysmurf/scratch/shawn/thermal_test.py 2                                                                                           
(base) cryo@smurf-srv11:~/docker/utils$ cd /home/cryo/docker/pysmurf/dev/v7.1.0_prodtest
(base) cryo@smurf-srv11:~/docker/pysmurf/dev/v7.1.0_prodtest$ ipython3 -i pysmurf/scratch/shawn/thermal_test.py 2
Python 3.9.7 (default, Sep 16 2021, 13:09:58)                                   
Type 'copyright', 'credits' or 'license' for more information                   
IPython 8.7.0 -- An enhanced Interactive Python. Type '?' for help.                                                                       
COMTEL crate!                                                                                                                             
Running thermal test on slots [2] ...                                                                                                     
-> Setting fan speeds to full (=100)                                                                                                      
clia minfanlevel 20 4 100; sleep 1; clia setfanlevel 20 4 100; sleep 1;clia minfanlevel 20 3 100; sleep 1; clia setfanlevel 20 3 100; sleep 1;

Pigeon Point Shelf Manager Command Line Interpreter                                                                                       

Minimal Fan Level for (20, 4) is set to 100                                                                                               

Pigeon Point Shelf Manager Command Line Interpreter                                                                                       

20: FRU # 4 Set Fan Level to: 100                                                                                                         

Pigeon Point Shelf Manager Command Line Interpreter                                                                                       

Minimal Fan Level for (20, 3) is set to 100                                                                                               

Pigeon Point Shelf Manager Command Line Interpreter                                                                                       

20: FRU # 3 Set Fan Level to: 100                                                                                        
-> Logging to /data/smurf_data/simonsobs_6carrier_long_thermal_test_Aug2020/1670197722_hwlog.dat.                                          
2 S.start_hardware_logging("/data/smurf_data/simonsobs_6carrier_long_thermal_test_Aug2020/1670197722_hwlog.dat") smurf           
2 if len(sys.argv)==1: sys.argv=[sys.argv[0],None] smurf                                                                                           
Waiting for /data/smurf_data/simonsobs_6carrier_long_thermal_test_Aug2020/1670197722_hwlog.dat to start being populated ...
Waiting for /data/smurf_data/simonsobs_6carrier_long_thermal_test_Aug2020/1670197722_hwlog.dat to start being populated ...
2                                                                               
-> Waiting 1 min before setup.         
Warning: empty x range [1.6702e+09:1.6702e+09], adjusting to [1.6535e+09:1.6869e+09]
2 S.shelf_manager="shm-smrf-sp01"; S.setup() smurf                                                                                        
-> Waiting for setup(s) to complete.                                                                                                      
-> Disabling streaming                                                                                                                    
-> Disable streaming                                                                                                                      
2 S.set_stream_enable(0) smurf                                                                                                            
-> Waiting 1 min after setup.                                                                                                             
-> Checking full band response to confirm RF is properly configured on slot 2.                                                                  
2 exec(open("/usr/local/src/pysmurf/scratch/shawn/full_band_response.py").read()) smurf                                                   
-> Done with full band response on slot 2.                                                                                                
2 sys.argv[1]=0; exec(open("/usr/local/src/pysmurf/scratch/shawn/fill_band.py").read()) smurf                                             
-> Waiting 1 min after band 0 fill.                                                                                                       
2 sys.argv[1]=1; exec(open("/usr/local/src/pysmurf/scratch/shawn/fill_band.py").read()) smurf
-> Waiting 1 min after band 1 fill.                                                          
2 sys.argv[1]=2; exec(open("/usr/local/src/pysmurf/scratch/shawn/fill_band.py").read()) smurf
-> Waiting 1 min after band 2 fill.                                                          
2 sys.argv[1]=3; exec(open("/usr/local/src/pysmurf/scratch/shawn/fill_band.py").read()) smurf
-> Waiting 1 min after band 3 fill.
2 sys.argv[1]=4; exec(open("/usr/local/src/pysmurf/scratch/shawn/fill_band.py").read()) smurf
-> Waiting 1 min after band 4 fill.
2 sys.argv[1]=5; exec(open("/usr/local/src/pysmurf/scratch/shawn/fill_band.py").read()) smurf
-> Waiting 1 min after band 5 fill.
2 sys.argv[1]=6; exec(open("/usr/local/src/pysmurf/scratch/shawn/fill_band.py").read()) smurf
-> Waiting 1 min after band 6 fill.
2 sys.argv[1]=7; exec(open("/usr/local/src/pysmurf/scratch/shawn/fill_band.py").read()) smurf
-> Waiting 1 min after band 7 fill.
-> Waiting 1 min after band fills.
-> Running eta scan on slot 2, band 0...
2 S.run_serial_eta_scan(0) smurf
-> Eta scan for slot 2, band 0 completed.
-> Waiting 1 min btw eta scans on different slots.
-> All band 0 eta scans completed.
-> Waiting 0 min after band 0 eta scans.
-> Running eta scan on slot 2, band 1...
2 S.run_serial_eta_scan(1) smurf
-> Eta scan for slot 2, band 1 completed.
-> Waiting 1 min btw eta scans on different slots.
-> All band 1 eta scans completed.
-> Waiting 0 min after band 1 eta scans.
-> Running eta scan on slot 2, band 2...
2 S.run_serial_eta_scan(2) smurf
-> Eta scan for slot 2, band 2 completed.
-> Waiting 1 min btw eta scans on different slots.
-> All band 2 eta scans completed.
-> Waiting 0 min after band 2 eta scans.
-> Running eta scan on slot 2, band 3...
2 S.run_serial_eta_scan(3) smurf
-> Eta scan for slot 2, band 3 completed.
-> Waiting 1 min btw eta scans on different slots.
-> All band 3 eta scans completed.
-> Waiting 0 min after band 3 eta scans.
-> Running eta scan on slot 2, band 4...
2 S.run_serial_eta_scan(4) smurf
-> Eta scan for slot 2, band 4 completed.
-> Waiting 1 min btw eta scans on different slots.
-> All band 4 eta scans completed.                                                                                                        
-> Waiting 0 min after band 4 eta scans.                                                                                                  
-> Running eta scan on slot 2, band 5...                                                                                                  
2 S.run_serial_eta_scan(5) smurf                                                                                                          
-> Eta scan for slot 2, band 5 completed.                                                                                                 
-> Waiting 1 min btw eta scans on different slots.                                                                                              
-> All band 5 eta scans completed.                                                                                                        
-> Waiting 0 min after band 5 eta scans.                                                                                                  
-> Running eta scan on slot 2, band 6...                                                                                                  
2 S.run_serial_eta_scan(6) smurf                                                                                                          
-> Eta scan for slot 2, band 6 completed.                                                    
-> Waiting 1 min btw eta scans on different slots.                                           
-> All band 6 eta scans completed.                                                           
-> Waiting 0 min after band 6 eta scans.                                                     
-> Running eta scan on slot 2, band 7...                                                     
2 S.run_serial_eta_scan(7) smurf   
-> Eta scan for slot 2, band 7 completed.                                                    
-> Waiting 1 min btw eta scans on different slots.
-> All band 7 eta scans completed.                                                           
-> Waiting 0 min after band 7 eta scans.
-> Waiting 0 min after eta scans.                                                            
-> Running tracking setup on slot 2, band 0...
2 S.tracking_setup(0,reset_rate_khz=10) smurf                                                
-> Waiting 0 min after band 0 tracking setup.
-> Running tracking setup on slot 2, band 1...
2 S.tracking_setup(1,reset_rate_khz=10) smurf
-> Waiting 0 min after band 1 tracking setup.
-> Running tracking setup on slot 2, band 2...
2 S.tracking_setup(2,reset_rate_khz=10) smurf     
-> Waiting 0 min after band 2 tracking setup.
-> Running tracking setup on slot 2, band 3...
2 S.tracking_setup(3,reset_rate_khz=10) smurf
-> Waiting 0 min after band 3 tracking setup.
-> Running tracking setup on slot 2, band 4...
2 S.tracking_setup(4,reset_rate_khz=10) smurf     
-> Waiting 0 min after band 4 tracking setup.
-> Running tracking setup on slot 2, band 5...
2 S.tracking_setup(5,reset_rate_khz=10) smurf
-> Waiting 0 min after band 5 tracking setup.
-> Running tracking setup on slot 2, band 6...
2 S.tracking_setup(6,reset_rate_khz=10) smurf     
-> Waiting 0 min after band 6 tracking setup.
-> Running tracking setup on slot 2, band 7...
2 S.tracking_setup(7,reset_rate_khz=10) smurf
-> Waiting 0 min after band 7 tracking setup.
-> Tracking setup run on all bands on slot 2.
-> Waiting 1 min after tracking setups.
-> Enabling streaming on slot 2.
-> Enable streaming
2 S.set_stream_enable(1) smurf
-> Waiting 1 min after turning on streaming for all slots.
-> Dwelling for 2 min with everything on at full fan level ...
-> Restricting fan speeds to 50 (out of 100).
-> Dwelling for 15 min at restricted fan level ...
clia setfanpolicy 20 4 DISABLE; sleep 1;clia setfanpolicy 20 3 DISABLE; sleep 1;

Pigeon Point Shelf Manager Command Line Interpreter

    Fan policy updated successfully

Pigeon Point Shelf Manager Command Line Interpreter

    Fan policy updated successfully
clia minfanlevel 20 4 50; sleep 1; clia setfanlevel 20 4 50; sleep 1;clia minfanlevel 20 3 50; sleep 1; clia setfanlevel 20 3 50; sleep 1;
... truncating repeated commands to keep fan levels restricted ...
-> Writing ATCA state to /data/smurf_data/simonsobs_6carrier_long_thermal_test_Aug2020/1670197722_atca.yml.
2 S.write_atca_monitor_state("/data/smurf_data/simonsobs_6carrier_long_thermal_test_Aug2020/1670197722_atca.yml") smurf
2 S.write_state("/data/smurf_data/simonsobs_6carrier_long_thermal_test_Aug2020/1670197722_s2.yml") smurf
-> Writing output of amcc_dump to /data/smurf_data/simonsobs_6carrier_long_thermal_test_Aug2020/1670197722_amcc_dump.txt.
2 os.system("amcc_dump --all shm-smrf-sp01 > /data/smurf_data/simonsobs_6carrier_long_thermal_test_Aug2020/1670197722_amcc_dump.txt") smurf
-> Writing output of amcc_dump_bsi to /data/smurf_data/simonsobs_6carrier_long_thermal_test_Aug2020/1670197722_amcc_dump_bsi.txt.
2 os.system("amcc_dump_bsi --all shm-smrf-sp01 > /data/smurf_data/simonsobs_6carrier_long_thermal_test_Aug2020/1670197722_amcc_dump_bsi.txt") smurf
-> Done restricting fan speeds, re-enabling the fan policy...
clia setfanpolicy 20 4 ENABLE; sleep 1;clia setfanpolicy 20 3 ENABLE; sleep 1;

Pigeon Point Shelf Manager Command Line Interpreter

    Fan policy updated successfully

Pigeon Point Shelf Manager Command Line Interpreter

    Fan policy updated successfully
Still logging ...
gnuplot -p -c pysmurf/scratch/shawn/plot_temperatures.gnuplot /data/smurf_data/simonsobs_6carrier_long_thermal_test_Aug2020/1670197722_hwlog.dat

The script first shows a full band response (typical example below), then shows a live-updating gnuplot plot of key SMuRF system temperatures and currents as the thermal test script runs to completion. An example of that gnuplot plot after the script has finished is also below.

Screenshot 2022-12-04 at 3 57 21 PM Screenshot 2022-12-04 at 4 55 28 PM
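The fan-control command lines in the output above follow a regular pattern: for each fan tray, set the minimum fan level, then the fan level, with a one-second sleep after each `clia` call. The sketch below rebuilds those strings for illustration. It is not copied from thermal_test.py; the function names are hypothetical, the defaults (20, and fan trays 4 and 3) are simply the values seen in the log, and the 0 to 100 range check is an assumption based on the "50 (out of 100)" message.

```python
def fan_level_command(level, fru_ids=(4, 3), address=20):
    """Build the shell command string that pins the fans to a level."""
    if not 0 <= level <= 100:  # assumed range, see lead-in
        raise ValueError(f"fan level out of range: {level}")
    return "".join(
        f"clia minfanlevel {address} {fru} {level}; sleep 1; "
        f"clia setfanlevel {address} {fru} {level}; sleep 1;"
        for fru in fru_ids
    )


def fan_policy_command(enable, fru_ids=(4, 3), address=20):
    """Build the shell command string that toggles the fan policy."""
    state = "ENABLE" if enable else "DISABLE"
    return "".join(
        f"clia setfanpolicy {address} {fru} {state}; sleep 1;"
        for fru in fru_ids
    )


if __name__ == "__main__":
    print(fan_level_command(100))     # same string as the full-speed step
    print(fan_policy_command(False))  # same string as the DISABLE step
```

The strings are only built here, not executed; in the logged run they are issued to the shelf manager, and the fan policy is re-enabled at the end of the restricted-fan dwell.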

A range of log data, including the temperature/current data, is also saved to disk during and at the end of the thermal test script, including (with examples):