labscript-suite / labscript

The 𝗹𝗮𝗯𝘀𝗰𝗿𝗶𝗽𝘁 library provides a translation from expressive Python code to low-level hardware instructions.
http://labscriptsuite.org
BSD 2-Clause "Simplified" License
9 stars 52 forks source link

Performance improvements #73

Closed philipstarkey closed 3 years ago

philipstarkey commented 4 years ago

These small changes seem to improve compilation time significantly. Tests (done with saving of hg info disabled) indicate compilation time for a complex experiment is decreased somewhere between 25-33%.

philipstarkey commented 4 years ago

As an example of the performance improvements, I compiled an old experiment from Monash. Compilation time improved from about 0.37s to 0.27s (27% decrease) as measured by time.time() statements (excluding imports).

Obviously disabling hg info is a pain, so I recommend testing on top of #72. You should see similar performance increases (ignoring the first compilation that produces the VCS cache, see details in #72).

zakv commented 3 years ago

I did some timing tests with this PR on my machine and some of the results are below (Click the arrows to expand). Interestingly it looks like the time-critical sections of the code are different for us than for the Monash sequence, so this PR doesn't speed up the total compilation time much. That said, make_timeseries() did speed up by ~15%, expand_timeseries() sped up by ~40%, and generate_clock() sped up by ~25%. Since it seems that these speedups can make a noticeable difference for other users, I think it would be worth merging this PR. I haven't had a chance to look at the edits in detail yet though so I'll do that before merging.

For reference, I did all of these tests using PR #72 as suggested above. The results were measured with time.perf_counter() and printed to console. The functions collect_change_times(), generate_clock(), and generate_code() were each called only once. The functions make_timeseries() and expand_timeseries() are each called in their own for loops, and the total duration of each of those for loops was printed. Note that some of these functions are called by other functions, so the durations of some are included in the durations of others. The total compilation time was printed from runmanager.batch_compiler.BatchProcessor.compile(), but the rest of the print() statements were in labscript.py.

Without optimizations: ``` Setting up hdf5 file groups in generate_code() took 5.268699999998461 ms collect_change_times() took: 1.099700000001036 ms make_timeseries() calls took: 0.7587000000004451 ms expand_timeseries() calls took: 5.616899999999703 ms generate_clock() took: 10.280800000000312 ms Generating code for device pulseblaster_0 took 50.94629999999967 ms Generating code for device mercury_795_lock_synth took 0.3864999999994012 ms Generating code for device pump_795_vertical_actuator_controller took 0.3220000000005996 ms Generating code for device x_westward_vertical_actuator_controller took 0.3459999999986252 ms Generating code for device x_westward_forward_waveplate_interface_board took 0.28829999999935296 ms Generating code for device x_westward_retro_waveplate_interface_board took 0.2583000000004887 ms Generating code for device y_northward_horizontal_actuator_controller took 0.339799999999002 ms Generating code for device y_northward_waveplate_interface_board took 0.2866000000008029 ms Generating code for device z_downward_horizontal_actuator_controller took 0.6058000000006558 ms Generating code for device z_downward_vertical_actuator_controller took 0.40500000000065484 ms Calling device.generate_code() for each device in generate_code() took 55.10220000000032 ms save_time_markers() took 0.6052000000007496 ms generate_connection_table() took 4.368200000000044 ms write_device_properties() took 7.003799999999671 ms generate_wait_table() took 0.9721000000002533 ms save_labscripts() took 15.336000000001349 ms saving shot properties in generate_code() took 0.3874999999986528 ms generate_code() took: 99.67520000000007 ms Total compilation time: 189.21519999999958 ms Ready. Setting up hdf5 file groups in generate_code() took 5.242000000000857 ms collect_change_times() took: 1.1190000000009803 ms make_timeseries() calls took: 0.8987000000004741 ms expand_timeseries() calls took: 5.88989999999967 ms generate_clock() took: 10.665400000000602 ms Generating code for device pulseblaster_0 took 44.40109999999997 ms Generating code for device mercury_795_lock_synth took 0.3633999999994586 ms Generating code for device pump_795_vertical_actuator_controller took 0.3991999999986007 ms Generating code for device x_westward_vertical_actuator_controller took 0.29769999999906815 ms Generating code for device x_westward_forward_waveplate_interface_board took 0.29479999999892925 ms Generating code for device x_westward_retro_waveplate_interface_board took 0.2706000000003428 ms Generating code for device y_northward_horizontal_actuator_controller took 0.2805999999999642 ms Generating code for device y_northward_waveplate_interface_board took 0.24380000000157054 ms Generating code for device z_downward_horizontal_actuator_controller took 0.25340000000007024 ms Generating code for device z_downward_vertical_actuator_controller took 0.26450000000011187 ms Calling device.generate_code() for each device in generate_code() took 47.809400000000224 ms save_time_markers() took 0.41129999999967026 ms generate_connection_table() took 3.800999999999277 ms write_device_properties() took 7.612699999999251 ms generate_wait_table() took 0.8271000000004136 ms save_labscripts() took 14.052399999998855 ms saving shot properties in generate_code() took 0.40619999999869094 ms generate_code() took: 89.76990000000029 ms Total compilation time: 173.8724999999999 ms Ready. Setting up hdf5 file groups in generate_code() took 5.28169999999939 ms collect_change_times() took: 1.0999999999992127 ms make_timeseries() calls took: 0.8739000000002051 ms expand_timeseries() calls took: 5.6861999999995305 ms generate_clock() took: 10.38100000000064 ms Generating code for device pulseblaster_0 took 45.92629999999964 ms Generating code for device mercury_795_lock_synth took 0.3756999999993127 ms Generating code for device pump_795_vertical_actuator_controller took 0.36120000000039454 ms Generating code for device x_westward_vertical_actuator_controller took 0.2883999999987452 ms Generating code for device x_westward_forward_waveplate_interface_board took 0.301600000000235 ms Generating code for device x_westward_retro_waveplate_interface_board took 0.26130000000001985 ms Generating code for device y_northward_horizontal_actuator_controller took 0.26250000000160867 ms Generating code for device y_northward_waveplate_interface_board took 0.259700000000862 ms Generating code for device z_downward_horizontal_actuator_controller took 0.2562000000008169 ms Generating code for device z_downward_vertical_actuator_controller took 0.25450000000049045 ms Calling device.generate_code() for each device in generate_code() took 49.27909999999969 ms save_time_markers() took 0.4015000000006097 ms generate_connection_table() took 3.726700000001415 ms write_device_properties() took 7.46100000000105 ms generate_wait_table() took 0.7841999999982363 ms save_labscripts() took 14.326799999999196 ms saving shot properties in generate_code() took 0.35409999999913566 ms generate_code() took: 92.35199999999999 ms Total compilation time: 175.73550000000006 ms Ready. ```
With optimizations: ``` Setting up hdf5 file groups in generate_code() took 5.075299999999672 ms collect_change_times() took: 1.333900000000554 ms make_timeseries() calls took: 0.9043999999995833 ms expand_timeseries() calls took: 3.4863999999998896 ms generate_clock() took: 9.006499999999917 ms Generating code for device pulseblaster_0 took 47.57550000000066 ms Generating code for device mercury_795_lock_synth took 0.5210000000008819 ms Generating code for device pump_795_vertical_actuator_controller took 0.28649999999963427 ms Generating code for device x_westward_vertical_actuator_controller took 0.2822999999985143 ms Generating code for device x_westward_forward_waveplate_interface_board took 0.27370000000104255 ms Generating code for device x_westward_retro_waveplate_interface_board took 0.23299999999970566 ms Generating code for device y_northward_horizontal_actuator_controller took 0.23909999999993659 ms Generating code for device y_northward_waveplate_interface_board took 0.23239999999979943 ms Generating code for device z_downward_horizontal_actuator_controller took 0.22799999999989495 ms Generating code for device z_downward_vertical_actuator_controller took 0.22650000000012938 ms Calling device.generate_code() for each device in generate_code() took 50.84360000000032 ms save_time_markers() took 0.6136999999988291 ms generate_connection_table() took 3.759300000000465 ms write_device_properties() took 7.314600000000837 ms generate_wait_table() took 0.7789000000002488 ms save_labscripts() took 15.03869999999985 ms saving shot properties in generate_code() took 0.3647000000004397 ms generate_code() took: 94.42479999999875 ms Total compilation time: 181.55040000000076 ms Ready. Setting up hdf5 file groups in generate_code() took 5.3496000000006205 ms collect_change_times() took: 1.1734000000007683 ms make_timeseries() calls took: 0.6454999999991884 ms expand_timeseries() calls took: 3.148300000001214 ms generate_clock() took: 7.616000000000511 ms Generating code for device pulseblaster_0 took 43.9444999999985 ms Generating code for device mercury_795_lock_synth took 0.35519999999955587 ms Generating code for device pump_795_vertical_actuator_controller took 0.37410000000015486 ms Generating code for device x_westward_vertical_actuator_controller took 0.31270000000027665 ms Generating code for device x_westward_forward_waveplate_interface_board took 0.2710999999990804 ms Generating code for device x_westward_retro_waveplate_interface_board took 0.36039999999992745 ms Generating code for device y_northward_horizontal_actuator_controller took 0.43810000000021887 ms Generating code for device y_northward_waveplate_interface_board took 0.3990999999992084 ms Generating code for device z_downward_horizontal_actuator_controller took 0.41840000000092914 ms Generating code for device z_downward_vertical_actuator_controller took 0.27749999999926445 ms Calling device.generate_code() for each device in generate_code() took 47.964600000000246 ms save_time_markers() took 0.5041000000005624 ms generate_connection_table() took 5.34150000000011 ms write_device_properties() took 10.021599999999964 ms generate_wait_table() took 0.7750999999984742 ms save_labscripts() took 14.397400000000005 ms saving shot properties in generate_code() took 0.36730000000062546 ms generate_code() took: 95.1822 ms Total compilation time: 182.84210000000024 ms Ready. Setting up hdf5 file groups in generate_code() took 4.944699999999358 ms collect_change_times() took: 1.2756999999989915 ms make_timeseries() calls took: 0.6605999999997891 ms expand_timeseries() calls took: 3.428899999999402 ms generate_clock() took: 8.030400000000881 ms Generating code for device pulseblaster_0 took 41.73760000000115 ms Generating code for device mercury_795_lock_synth took 0.4084000000013077 ms Generating code for device pump_795_vertical_actuator_controller took 0.31740000000013424 ms Generating code for device x_westward_vertical_actuator_controller took 0.30540000000023326 ms Generating code for device x_westward_forward_waveplate_interface_board took 0.2919000000005667 ms Generating code for device x_westward_retro_waveplate_interface_board took 0.27000000000043656 ms Generating code for device y_northward_horizontal_actuator_controller took 0.2773999999998722 ms Generating code for device y_northward_waveplate_interface_board took 0.2602999999989919 ms Generating code for device z_downward_horizontal_actuator_controller took 0.2718999999995475 ms Generating code for device z_downward_vertical_actuator_controller took 0.28539999999921406 ms Calling device.generate_code() for each device in generate_code() took 45.268000000000086 ms save_time_markers() took 0.5264999999994302 ms generate_connection_table() took 3.7655999999994805 ms write_device_properties() took 7.201300000000188 ms generate_wait_table() took 1.4545999999988624 ms save_labscripts() took 17.393999999999465 ms saving shot properties in generate_code() took 0.5634000000007688 ms generate_code() took: 91.53949999999966 ms Total compilation time: 172.10829999999964 ms Ready. ```

Note: For my own reference, these are the timings for compiling a 575 ms 11-stage Raman cooling sequence with pre-release ramp/hold.

zakv commented 3 years ago

Just finished going through all the changes in detail and everything looks good! After reading through the code I'm convinced that all of the created arrays have the same data, datatype, and shape as before. As a sanity check, I also spent some time looking at a sequence compiled with/without the changes in this PR. The waveforms were identical everywhere I looked, even when zooming in quite far. In short, it looks to me like this PR increase performance without any changes in behavior, as desired. I think it's ready to be merged.

dihm commented 3 years ago

I've tested this against a very simple compilation on my test rig and have also found that it doesn't really speed up the overall compilation. It doesn't break anything either though, so it is likely fine. If desired, I could test on a more complicated experiment in the lab next week, but I don't think that will be necessary. I'd say go ahead and merge.

zakv commented 3 years ago

Agreed, thanks for taking a look at this

philipstarkey commented 3 years ago

Makes sense that shorter experiments would not see the performance improvements given they mostly related to array length calculation and manipulation.

It's also difficult to compare absolute compilation times since we all have different machines. My tests were done on an i7-8700K CPU (my dev machine at home). That's generally more powerful than laptops or whatever our university IT department would provide. Writing to h5 file may also dominate if the PC doesn't have an SSD. Not sure what you're testing on but it's possible the Monash sequence would take ~1s to compile on your machines vs ~.3s on mine.

Or maybe it's something else.

Does suggest maybe we should create a standardised really complex example script we could all use for benchmarking!

Thanks for testing!