no-lex commented 3 years ago

This pull request approximately doubles the speed of the diff() function in sky_subtract.

Description

The original diff() function calls pyuvdata's safe data grabbing functions to get data to diff. However, the dataset is already put in baseline form, which ensures that the data's ordering is known, and therefore the overhead of these functions is not necessary. As a result, the new diff() is somewhat more than twice as fast as the old one (according to my machine [E5 2440 v2 @1.9 Ghz] and cProfile).

The new implementation of diff starts by allocating two indexing arrays which copy the baseline array. By differencing the baseline array between adjacent members, and knowing that the array is in baseline form, it can be seen that the only nonzero members of the array are at boundaries between different baselines. By doing this for two arrays, representing the original dataset (before removing baselines for differencing) and afterwards (with one baseline entry removed per baseline) we get a map of the dataset's structure before and after the differencing has happened. Using these two arrays, then, we can traverse the indices needed to select data to difference much faster than manually querying using pyuvdata's builtin functions.

pyuvdata 2.2 patch

Additionally, to accomodate the new pyuvdata 2.2 release, which has new data parameters, this PR includes the code required to diff them properly (by averaging). Since the parameters are optional, however, this PR does not break compatibility with previously supported versions of the library.

Main Code Body Checklist:

I do not believe most of these requirements are applicable, this is a drop in change (no functionality changes, besides compatibility)

[ ] ~~Add or update docstring related to code change~~
[ ] ~~Add a unit test that verifies the efficacy of the code~~
[ ] ~~Write a tutorial if the feature would be useful to others to use~~
[ ] Update CHANGELOG.md

If the pull request is in a mergeable state, I can update the CHANGELOG and the library requirements to suit at that time, to avoid dealing with a potential merge issue.

codecov-commenter commented 3 years ago

Codecov Report

Merging #104 (e1c9009) into master (6840fe2) will not change coverage. The diff coverage is 100.00%.

@@            Coverage Diff            @@
##            master      #104   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            8         8           
  Lines          824       858   +34     
=========================================
+ Hits           824       858   +34

Impacted Files	Coverage Δ
SSINS/sky_subtract.py	`100.00% <100.00%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 6840fe2...e1c9009. Read the comment docs.

no-lex commented 3 years ago

Performance tests: Intel Xeon E5 2440 v2 @ 1.9Ghz w/ 6x16GB PC3-10600R Ubuntu 18.04.5 Python 3.7

12mb file: Old:

92107 function calls (89359 primitive calls) in 0.223 seconds

New:

34606 function calls (31750 primitive calls) in 0.103 seconds (2.16x faster)

16gb file: Old:

6722358 function calls (6559920 primitive calls) in 93.478 seconds

New:

1004773 function calls (969375 primitive calls) in 46.040 seconds (2.02x faster)

31gb file: Old:

6625565 function calls (6464503 primitive calls) in 197.469 seconds

New:

989541 function calls (954655 primitive calls) in 109.798 seconds (1.79x faster)

mwilensky768 commented 3 years ago

Thanks @no-lex

I'll merge this now.

mwilensky768 / SSINS

Optimize differencing in time #104

Description

pyuvdata 2.2 patch

Main Code Body Checklist:

Codecov Report