mwilensky768 / SSINS

BSD 2-Clause "Simplified" License
6 stars 7 forks source link

Optimize differencing in time #104

Closed no-lex closed 3 years ago

no-lex commented 3 years ago

This pull request approximately doubles the speed of the diff() function in sky_subtract.

Description

The original diff() function calls pyuvdata's safe data grabbing functions to get data to diff. However, the dataset is already put in baseline form, which ensures that the data's ordering is known, and therefore the overhead of these functions is not necessary. As a result, the new diff() is somewhat more than twice as fast as the old one (according to my machine [E5 2440 v2 @1.9 Ghz] and cProfile).

The new implementation of diff starts by allocating two indexing arrays which copy the baseline array. By differencing the baseline array between adjacent members, and knowing that the array is in baseline form, it can be seen that the only nonzero members of the array are at boundaries between different baselines. By doing this for two arrays, representing the original dataset (before removing baselines for differencing) and afterwards (with one baseline entry removed per baseline) we get a map of the dataset's structure before and after the differencing has happened. Using these two arrays, then, we can traverse the indices needed to select data to difference much faster than manually querying using pyuvdata's builtin functions.

pyuvdata 2.2 patch

Additionally, to accomodate the new pyuvdata 2.2 release, which has new data parameters, this PR includes the code required to diff them properly (by averaging). Since the parameters are optional, however, this PR does not break compatibility with previously supported versions of the library.

Main Code Body Checklist:

I do not believe most of these requirements are applicable, this is a drop in change (no functionality changes, besides compatibility)

If the pull request is in a mergeable state, I can update the CHANGELOG and the library requirements to suit at that time, to avoid dealing with a potential merge issue.

codecov-commenter commented 3 years ago

Codecov Report

Merging #104 (e1c9009) into master (6840fe2) will not change coverage. The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff            @@
##            master      #104   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            8         8           
  Lines          824       858   +34     
=========================================
+ Hits           824       858   +34     
Impacted Files Coverage Δ
SSINS/sky_subtract.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 6840fe2...e1c9009. Read the comment docs.

no-lex commented 3 years ago

Performance tests: Intel Xeon E5 2440 v2 @ 1.9Ghz w/ 6x16GB PC3-10600R Ubuntu 18.04.5 Python 3.7

12mb file: Old:

New:

16gb file: Old:

New:

31gb file: Old:

New:

mwilensky768 commented 3 years ago

Thanks @no-lex

I'll merge this now.