Closed no-lex closed 3 years ago
Merging #104 (e1c9009) into master (6840fe2) will not change coverage. The diff coverage is
100.00%
.
@@ Coverage Diff @@
## master #104 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 8 8
Lines 824 858 +34
=========================================
+ Hits 824 858 +34
Impacted Files | Coverage Δ | |
---|---|---|
SSINS/sky_subtract.py | 100.00% <100.00%> (ø) |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 6840fe2...e1c9009. Read the comment docs.
Performance tests: Intel Xeon E5 2440 v2 @ 1.9Ghz w/ 6x16GB PC3-10600R Ubuntu 18.04.5 Python 3.7
12mb file: Old:
New:
16gb file: Old:
New:
31gb file: Old:
New:
Thanks @no-lex
I'll merge this now.
This pull request approximately doubles the speed of the diff() function in sky_subtract.
Description
The original diff() function calls pyuvdata's safe data grabbing functions to get data to diff. However, the dataset is already put in baseline form, which ensures that the data's ordering is known, and therefore the overhead of these functions is not necessary. As a result, the new diff() is somewhat more than twice as fast as the old one (according to my machine [E5 2440 v2 @1.9 Ghz] and cProfile).
The new implementation of diff starts by allocating two indexing arrays which copy the baseline array. By differencing the baseline array between adjacent members, and knowing that the array is in baseline form, it can be seen that the only nonzero members of the array are at boundaries between different baselines. By doing this for two arrays, representing the original dataset (before removing baselines for differencing) and afterwards (with one baseline entry removed per baseline) we get a map of the dataset's structure before and after the differencing has happened. Using these two arrays, then, we can traverse the indices needed to select data to difference much faster than manually querying using pyuvdata's builtin functions.
pyuvdata 2.2 patch
Additionally, to accomodate the new pyuvdata 2.2 release, which has new data parameters, this PR includes the code required to diff them properly (by averaging). Since the parameters are optional, however, this PR does not break compatibility with previously supported versions of the library.
Main Code Body Checklist:
I do not believe most of these requirements are applicable, this is a drop in change (no functionality changes, besides compatibility)
Add or update docstring related to code changeAdd a unit test that verifies the efficacy of the codeWrite a tutorial if the feature would be useful to others to useIf the pull request is in a mergeable state, I can update the CHANGELOG and the library requirements to suit at that time, to avoid dealing with a potential merge issue.