Closed: bennahugo closed this issue 5 years ago.
On a single small 32k scan (32k1934.ms.2min.ms) the difference is not large (this is on stevie, whose disk may be under contention).
xarray-ms master:
tricolour - 2019-06-24 11:13:47,654 INFO - Data flagged successfully in 00h07m37s
xarray-ms 0.1.6:
tricolour - 2019-06-24 11:25:57,847 INFO - Data flagged successfully in 00h05m51s
I'll see what happens on a larger dataset
On my laptop, with a 3-scan 4K dataset (1527016443_sdp_l0.full_1284.hh_vv.ms), the difference is negligible:
xarray-ms master:
tricolour - 2019-06-24 10:51:58,933 INFO - Data flagged successfully in 00h10m33s
xarray-ms 0.1.6:
tricolour - 2019-06-24 11:51:31,495 INFO - Data flagged successfully in 00h10m31s
Working hypothesis: 52 GB scans are being created, which are too big to fit on a single DIMM, leading to unnecessary inter-CPU communication.
Solution: Allocate per-baseline arrays.
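A minimal sketch of the per-baseline idea, assuming a (time, chan, corr) window per baseline; the helper name and shapes below are illustrative, not tricolour's actual API:

import numpy as np

# Instead of one monolithic (bl, time, chan, corr) array that may spill
# across NUMA nodes, keep a dict of smaller per-baseline arrays, each of
# which fits comfortably in one node's local memory.
def allocate_per_baseline(baselines, ntime, nchan, ncorr, dtype=np.complex64):
    return {bl: np.empty((ntime, nchan, ncorr), dtype=dtype)
            for bl in baselines}

# Small illustrative sizes; the real per-scan windows are much larger.
windows = allocate_per_baseline(range(6), ntime=10, nchan=64, ncorr=4)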
Spot the non-linearity:
# 7.45 GB
In [11]: %timeit -n 1 -r 1 np.ones(int(1e9), dtype=np.complex64)
1 loop, best of 1: 2.1 s per loop
# 14.9 GB
In [12]: %timeit -n 1 -r 1 np.ones(int(2e9), dtype=np.complex64)
1 loop, best of 1: 4.22 s per loop
# 29.8 GB
In [13]: %timeit -n 1 -r 1 np.ones(int(4e9), dtype=np.complex64)
1 loop, best of 1: 2min 42s per loop
Things seem to get much slower once the allocation exceeds the 16 GB power-of-two boundary.
# 16.3GB
In [23]: %timeit -n 1 -r 1 np.ones(int(2.2e9), dtype=np.complex64)
1 loop, best of 1: 49.5 s per loop
Can't reproduce. See my email. Are you sure it is not just the page caches playing havoc with you?
Can't reproduce.
Where can't you reproduce this? The above timings are on stevie where the problem was reported.
stevie
This is the OS default without NUMA pinning.
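One hedged way to check whether NUMA placement is the culprit would be to rerun the timings with the process pinned to a single node's CPUs. The CPU ids below are assumptions; check /sys/devices/system/node/node0/cpulist on stevie. Note this only pins CPU placement; binding memory to a node needs numactl or libnuma.

import os

# Assumed: node 0 owns CPUs 0-15 on this machine; adjust to the actual
# topology. os.sched_setaffinity pins the current process to those CPUs.
os.sched_setaffinity(0, set(range(16)))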
Are you just doing a malloc (np.empty), a zeroed malloc (np.zeros), or np.ones?
mallocs and zeroed mallocs are fast; it's filling the array where we see the slowdown.
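A rough sketch to time the three cases separately (the size here is illustrative; rerun with the sizes above on stevie). np.empty and np.zeros can look nearly free because the kernel hands back untouched, lazily mapped pages; the cost only shows up once the array is actually written.

import numpy as np
import time

n = int(1e9)  # ~7.45 GB of complex64

for name, make in [("np.empty", np.empty),
                   ("np.zeros", np.zeros),
                   ("np.ones", np.ones)]:
    start = time.perf_counter()
    a = make(n, dtype=np.complex64)
    print(f"{name}: {time.perf_counter() - start:.2f} s")
    del a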
Anyhow, now that the dead Cubical processes have been cleared out on stevie, things have improved. I suspect that the Cubical shared-memory model was holding onto pages and might have provoked swapping.
# 7.45 GB
In [4]: %timeit -r 1 -n 1 np.ones(int(1e9), dtype=np.complex64).fill(1+0j)
1 loop, best of 1: 3.07 s per loop
# 14.9GB
In [5]: %timeit -r 1 -n 1 np.ones(int(2e9), dtype=np.complex64).fill(1+0j)
1 loop, best of 1: 6.24 s per loop
# 29.9 GB
In [6]: %timeit -r 1 -n 1 np.ones(int(4e9), dtype=np.complex64).fill(1+0j)
1 loop, best of 1: 12.2 s per loop
# 60 GB
In [7]: %timeit -r 1 -n 1 np.ones(int(8e9), dtype=np.complex64).fill(1+0j)
1 loop, best of 1: 25.3 s per loop
# 120 GB non-linear jump here
In [8]: %timeit -r 1 -n 1 np.ones(int(16e9), dtype=np.complex64).fill(1+0j)
1 loop, best of 1: 4min per loop
I made a mistake: I allocated with np.ones and then filled with ones again, so the work was duplicated. Anyway, the allocation and fill are faster now.
It's malloc and memset with np.zeros.
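For reference, the non-duplicating pattern would be a plain allocation followed by a single fill (a sketch, not the tricolour code):

import numpy as np

# Allocate without touching pages, then write each element exactly once,
# instead of np.ones(...) followed by another .fill(...) which writes twice.
a = np.empty(int(1e9), dtype=np.complex64)
a.fill(1 + 0j)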
Currently, only a single thread allocates all memory for a window. It's pretty slow.
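One possible direction, sketched below assuming numba is available (tricolour's flagging kernels already use it, if I recall correctly), is to fill the window from a parallel kernel so that first-touch happens from multiple threads rather than one. This is only an illustration, not the proposed fix.

import numpy as np
import numba

# Fill a large array in parallel; with a first-touch page placement policy
# this also spreads the pages across the nodes whose threads touch them.
@numba.njit(parallel=True, nogil=True)
def parallel_fill(arr, value):
    flat = arr.reshape(-1)
    for i in numba.prange(flat.shape[0]):
        flat[i] = value

window = np.empty((64, 1024, 1024), dtype=np.complex64)  # ~512 MB example
parallel_fill(window, np.complex64(1 + 0j))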
See further description of related issues in https://github.com/ska-sa/tricolour/issues/36
@sjperkins master now goes nowhere fast.