solo.read_TNR()/SolO QLI: Function is slow, and speed scales badly with amount of data

ErikPGJ commented 5 months ago

Step 1: Latest code?

[ X] This issue is present in the most recent code (i.e. after running git pull this is still a problem). Should be present in latest commit on devel branch: b679b71a Erik P G Johansson (2024-04-08 12:46:41 +0200)

Step 2: Describe your environment

irfu-matlab branch (e.g. master, devel): devel
Matlab version used (e.g. Matlab R2020b): MATLAB R2023b
Operating system (e.g. Windows 10, Mac OS X 10.15.6, Linux Ubuntu 20.04): Ubuntu 22.04.4 LTS
[X ] A 64-bit installation of Matlab was used.

Step 3: Describe the problem

solo.read_TNR() appears to have performance issues. Wall time spent in function increases faster than linearly with length of time interval of data used for it (the argument).

Steps to reproduce:

Simple code for visualizing behaviour.

%
% NOTE: Must mount /data/solo/ first. SolO DB fails without error if directories
% are not available.
%
% NOTE: May have to account for caching of (mounted/non-mounted) CDF files. SolO
% DB also has its own caching(?).
%
function read_TNR___speedTest
  tic
  close all

  solo.db_init('local_file_db', '/data/solo/');
  solo.db_init('db_cache_size_max', 4096)   % Unit: MiB.
  solo.db_cache('on', 'save')

%  N_SEC_ARRAY = 86400 * logspace(log10(0.1), log10(1), 5);
  N_SEC_ARRAY = 86400 * [0.2, 0.5, 1, 2, 3];

  for i = 1:numel(N_SEC_ARRAY)
    nSec         = N_SEC_ARRAY(i);
    tWallTimeSec = measure_read_TNR_walltime(nSec);

    fprintf('i            = %i\n', i)
    fprintf('nSec         = %g\n', nSec)
    fprintf('tWallTimeSec = %g\n', tWallTimeSec)

    nSecArray(i)         = nSec;
    tWallTimeSecArray(i) = tWallTimeSec;
  end

  plot(nSecArray/86400, tWallTimeSecArray, 'o-')
  xlabel('Length of time interval with TNR data [days]')
  ylabel('Wall time [s]')
  grid on

  toc
end

function tWallTimeSec = measure_read_TNR_walltime(nSec)
  START_TIME = irf.time_array('2022-12-28T00:00:00.00Z');
  STOP_TIME  = START_TIME + nSec;

  READ_TNR_TINT = [START_TIME; STOP_TIME];

  t = tic();
  TnrData = solo.read_TNR(READ_TNR_TINT);
  tWallTimeSec = toc(t);
end

ErikPGJ commented 5 months ago

Problem should exist in below loop:

for ind_sweep = min_sweep:max_sweep
  v1_ = zeros(128, 1);
  p_punt = find(sweep_num == ind_sweep);
  if ~isempty(p_punt)
    for indband = 1:numel(p_punt)
      idx_l = floor(32 * bande_e(p_punt(indband))) + 1;
      idx_r = floor(32 * (bande_e(p_punt(indband)) + 1));
      v1_(idx_l:idx_r) = auto_calib(p_punt(indband), :);
    end
  end

  if sum(v1_) > 0.0    % Might be a way of detecting CDF fill values (large, negative).
    punt0_ = find(v1_ == 0.0);
    if ~isempty(punt0_)
      v1_(punt0_) = NaN;
    end
    v_ = [v_, v1_];
    sweepn_tnr = [sweep_tnr, sweep_num(p_punt(1))];
  end

  if ~isempty(p_punt)
    time_ = [time_, time_rr(min(p_punt))];
  end
end

In particular, the incrementally growing arrays sweepn_tnr and time_ look suspicious.

ErikPGJ commented 5 months ago

Previous informal test of above loop:

1 day    13 s
2 days   45 s
3 days  121 s

Or maybe it does not scale badly after all?

ErikPGJ commented 5 months ago

Note: Underlying dataset used for testing have very similar sizes.

/data/solo/remote/data/L2/thr> ll -h 2022/12/solo_L2_rpw-tnr-surv-cdag_202212[23]*  2023/01/solo_L2_rpw-tnr-surv-cdag_2023010* | cut -b31-
 326M 2023-11-13 14.57:52 2022/12/solo_L2_rpw-tnr-surv-cdag_20221220_V07.cdf
 305M 2023-11-13 15.02:40 2022/12/solo_L2_rpw-tnr-surv-cdag_20221221_V10.cdf
 328M 2023-11-13 15.07:52 2022/12/solo_L2_rpw-tnr-surv-cdag_20221222_V06.cdf
 288M 2023-11-13 15.12:23 2022/12/solo_L2_rpw-tnr-surv-cdag_20221223_V09.cdf
 328M 2023-11-13 15.17:32 2022/12/solo_L2_rpw-tnr-surv-cdag_20221224_V07.cdf
 328M 2023-11-13 15.22:42 2022/12/solo_L2_rpw-tnr-surv-cdag_20221225_V06.cdf
 327M 2023-11-13 15.27:54 2022/12/solo_L2_rpw-tnr-surv-cdag_20221226_V08.cdf
 326M 2023-11-13 15.32:59 2022/12/solo_L2_rpw-tnr-surv-cdag_20221227_V07.cdf
 327M 2023-11-13 15.38:15 2022/12/solo_L2_rpw-tnr-surv-cdag_20221228_V08.cdf
 328M 2023-11-13 15.43:34 2022/12/solo_L2_rpw-tnr-surv-cdag_20221229_V07.cdf
 328M 2023-11-13 15.48:49 2022/12/solo_L2_rpw-tnr-surv-cdag_20221230_V09.cdf
 328M 2023-11-13 15.53:58 2022/12/solo_L2_rpw-tnr-surv-cdag_20221231_V09.cdf
 328M 2023-11-13 11.44:33 2023/01/solo_L2_rpw-tnr-surv-cdag_20230101_V08.cdf
 327M 2023-11-13 11.49:48 2023/01/solo_L2_rpw-tnr-surv-cdag_20230102_V08.cdf
 326M 2023-11-13 11.54:59 2023/01/solo_L2_rpw-tnr-surv-cdag_20230103_V08.cdf
 220M 2023-11-13 11.58:33 2023/01/solo_L2_rpw-tnr-surv-cdag_20230104_V07.cdf
 328M 2023-11-13 12.03:49 2023/01/solo_L2_rpw-tnr-surv-cdag_20230105_V06.cdf
 328M 2023-11-13 12.09:02 2023/01/solo_L2_rpw-tnr-surv-cdag_20230106_V06.cdf
 328M 2023-11-13 12.14:22 2023/01/solo_L2_rpw-tnr-surv-cdag_20230107_V05.cdf
 328M 2023-11-13 12.19:36 2023/01/solo_L2_rpw-tnr-surv-cdag_20230108_V05.cdf
 277M 2023-11-13 12.24:08 2023/01/solo_L2_rpw-tnr-surv-cdag_20230109_V06.cdf

ErikPGJ commented 5 months ago

Profiling implies that the problem is something completely different:

profile on; solo.read_TNR(irf.tint('2022-12-28T00:00:00Z/2022-12-29T00:00:00Z')); profile off; profile viewer

ErikPGJ commented 5 months ago

Indeed, the command p_punt = find(sweep_num == ind_sweep); does seem to take most of the time, even if commenting out everything else in the loop.

ErikPGJ commented 5 months ago

From analyzing the code, it seems possible to speed up the code somewhat (factor of ~2?) by making below replacement:

%for ind_sweep = min_sweep:max_sweep
for ind_sweep = unique(sweep_num)'   % NOTE: "sweep_num = sweep_num(sens_)" above.

(Not tested if the resulting data is the same though.)

ErikPGJ commented 5 months ago

Speculative idea: Could maybe parallelize the loop (parfor)?

ErikPGJ commented 5 months ago

Seems one could simplify (not speed up)

punt0_ = find(v1_ == 0.0);
if ~isempty(punt0_)
  v1_(punt0_) = NaN;
end

to

v_1(v1_==0) = NaN;

.

ErikPGJ commented 5 months ago

Seems one could almost simplify iteration (not speed up) to use only one if statement since isempty(p_punt) ==> ~(sum(v1_) > 0.0).

Footnote: Not implementing these ideas myself since (1) I do not want to interfere with Jordi's work on read_TNR(), and (2) because I do not have time to actually check if it works well myself right now. Just collecting ideas.

JordiBoldu commented 5 months ago

Changing

punt0 = find(v1 == 0.0); if ~isempty(punt0) v1(punt0_) = NaN; end

to

v1(v1==0) = NaN;

should work. The code was translated 'literally' from IDL so there are some implementations that are not optimized for matlab.

ErikPGJ commented 5 months ago

Yes, now when I looked more carefully for it, that sounds it could be a "bad translation".

I did not really intend to analyze read_TNR() for bugs (since that is more for https://github.com/irfu/irfu-matlab/issues/138), just tried to understand what the loop does in my head. I also note that sweep_tnr and sweepn_tnr are not used for anything. Is it both a typo (should be one and same variable?) and variable(s) that should have been removed?

irfu / irfu-matlab