irfu / irfu-matlab

Matlab routines to work with space data, particularly with MMS and Cluster/CAA data. Also some general plasma routines.
59 stars 46 forks source link

solo.read_TNR()/SolO QLI: Sometimes presumed incorrect return data #138

Open ErikPGJ opened 6 months ago

ErikPGJ commented 6 months ago

Step 1: Latest code?

Step 2: Describe your environment

Step 3: Describe the problem

solo.read_TNR() for commit '67472823 Erik P G Johansson (2024-01-18 14:43:31 +0100) BICAS: Cleanup (2)' appears to work as intended for 2023-01-27 (reproduces (essentially) the same spectrogram in regenerated quicklooks as in much older quicklooks).

solo.read_TNR() in most recent local devel commit (27365468 Erik P G Johansson (2024-03-27 14:06:52 +0100) SolO QLI: Cleanup (mostly tests) (not yet pushed) does not return the same data as before recent refactoring.

Note: The recent refactoring (f1f73cb8 Erik P G Johansson (2024-03-22 11:34:59 +0100) SolO QLI, solo.read_TNR: Use SolO DB, time interval length, no exc.) made the function use SolO DB instead of locating and reading CDFs directly (use solo.db_get_ts() (and solo.qli.utils.read_constant_metadata())).

Observations in general when plotting spectra (irf_spectrogram()) after refactoring:

Unclear if this is due to changes in underlying data or in code.

Steps to reproduce:

Ex: SolO RPW TNR data (solo_L2_rpw-tnr-surv) for 2023-01-27, but comparisons of old and new quicklooks implies that the same problem exists for many other dates too.

Relevant code:

The problem should be contained entirely in the call to solo.read_TNR() (including code indirectly called by beforementioned function).

Note: solo.read_TNR() is used by QLI (IRFU Quicklooks), and the results are displayed in panel 10 in 24h/6h/2h quicklooks and panel 9 for 7-day quicklooks. The refactored (suspicious/buggy) version is not yet used for homepage as of 2024-03-28.

FYI, @JordiBoldu

ErikPGJ commented 6 months ago

Panel 10 in QLI, 24h, panel 10, 2023-01-27:

Most recent commit 27365468 Erik P G Johansson (2024-03-27 14:06:52 +0100) (HEAD -> devel) SolO QLI: Cleanup (mostly tests): image

Note:

Most recent commit 27365468 Erik P G Johansson (2024-03-27 14:06:52 +0100) (HEAD -> devel) SolO QLI: Cleanup (mostly tests), but using read_TNR() from before refactoring 67472823: 27365468 Erik P G Johansson (2024-03-27 14:06:52 +0100) (HEAD -> devel) SolO QLI: Cleanup (mostly tests)

image

Quicklook generated by old code running for homepage (code version ca 2023-04?), generated 2024-02-19 (i.e. using recent data): image

ErikPGJ commented 6 months ago

Inspecting and comparing the execution of different versions of solo.read_TNR():

67472823:

solo.read_TNR_67472823(irf.tint('2023-01-27T00:00:00Z/2023-01-28T00:00:00Z'))

Uses dataobj() for all zVariables. path = '/data/solo/remote/data/L2/thr/2023/01/solo_L2_rpw-tnr-surv-cdag_20230127_V*.cdf'

Reports 172075 CDF records for all time-dependent zVariables.

K>> data_l2.auto1.time
ans = 
  EpochTT object : 172075 records

  2023-01-27T00:02:10.051494528Z
  2023-01-27T00:02:10.649180928Z
  ... skipped 172071 records ...
  2023-01-27T23:59:59.511023488Z
  2023-01-27T23:59:59.977048576Z

27365468 Erik P G Johansson (2024-03-27 14:06:52 +0100) (HEAD -> devel) SolO QLI: Cleanup (mostly tests)

solo.read_TNR(irf.tint('2023-01-27T00:00:00Z/2023-01-28T00:00:00Z'))

Uses solo.db_get_ts('solo_L2_rpw-tnr-surv-cdag', zvName, tint); for zVariables, except for solo.qli.utils.read_constant_metadata('solo_L2_rpw-tnr-surv-cdag', 'TNR_BAND_FREQ', tint); which is not time-dependent.

Reports 170961 CDF records for all time-dependent zVariables.

K>> data_l2.auto1.time
ans = 
  EpochTT object : 170961 records

  2023-01-27T00:00:00.167610496Z
  2023-01-27T00:00:00.663953280Z
  ... skipped 170957 records ...
  2023-01-27T23:59:59.511023488Z
  2023-01-27T23:59:59.977048576Z

Note: Longer time interval with fewer CDF records! Note: 172075-170961 CDF records = 1114

ErikPGJ commented 6 months ago

cdfdump on what should be the relevant file /data/solo/remote/data/L2/thr/2023/01/solo_L2_rpw-tnr-surv-cdag_20230127_V10.cdf

gives for Epoch:

  Record # 1: 2023-01-27T00:02:10.051494528    # Fits with commit 67472823.
  Record # 2: 2023-01-27T00:02:10.649180928
...
  Record # 172335: 2023-01-28T00:02:09.985619712
  Record # 172336: 2023-01-28T00:02:10.423158144   # Does not fit with commit 67472823, but that is explainable.

i.e. 172336 CDF records, which is different compared to both of above. but that is OK since commit 67472823 truncates data to only be within tint when using get_ts(data_obj, key_upper).tlim(tint);. Can see that commit 67472823 reads 172336 CDF records inside data_obj, before truncation, i.e. it is consistent with cdfdump.

ErikPGJ commented 6 months ago

27365468 Erik P G Johansson (2024-03-27 14:06:52 +0100) SolO QLI: Cleanup (mostly tests)

solo.db_get_ts('solo_L2_rpw-tnr-surv-cdag', zvName, tint); loads below files:

/data/solo/remote/data/L2/thr/2023/01/solo_L2_rpw-tnr-surv-cdag_20230126_V05.cdf
/data/solo/remote/data/L2/thr/2023/01/solo_L2_rpw-tnr-surv-cdag_20230127_V10.cdf

solo_L2_rpw-tnr-surv-cdag_20230126_V05.cdf has below records in the day after, i.e. 2023-01-27:

  Record # 172076: 2023-01-27T00:00:00.167610496
  Record # 172077: 2023-01-27T00:00:00.663953280
...
  Record # 172335: 2023-01-27T00:02:09.243032832
  Record # 172336: 2023-01-27T00:02:09.724941568

==> 261 CDF records <> 172075-170961 = 1114 CDF record difference above!

It is is consistent with above if looking at timestamps.

Old code 67472823 reads exactly one day file which is approximately midnight to midnight. 27365468 Erik P G Johansson (2024-03-27 14:06:52 +0100) SolO QLI: Cleanup (mostly tests) uses the exact time period (midnight to midnight), by using multiple (two) CDFs but that appears to not be the entire explanation.

ErikPGJ commented 6 months ago

It seems that solo.db_get_ts() can remove samples for identical timestamps (DEPEND_0) whereas dataobj() does not.

solo.db_get_ts() log message and implementation hints if this: [warning: solo_db.get_variable/append_sci_var(167)] Discarded 2508 data points once for every zVariable.

It is still strange why the remainder of the read_TNR() is so sensitive to this.

ErikPGJ commented 6 months ago

Disabling (commenting out) below section in solo_db.m which removes samples for duplicated timestamps does not seem to help, so removal of samples should not be the (entire) problem.

        % check for overlapping time records
        % ----------------------------------
        % idxUnique = (Presumably) index to the FIRST instance of a unique
        %             value.
%         [~,idxUnique] = unique(res.DEPEND_0.data);
%         idxDuplicate = setdiff(1:length(res.DEPEND_0.data), idxUnique);
%         % Remove CDF records for which the dependent (DEPEND_0) zVariable
%         % (typically "Epoch") is identical to earlier instance.
%         res.data(idxDuplicate, :, :, :, :, :, :, :, :, :, :, :) = [];
%         for idep = 0:n_dep
%           DEP_str = ['DEPEND_' num2str(idep)];
%           if v.(DEP_str).nrec == v.nrec
%             res.(DEP_str).data(idxDuplicate, :, :, :, :, :, :, :, :, :, :, :) = [];
%           end
%         end
%         nDuplicate = length(idxDuplicate);
%         if nDuplicate
%           irf.log('warning',sprintf('Discarded %d data points',nDuplicate))
%         end

image

ErikPGJ commented 6 months ago

Having read_TNR() read all zVariables through dataobj()+solo.db_list_files() and truncating time interval properly seems to not help quicklooks.

Is the non-CDF-reading processing in solo.read_TNR() very sensitive to adding data at the beginning?!

Try artificially truncating data at beginning of day when using solo.get_ts()?

ErikPGJ commented 6 months ago

Artificially constraining the time interval to the old effective interval (stemming from one day-CDF, but also truncate at midnight), but using solo.db.get_ts() yields the same number of CDF records and quicklooks implies that this yields effectively the same data as before refactoring.

Note: solo_db.get_variable/append_sci_var does not discard any data points for this time interval.

Tint2 = irf.tint('2023-01-27T00:02:10.051494528Z/2023-01-28T00:00:00Z');
Tnr = solo.read_TNR(Tint2);
Data.Tnr = Tnr;
K>> data_l2.auto1.time
ans = 
  EpochTT object : 172075 records

  2023-01-27T00:02:10.051494528Z
  2023-01-27T00:02:10.649180928Z
  ... skipped 172071 records ...
  2023-01-27T23:59:59.511023488Z
  2023-01-27T23:59:59.977048576Z

image

ErikPGJ commented 6 months ago

And to confirm, they are completely identical as one may suspect.

TintExactDay       = irf.tint('2023-01-27T00:00:00.000000000Z/2023-01-28T00:00:00Z');
TintPrerefactoring = irf.tint('2023-01-27T00:02:10.051494528Z/2023-01-28T00:00:00Z');
Tnr2 = solo.read_TNR(         TintPrerefactoring);
Tnr1 = solo.read_TNR_67472823(TintExactDay);
assert(isequal(Tnr1, Tnr2))

Thus, the additional data from the previous CDF (the first ~two minutes of data spilling over from the previous day file) must be the problem.

ErikPGJ commented 6 months ago

FYI, @JordiBoldu , there seems to be a bug in solo.read_TNR() which would be corrected by below code. Does that seem plausible? Not sure if this related to this issue yet though. Still researching.

% NOTE: zVariable SWEEP_NUM is CDF_UINT32 which is converted into MATLAB class
% uint32. uint32 can not represent negative values and MATLAB (apparently) does
% not wrap values it can not represent, so MATLAB converts negative values into
% zero instead.
% Therefore, 
%   diff(uint32([0,3,0])) == [3,  0]
%   diff(double([0,3,0])) == [3, -3]
% Must therefor use double.
sweep_ = double(sweep_);    % Bugfix?
delta_sw = abs(sweep_(2:end) - sweep_(1:end - 1));

It makes a difference for 2023-01-27 data where only the last jump of three in sweep_ is actually recognized in xdelta_sw.

delta_sw = abs(sweep_(2:end) - sweep_(1:end - 1));
xdelta_sw = find(delta_sw > 100)';
ErikPGJ commented 6 months ago

@JordiBoldu Applying above bugfix for most recent version of solo.read_TRN() yields below figure which superficially looks like an improvement. The vertical white stripes (holes?) are new and unknown:

image

ErikPGJ commented 6 months ago

Footnote: Previously mentioned local commit 27365468 Erik P G Johansson (2024-03-27 14:06:52 +0100) SolO QLI: Cleanup (mostly tests) now gone, but corresponds to pushed commit a5640027 Erik P G Johansson (2024-03-28 13:29:54 +0100) SolO QLI: Cleanup (mostly tests) except for a completely unrelated bugfix.

ErikPGJ commented 6 months ago

Comparing "larger" number of quicklooks seems to be an overall improvement(?) with above bugfix, but there is still some strange differences compared to before bugfix, in particular white rectangular areas in spectrums, often as thin vertical slices, always(?) having the same upper and lower boundaries.

Example: 2023-01-03 fb454934_2023-05-12 image

8422e6e5 Erik P G Johansson (2024-03-28 13:32:40 +0100) (HEAD -> read_TNR_bugfix2) solo.read_TNR()/QLI: Bugfix: double(diff) = Effectively 69c5ddb0 Erik P G Johansson (2024-03-28 13:29:54 +0100 + bugfix. image

Examples: 24 h QLIs 2023-01-04 -- 2023-01-20 (stopped comparing for later dates)

ErikPGJ commented 6 months ago

Pushed the (incomplete) bugfix in 40f49cf0 Erik P G Johansson (2024-04-03 13:28:08 +0200) (origin/devel) solo.read_TNR()/QLI: Bugfix: double(diff).

ErikPGJ commented 5 months ago

Note that the plotting code contains odd size checks on Data.Tnr which appears to be a way of checking whether the variable sizes in the struct are consistent with each other. This hints of bugs in solo.read_TNR() (or at least if the return value is meant to be passed on to irf_spectrogram()).

Could potentially be used as a way of searching for data which triggers bugs.

generate_quicklooks_24h_6h_2h.m

    sz_tnr = size(Data.Tnr.p);
    % NOTE: Condition is a way of determining whether Data.Tnr contains
    %       any real data, despite not being empty.
    if sz_tnr(1) == length(Data.Tnr.t) && sz_tnr(2) == length(Data.Tnr.f)
      irf_spectrogram(h(10), Data.Tnr, 'log', 'donotfitcolorbarlabel')

generate_quicklook_7days.m:

    sz_tnr = size(Data.Tnr.p);
    if sz_tnr(1) == length(Data.Tnr.t) && sz_tnr(2) == length(Data.Tnr.f)
      irf_spectrogram(h(9), Data.Tnr, 'log', 'donotfitcolorbarlabel')
      hold(           h(9), 'on');
    end
JordiBoldu commented 5 months ago

There is an issue with the data size after read_TNR does the merging of the 2 sensor channels. It is not clear where the issue comes from, but it seems to be due to different sensor configurations not being compatible.

ErikPGJ commented 4 months ago

In hindsight, the issue description is misleading since (most likely) all of this is due to pre-existing bug(s) in the solo.read_TNR(), not bugs created in the refactoring.

Footnote: Homepage currently uses version 02274b3e0 Erik P G Johansson (2024-04-15 18:50:09 +0200).

ErikPGJ commented 4 months ago

Footnote: The homepage currently uses version 4ac775751 Erik P G Johansson (2024-05-16 18:09:00 +0200) (HEAD -> nas) solo.qli: Log msg. for scanned log file.