Rank deficiency error when loading streams

MaJoRkranz commented 2 years ago

I simultaneously recorded EEG and audio using the audio capture software and synchronized the streams in LSL. When loading the audio streams using pop_loadxdf (version: 1.18) I receive the following message: 'Warning: Rank deficient, rank = 1, tol = 4.574081e+03'. The error is caused in line 600 in load_xdf when the jitter is removed: 'mapping = temp(k).time_stamps(indices) / [ones(1,length(indices)); indices];' As a result, trigger markers, which I use to epoch the EEG and audio streams, are not correct in the audio stream. The error occurs only if an audiostream has a length of approximately more than 62.000.000 data points. I recorded with a sampling rate of 44100, thus I have roughly 23 min of data.

Here an example how to reproduce the error in MATLAB 2020b:

% this throws an error in matlab 2020b % Warning: Rank deficient, rank = 1, tol = 4.366797e+03. nrSamples =65000000; time_stamps=linspace(0,26,nrSamples); indices=1:length(time_stamps); mapping = time_stamps(indices) / [ones(1,length(indices)); indices];

%this does not, only differenc is nrSamples nrSamples =60000000; time_stamps=linspace(0,26,nrSamples); indices=1:length(time_stamps); mapping = time_stamps(indices) / [ones(1,length(indices)); indices];

I appreciate any help!

dmedine commented 2 years ago

I believe that this is actually a numerical issue that has to do with Matlab's algorithm for computing mrdivide (/ operator), but I don't know precisely why. Certainly the mapping is wrong when this occurs: mapping = 1e-3*[.4333, .4333] when nSamples <= 60000000 and mapping = 13-6*[0, .4] when nSamples >= 65000000.

This probably warrants some investigation with Mathworks, but for now, I would suggest that you could significantly reduce the sampling rate of the audio signal. If you reduce it to 10k, you already get a >4-fold increase in the length of recording you can make before encountering this issue.

VladimirR46 commented 2 years ago

This is a very actual problem! I also record sound and EEG and have this problem. I record sound at a sampling rate of 22050 but my experiment lasts about an hour. I don't want to decrease the frequency as it will degrade the sound quality.

For the test I opened the xdf file in xdfpy (Python) and everything is fine there!

Please help me to solve this problem!

cboulay commented 2 years ago

You can try reformulating and using lsqr instead of mrdivide.

Line 602, mapping = temp(k).time_stamps(indices) / [ones(1,length(indices)); indices];

might become mapping = lsqr([ones(1,length(indices)); indices], temp(k).time_stamps(indices));

I haven't tested this as I don't have easy access to Matlab. Please let us know if it works. I suspect it will be SLOW.

MaJoRkranz commented 2 years ago

Hi and thanks for your answer!

For your solution to work I had to transpose the inputs, then it worked for a large nrSamples:

mapping = lsqr([ones(1,length(indices)); indices]', time_stamps(indices)');

However, when checking the results with a smaller sample, they are not the same:

nrSamples =60000000; time_stamps=linspace(0,26,nrSamples); indices=1:length(time_stamps);

mapping1 = time_stamps(indices) / [ones(1,length(indices)); indices]; mapping2= lsqr([ones(1,length(indices)); indices]', time_stamps(indices)');

mapping1 = [-4.33333340500543e-07 4.33333340555559e-07] mapping2 = [1.08333330624999e-14 4.33333329722220e-07]

cboulay commented 2 years ago

That's not a surprise; they use different algorithms. The second value of mapping is the more important one and that is the same.

We could default to the mrdivide way and if a warning is detected then try again with the lsqr way. Here is a post on how to detect warnings.

cboulay commented 2 years ago

Actually, looking at @MaJoRkranz's result slightly closer, it seems like mapping1's offset is equivalent to -1 step, whereas mapping2's offset is equivalent to 0. I think mapping1 is correct; time_stamps starts at 0, and indices starts at 1, so the transformation from indices to time_stamps should have a -1-step offset.

Maybe it's just that lsqr didn't converge? You could try decreasing the tolerance.

More info on direct (mrdivide) vs iterative (lsqr) solutions: https://www.mathworks.com/help/matlab/math/iterative-methods-for-linear-systems.html

VladimirR46 commented 2 years ago

Hi and thanks for your answer!

I tried to use this solution.

lastwarn(''); mapping = temp(k).time_stamps(indices) / [ones(1,length(indices)); indices];

[warnMsg, ~] = lastwarn; if ~isempty(warnMsg) mapping = lsqr([ones(1,length(indices)); indices]', temp(k).time_stamps(indices)'); end

But the streams still have different time_stamps.

ThorgeHaupt commented 1 year ago

Hi,

I have tried to solve the issue using: nrSamples =65000000; time_stamps=linspace(0,26,nrSamples); indices=1:length(time_stamps); mapping_rd1 =time_stamps(indices) * pinv([ones(1,length(indices)); indices],1e-04);

where the moore-penrose inverse does not solve the issue but approximates the other time steps i.e. with pinv [-4.0625e-07 -4.0625e-07]. I am not sure whether the difference is negligible or impacts following computations, but for the data set i am currently working it results in the correct effective sampling rate of the remapped data to be computed.

xdf-modules / xdf-Matlab

Rank deficiency error when loading streams #12