unit test for evd - Githubissues

hfattahi commented 2 years ago

This PR addresses https://github.com/isce-framework/fringe/issues/63 and adds a unit test to test the evd module. The unit test simulates a known signal (wrapped phase series), simulates a coherence matrix and simulates a neighborhood of SLC signal. After storing the simulated SLCs with same format of isce2 to disk with the same directory structure that FRINGE expects, different fringe steps is called to estimate the neighborhood and wrapped phase series with EVD and MLE methods.

A threshold of 10 degrees is used to check RMSE of the residual estimated wrapped phase series, where residual is the difference between the simulated and estimated phase series. Since the simulated coherence is very high, I would have expected that a smaller threshold would be needed. However, with my limited test, currently this threshold seems to be needed.

For debugging purpose a plotting flag at the end of the unit test exists to plot the simulated and estimated wrapped phase series which is turned off by default.

For debugging purpose a simple python function with EVD estimator is included which seems to consistently lead to smaller RMSE than FRINGE's EVD and MLE estimators.

rtburns-jpl commented 2 years ago

Unit test didn't meet tolerance on CI:

rmse for evd [degrees]: 0.40012108110274364
rmse for evd fringe [degrees]: 4.244824356099382
rmse for mle fringe [degrees]: 70.30201192964489
Traceback (most recent call last):
  File "/home/ubuntu/fringe/tests/evd/test_evd.py", line 440, in <module>
    main()
  File "/home/ubuntu/fringe/tests/evd/test_evd.py", line 392, in main
    assert rmse_fringe_mle <= 10
AssertionError

I ran it locally on my mac and the rmse's were worse:

rmse for evd [degrees]: 0.6540348038068458
rmse for evd fringe [degrees]: 74.10482726333565
rmse for mle fringe [degrees]: 74.10482726333565

hfattahi commented 2 years ago

Thanks @rtburns-jpl for reporting your local test. Yes I noticed the instability of the estimator as well. But it is more than what I noticed locally. Does it always fail for you with such large rmse or what you posted is one of the worst cases you encounter?

rtburns-jpl commented 2 years ago

Yeah, I get those results consistently. I can also try with different python/numpy versions or on linux if you think that might affect the results.

hfattahi commented 2 years ago

Thanks @rtburns-jpl for confirming. On my local test, the test passes sometime for fringe's estimator EVD and most of the times fails for MLE. The RMSE for my python prototype estimator is always small. I think all this points to the fact that within fringe evd we have unnecessary checks which needs to be relaxed. I had observed this issue before with real data as well. Specifically the problem is the way that we check for covariance and coherence positive definiteness. Some of those checks are not needed. I will open an issue documenting this bug, which we have a solution for.

hfattahi commented 2 years ago

Good news @rtburns-jpl ! I had one minor bug with significant impact! Basically the intent of the test was to simulate a homogeneous neighborhood. While I had 121 pixels in the enighborhood, I was basically using a very few pixels or even sometimes 1-2 pixels for the estimation which was making the estimation fail or inaccurate. By enforcing the homogeneity all the pixels are used, and the test now pass and the rmse of fringe estimators are comparable to the expected values from python prototype (which was using all the pixels even before fixing this bug).

rmse for evd [degrees]: 0.6745330650068299
rmse for evd fringe [degrees]: 0.6745330982173353
rmse for mle fringe [degrees]: 0.8456865462880113

My comment above about some of the unnecessary checks we have in Fringe still holds it does not impact this PR anymore.

rtburns-jpl commented 2 years ago

Very strange, I was still failing to meet error tolerance on macos and linux after your last bugfix. I noticed that the CI is using ordinary blas, but I was using openblas, so I switched over and the test is passing. Any idea why openblas would be giving me worse precision here?

hfattahi commented 2 years ago

Very strange, I was still failing to meet error tolerance on macos and linux after your last bugfix. I noticed that the CI is using ordinary blas, but I was using openblas, so I switched over and the test is passing. Any idea why openblas would be giving me worse precision here?

I'm not sure why that would happen! I'm wondering if the precision is low or the estimation completely failed! Usually when you see very large residuals, it means that eigne value decomposition for EVD or matrix inversion for MLE has failed.

Would this simple test run for you in both cases with identical results? (Note that you need to compile manually.)

A plot of the results would be also helpful. Would it be possible to turn on this debug flag and post the second plot that it creates?

Here is a plot I get when I turn the debug flag on where I'm using a conda environment similar to the CI setup:

rtburns-jpl commented 2 years ago

With reference blas implementation:

Testing known 3 x 3 
3 -2 4
-2 8 2
4 2 3
Eigen values: 
-1.81507
7
8.81507
Eigen vectors: 
-0.679457 0.707107 0.1958
-0.276904 -5.55112e-16 -0.960898
0.679457 0.707107 -0.1958
Input array: 
(3,0) (-2,0) (4,0)
(-2,0) (8,0) (2,0)
(4,0) (2,0) (3,0)
Order = 3
Largest eigen value: 8.81507
(0.1958,0)
(-0.960898,0)
(-0.1958,0)
Order  3
Smallest eigen value: -1.81507
(-0.679457,0)
(-0.276904,0)
(0.679457,0)
Matrix inverse: 
(2.55556,0) (1,0) (-2.44444,0)
(1,-0) (0.5,0) (-1,0)
(-2.44444,-0) (-1,-0) (2.55556,-0)
Lapack inverse: 0
(2.55556,0) (1,-0) (-2.44444,-0)
(1,0) (0.5,0) (-1,-0)
(-2.44444,0) (-1,0) (2.55556,-0)
(1,0) (0,0) (0,0)
(2.66454e-15,0) (1,0) (-3.55271e-15,0)
(4.44089e-16,0) (-2.22045e-16,0) (1,0)

With openblas:

Testing known 3 x 3 
3 -2 4
-2 8 2
4 2 3
Eigen values: 
-1.81507
7
8.81507
Eigen vectors: 
-0.679457 0.707107 0.1958
-0.276904 -5.55112e-16 -0.960898
0.679457 0.707107 -0.1958
Input array: 
(3,0) (-2,0) (4,0)
(-2,0) (8,0) (2,0)
(4,0) (2,0) (3,0)
Order = 3
 ** On entry to ZHEEVR parameter number  6 had an illegal value
Largest eigen value: 2
(0,0)
(0,0)
(0,0)
Order  3
 ** On entry to ZHEEVR parameter number  6 had an illegal value
Smallest eigen value: 2
(0,0)
(0,0)
(0,0)
Matrix inverse: 
(2.55556,0) (1,0) (-2.44444,0)
(1,-0) (0.5,0) (-1,0)
(-2.44444,-0) (-1,-0) (2.55556,-0)
 ** On entry to ZPOTRF parameter number  4 had an illegal value
fish: Job 1, './eigen_test' terminated by signal SIGABRT (Abort)

Clearly something is going very wrong with armadillo/openblas, I'll just assume I shouldn't use openblas for now.

piyushrpt commented 2 years ago

Look at the README and openmp section

hfattahi commented 2 years ago

Look at the README and openmp section

To be clear I think @piyushrpt refers to this

rtburns-jpl commented 2 years ago

Hmm, yeah I tried OPENBLAS_NUM_THREADS=1 but it doesn't seem to change anything.

piyushrpt commented 2 years ago

Is this on your M1 Mac? What is the architecture?

rtburns-jpl commented 2 years ago

This is on macos x86_64, but I get the same thing on linux x86_64 when using openblas. I'll go ahead and open a separate issue so this PR can get back on track.

hfattahi commented 2 years ago

LGTM

My comments are mostly style related and midway through reviewing I stopped repeating myself. AFAICT none of my comments affect functionality.

Thank you @LiangJYu for the useful comments and suggestions. I addressed most of them. There are style improvements left for the future PRs.

isce-framework / fringe

unit test for evd #65