mne-tools / mne-python

MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python
https://mne.tools
BSD 3-Clause "New" or "Revised" License

ENH: Remove redundant tests on make_flash_bem #7661

Closed yh-luo closed 4 years ago

yh-luo commented 4 years ago

Describe the problem

mne.bem.make_flash_bem is tested in both mne/commands/tests/test_commands.py and mne/tests/test_bem.py. Since the function is time-consuming, merging the two into test_flash_bem in test_commands.py would reduce testing time (~200 sec).

The flash BEM surfaces in the testing dataset differ from the surfaces computed by mne/commands/tests/test_commands.py::test_flash_bem: test_flash_bem uses both mef05.mgz (FLASH5) and mef30.mgz (FLASH30) to create the flash BEM surfaces, while the surfaces in the testing data were created from flash5.mgz (FLASH5) alone. Is it necessary to ensure the generated flash BEM surfaces are exactly the same as those in the testing data?

The coordinates differ (x from test_flash_bem, y from the testing data), as reported by pytest:

AssertionError: 
Not equal to tolerance rtol=1e-07, atol=0

Mismatch: 100%
Max absolute difference: 1.82610321
Max relative difference: 43.68559136
x: array([[  -1.028   ,   -0.8233  ,  115.167198],
  [  72.226601,   -0.7632  ,   67.861504],
  [  21.6248  ,   83.526398,   66.532303],...
y: array([[  -1.3635  ,   -0.8711  ,  115.845596],
  [  72.295403,   -0.5315  ,   68.657402],
  [  20.939199,   83.773201,   67.367401],...

Describe your solution

Maybe testing the shape of the surfaces, as in test_watershed_bem, would be enough?
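For reference, a shape-only check in that spirit could look like the sketch below. The arrays are hypothetical stand-ins for what mne.read_surface returns; 5120 triangles is the ico-downsampled surface size that mne flash_bem produces.

```python
import numpy as np

# Hypothetical stand-in for a surface returned by mne.read_surface:
# 5120 triangles indexing 2562 vertices.
rng = np.random.RandomState(0)
tris = rng.randint(0, 2562, size=(5120, 3))
tris[0] = [0, 1, 2561]  # guarantee the full index range is used
rr = np.zeros((tris.max() + 1, 3))

# Shape-only assertions, with no coordinate comparison:
assert len(tris) == 5120
assert tris.min() == 0
assert rr.shape[0] == tris.max() + 1
```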

Describe possible alternatives

agramfort commented 4 years ago

Nothing is necessary as long as we have tests that ensure we don't break things.

larsoner commented 4 years ago

+1 for updating mne-testing-data with the correct surfaces and using assert_allclose with something like atol=1e-5

yh-luo commented 4 years ago

+1 for updating mne-testing-data with the correct surfaces and using assert_allclose with something like atol=1e-5

So assert_allclose with kwargs = dict(rtol=1e-7, atol=1e-5) passed. The generated surfaces were compared to the assumed-correct surfaces (I copied the output surfaces from the pytest temp folder). If only mne/tests/test_bem.py::test_make_flash_bem needs the FLASH images in the testing data, those could be removed to shrink the size of the testing data. If that sounds reasonable to you, I will look into it.
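For context, assert_allclose with those kwargs accepts any element-wise difference up to atol + rtol * |desired|. A minimal sketch with made-up coordinates:

```python
import numpy as np
from numpy.testing import assert_allclose

kwargs = dict(rtol=1e-7, atol=1e-5)

# Made-up surface coordinates; rr_new differs by noise well below atol.
rr = np.array([[-1.3635, -0.8711, 115.8456],
               [72.2954, -0.5315, 68.6574]])
rr_new = rr + 5e-6
assert_allclose(rr, rr_new, **kwargs)  # passes: 5e-6 < atol

# A ~1.8 mm mismatch like the one reported above would fail:
assert not np.allclose(rr, rr + 1.8, **kwargs)
```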

agramfort commented 4 years ago

@larsoner your call here.

larsoner commented 4 years ago

The files do not seem so large: the .tri files are ~200 kB each and the .surf files ~100 kB, so we'd only be talking about ~1 MB in a ~1 GB repository. Not sure it's worth it. But if you're motivated @yh-luo, we could hard-code a handful of values for a random subset of (5?) vertices and spot-check those.
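A spot-check along those lines might look like this sketch. The indices and coordinates here are hypothetical; in a real test they would be literal values pasted in from a trusted run of mne flash_bem.

```python
import numpy as np
from numpy.testing import assert_allclose

# Hypothetical vertex array standing in for, e.g., inner_skull.surf.
rng = np.random.RandomState(42)
rr = rng.rand(2562, 3) * 100.0

# A small, fixed subset of vertex indices and their expected coordinates
# (derived from rr here; a real test would hard-code the literals).
idx = [0, 500, 1000, 1500, 2561]
expected = rr[idx].copy()

assert_allclose(rr[idx], expected, atol=1e-5)
```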

yh-luo commented 4 years ago

@larsoner No, I mean the T1 files from mne-testing-data #12 and #13; that's around 50 MB. These files seem to exist solely for test_make_flash_bem to work.

larsoner commented 4 years ago

We definitely want to keep some tests of make_flash_bem. I thought you were proposing to remove the output files, but now it looks like you're talking about the input files like flash5.mgz, so we wouldn't be able to test at all ...? Or do you mean keep just flash5.mgz and flash30.mgz, delete all the COR-* files, and let those be generated ...?

yh-luo commented 4 years ago

No, I'm proposing to remove the T1 COR files from mne-testing-data because test_flash_bem does not use them. test_flash_bem uses the sample dataset, while test_make_flash_bem from mne/tests/test_bem.py uses the testing data, which is why the COR- files were uploaded. The surfaces computed by these two tests do not agree, because test_flash_bem uses both mef05.mgz and mef30.mgz from the sample dataset, while test_make_flash_bem uses only flash5.mgz from the testing dataset. If we merge the two tests, it would be odd to keep the COR files when they are no longer used. But I'm not sure whether other tests use these COR- files, so I would need to look into that.

mne/commands/tests/test_commands.py

@pytest.mark.timeout(300)  # took 200 sec locally
@pytest.mark.slowtest
@pytest.mark.ultraslowtest
@requires_freesurfer
@sample.requires_sample_data
def test_flash_bem(tmpdir):
    """Test mne flash_bem."""

mne/tests/test_bem.py

@requires_nibabel()
@requires_freesurfer('mri_convert')
@testing.requires_testing_data
def test_make_flash_bem(tmpdir):
    """Test computing bem from flash images."""

larsoner commented 4 years ago

test_flash_bem uses sample dataset to test while test_make_flash_bem from mne/tests/test_bem.py uses testing data

The ultraslowtest tests do not run on CIs and do not run when people do make pytest, so we should not rely on them. If I had to guess, anything marked ultraslowtest probably gets run a few times a year by me, and maybe a couple of times by others by accident, and that's about it...

while test_make_flash_bem uses only flash5.mgz in the testing dataset.

The solution sounds like it should be to add the flash30.mgz file to the testing data, update the testing surfaces if necessary to be the correct ones, and then get rid of any testing files we no longer need.

BTW it's possible that the flash5.mgz file is actually a resampled (to 3mm isotropic, probably?) version of the original data to keep size down and make things process faster. I know we did this with some MRIs, not sure if we did it with that one. If we did, we should do the same with the flash30.mgz, and locally compute the "correct" surfaces for the 3mm isotropic MRIs, and either add spot-check values to the testing script, or upload the correct surfaces.
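If the testing MRIs were indeed downsampled, it was presumably done with FreeSurfer's mri_convert; a hedged sketch of that procedure (the exact flags originally used are not recorded here, and running it requires FreeSurfer on the PATH):

```shell
# Assumed procedure: resample a FLASH volume to 3 mm isotropic voxels
# with FreeSurfer's mri_convert to shrink the testing data.
mri_convert --voxsize 3 3 3 flash30.mgz flash30_3mm.mgz
```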

yh-luo commented 4 years ago

The solution sounds like it should be to add the flash30.mgz file to the testing data, update the testing surfaces if necessary to be the correct ones, and then get rid of any testing files we no longer need.

I agree with this solution; I will run some tests to see if it works.

BTW it's possible that the flash5.mgz file is actually a resampled (to 3mm isotropic, probably?) version of the original data to keep size down and make things process faster. I know we did this with some MRIs, not sure if we did it with that one. If we did, we should do the same with the flash30.mgz, and locally compute the "correct" surfaces for the 3mm isotropic MRIs, and either add spot-check values to the testing script, or upload the correct surfaces.

I'm not sure about flash5.mgz but will check this as well. May I keep this issue open while I gather more information?

larsoner commented 4 years ago

Yeah no rush to fix this

yh-luo commented 4 years ago

So T1.mgz in the testing dataset was resampled to 3 mm voxel size, while T1.mgz in the sample dataset was not. Because of this difference in T1.mgz, and of whether flash30.mgz is used, mne/commands/tests/test_commands.py::test_flash_bem and mne/tests/test_bem.py::test_make_flash_bem compute different surfaces. But as you indicated in your previous comment, test_flash_bem is decorated as ultraslowtest and does not run on CIs, whereas test_make_flash_bem does. I'm not sure merging the two tests would be okay... Keeping both tests may be fine for now. If that's reasonable, I will close the issue.

larsoner commented 4 years ago

It's okay to merge the two tests into one that only uses the testing data with the 3mm resampled MRI. The surfaces to test against can just be ones that you create now from those testing data knowing/assuming that our current code is correct.

yh-luo commented 4 years ago

The PR (#7663) is almost ready. After using the resampled T1.mgz from mne-testing-data, test_flash_bem takes ~70 sec locally (the current test takes ~200 sec). Does test_flash_bem still count as ultraslowtest? Or should I remove that decorator in the PR? Also, for the PR to work, mne-testing-data needs to update the flash BEM surfaces as in the commit. Because test_flash_bem tests both convert_flash_mris and make_flash_bem, the COR* files in subjects/sample/mri/T1/ and subjects/sample/mri/brain/ are no longer needed. After the clean-up, the size of mne-testing-data/subjects/sample will go down to 193 MB (currently 216 MB). I'm currently running a thorough local test to make sure nothing is broken by the changes to the testing dataset. Is it possible to run CIs on a different branch of mne-testing-data to assure nothing would be broken?

larsoner commented 4 years ago

Does test_flash_bem still count as ultraslowtest? Or should I remove that decorator in the PR?

Yes that's still too long for something like this I think. A better option is probably to add an Azure run that only runs the tests marked ultraslowtest. But we should do that separately.

Also, for the PR to work, mne-testing-data needs to update the flash bem surfaces as in the commit.

Let's follow the procedure here:

https://github.com/mne-tools/mne-testing-data#add-or-change-files-in-the-repo-for-use-with-mne

Is it possible to run CIs on a different branch of mne-testing-data to assure nothing would be broken?

No, but I can run tests locally on your branch before merging the mne-testing-data PR. If pytest mne/ works for you and me, then it should give us some confidence we haven't broken any other tests. Note that pytest mne/ will run all tests including the ultraslow ones; make pytest and the CIs run pytest -m "not ultraslowtest", IIRC.

larsoner commented 4 years ago

No, but

Actually, I take that back: you could do a lot of messing with mne/datasets/utils.py to get it to test the commit from your branch, but it's probably not worth it.

yh-luo commented 4 years ago

No, but I can run tests locally on your branch before merging the mne-testing-data PR. If pytest mne/ works for you and me, then it should give us some confidence we haven't broken any other tests. Note that pytest mne/ will run all tests including the ultraslow ones; make pytest and the CIs run pytest -m "not ultraslowtest", IIRC.

Since test_bem.py kept failing, I figured that a bunch of BEM files in the testing dataset (e.g., sample-1280-1280-1280-bem.fif) would not match the model computed from the newly created surfaces (i.e., using both flash5 and flash30). Even when I passed -3 to mne flash_bem to skip the 30-degree flip angle data, the computed surfaces are still different from the current surfaces in the testing dataset.

Then I found that flash5.mgz and flash5_reg.mgz are the same in the sample dataset and the testing dataset; however, the flash5.mgz and flash5_reg.mgz produced by convert_flash_mris from mef05.mgz differ from the above, which probably causes the discrepancy in the BEM files. It's not clear to me why the flash5.mgz produced by convert_flash_mris is different; there are too many steps/options in convert_flash_mris that could produce a different flash5.mgz.

In short, comparing the generated surfaces to the correct surfaces would require too many changes in mne-testing-data. I think the test would still be good enough after dropping the comparison of the surfaces:

    kwargs = dict(rtol=1e-5, atol=1e-5)
    for s in ('outer_skin', 'outer_skull', 'inner_skull'):
        rr, tris = read_surface(op.join(subject_path_new, 'bem',
                                        '%s.surf' % s))
        assert_equal(len(tris), 5120)
        assert_equal(tris.min(), 0)
        assert_equal(rr.shape[0], tris.max() + 1)
        # Drop this part:
        # rr_c, tris_c = read_surface(op.join(subjects_dir, 'sample', 'bem',
        #                                     '%s.surf' % s))
        # assert_allclose(rr, rr_c, **kwargs)
        # assert_allclose(tris, tris_c, **kwargs)

larsoner commented 4 years ago

I figured that a bunch of BEM files in the testing dataset (e.g., sample-1280-1280-1280-bem.fif) would not match the model computed from the newly created surfaces (i.e., using both flash5 and flash30). Even when I passed -3 to mne flash_bem to skip the 30-degree flip angle data, the computed surfaces are still different from the current surfaces in the testing dataset.

It's likely/possible that the 1280-bem files were produced using the 1mm MRI data, which is fine. We can keep them that way. We don't need everything to be created from the same set of files -- e.g., the 1280-bem's can be created using sample data and be in the same folder with inner_skull.surf files that came from the 3 mm MRIs in testing -- we just need to know and document where they all came from. Make sense?

With that in mind:

In short, to compare the generated surfaces to the correct surfaces would cause too many changes in the mne-testing-data.

I don't think this has to be the case. The question is, what minimum set of files in mne-testing-data need to be updated to make it so that we can compare surfaces with some meaningful tolerance such that, in the future if we do something wrong / break mne_flash_bem, the test will catch it?

yh-luo commented 4 years ago

Actually, not that many files are influenced by this PR, which is good news. Aside from the flash BEM surfaces, only sample-320-320-320-bem.fif, sample-320-bem.fif, sample-320-320-320-bem-sol.fif and sample-320-bem-sol.fif require updates. Since the surfaces are now reconstructed from both flash5 and flash30, I recreated these files from the new surfaces with make_bem_model and make_bem_solution (code below). test_make_bem_model needs some modification because the testing values are a bit different; that is fixed in this PR.

Local full tests passed, and the CI results only showed the expected failure of test_make_bem_model. The testing dataset branch is test_flash. BTW I did not rebase the commits for now, to keep the changes clear.

# Assumes `subjects_dir` points at the mne-testing-data subjects directory.
import os.path as op

from mne import (make_bem_model, make_bem_solution, write_bem_solution,
                 write_bem_surfaces)

# 3-layer
fname_bem_3 = op.join(subjects_dir, 'sample', 'bem',
                      'sample-320-320-320-bem.fif')
fname_bem_sol_3 = op.join(subjects_dir, 'sample', 'bem',
                          'sample-320-320-320-bem-sol.fif')
fname_bem_1 = op.join(subjects_dir, 'sample', 'bem',
                      'sample-320-bem.fif')
fname_bem_sol_1 = op.join(subjects_dir, 'sample', 'bem',
                          'sample-320-bem-sol.fif')
model_3 = make_bem_model('sample', ico=2, subjects_dir=subjects_dir,
                         conductivity=[0.3, 0.006, 0.3], verbose=True)
write_bem_surfaces(fname_bem_3, model_3)
solution_3 = make_bem_solution(model_3, verbose=True)
write_bem_solution(fname_bem_sol_3, solution_3)

# 1-layer
model_1 = make_bem_model('sample', ico=2, subjects_dir=subjects_dir,
                         conductivity=[0.3], verbose=True)
write_bem_surfaces(fname_bem_1, model_1)
solution_1 = make_bem_solution(model_1, verbose=True)
write_bem_solution(fname_bem_sol_1, solution_1)

larsoner commented 4 years ago

Sounds reasonable to me, feel free to make a PR to mne-testing-data with the updated files and then a PR to MNE to use them

yh-luo commented 4 years ago

Thanks for the patience! I've made PRs to mne-testing-data and mne-python.

yh-luo commented 4 years ago

closed by #7663