Closed: yh-luo closed this issue 4 years ago.
nothing is necessary as long as we have tests to ensure that we don't break things
+1 for updating `mne-testing-data` with the correct surfaces and using `assert_allclose` with something like `atol=1e-5`
So `assert_allclose` with `kwargs = dict(rtol=1e-7, atol=1e-5)` passed. The generated surfaces are compared to the assumed-correct surfaces (I copied the output surfaces from the pytest temp folder).
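For reference, here is a self-contained sketch (with made-up coordinates) of how such a tolerance behaves: differences below roughly `atol` pass, while larger ones are caught.

```python
import numpy as np
from numpy.testing import assert_allclose

# Made-up surface coordinates (stand-ins for `rr` from read_surface) and a
# copy perturbed by a tiny numerical difference, such as one caused by a
# different FreeSurfer build.
rr = np.array([[10.0, -5.0, 3.0],
               [2.5, 0.0, -7.1]])
rr_new = rr + 5e-6

kwargs = dict(rtol=1e-7, atol=1e-5)
assert_allclose(rr, rr_new, **kwargs)  # passes: |diff| = 5e-6 < atol

# A genuinely wrong surface would be caught:
try:
    assert_allclose(rr, rr + 1e-2, **kwargs)
except AssertionError:
    print('mismatch detected')
```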
If only `mne/tests/test_bem.py::test_make_flash_bem` needs FLASH images in the testing data, I think those could be removed to shrink the size of the testing data. If that sounds reasonable to you, I will look into it.
@larsoner your call here.
We definitely want to keep some tests of `make_flash_bem`. I thought you were proposing to remove the output files, but now it looks like you're talking about the input files like `flash5.mgz`, so we wouldn't be able to test at all...? Or do you mean keep just `flash5.mgz` and `flash30.mgz`, delete all the `COR-*` files, and let those be generated...?
No, I'm proposing to remove the T1 COR files from `mne-testing-data` because `test_flash_bem` does not use them. `test_flash_bem` uses the sample dataset, while `test_make_flash_bem` from `mne/tests/test_bem.py` uses the testing data, which was the reason the `COR-*` files were uploaded.

The computed surfaces from these two tests do not agree with each other, because `test_flash_bem` uses both `mef5d.mgz` and `mef30d.mgz` from the sample dataset, while `test_make_flash_bem` uses only `flash5.mgz` in the testing dataset.

I think if we merge the two tests, it would be odd to keep those COR files if they are no longer used. But I'm not sure whether other tests use these `COR-*` files, so I would need to look into that.
`mne/commands/tests/test_commands.py`:

```python
@pytest.mark.timeout(300)  # took 200 sec locally
@pytest.mark.slowtest
@pytest.mark.ultraslowtest
@requires_freesurfer
@sample.requires_sample_data
def test_flash_bem(tmpdir):
    """Test mne flash_bem."""
```
`mne/tests/test_bem.py`:

```python
@requires_nibabel()
@requires_freesurfer('mri_convert')
@testing.requires_testing_data
def test_make_flash_bem(tmpdir):
    """Test computing bem from flash images."""
```
> test_flash_bem uses the sample dataset to test while test_make_flash_bem from mne/tests/test_bem.py uses testing data
The `ultraslowtest` does not run on CIs and does not run when people do `make pytest`, so we should not rely on it. If I had to guess, anything marked `ultraslowtest` probably gets run a few times a year by me and maybe a couple of others by accident, and that's about it...
> while test_make_flash_bem uses only flash5.mgz in the testing dataset.
The solution sounds like it should be to add the `flash30.mgz` file to the testing data, update the `testing` surfaces if necessary to be the correct ones, and then get rid of any `testing` files we no longer need.
BTW it's possible that the `flash5.mgz` file is actually a resampled (to 3 mm isotropic, probably?) version of the original data to keep size down and make things process faster. I know we did this with some MRIs, not sure if we did it with that one. If we did, we should do the same with the `flash30.mgz`, and locally compute the "correct" surfaces for the 3 mm isotropic MRIs, and either add spot-check values to the testing script or upload the correct surfaces.
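For intuition about the size savings, here is a crude numpy-only sketch of what going from 1 mm to 3 mm isotropic implies. This is purely illustrative: any actual resampling of the testing data would have used FreeSurfer tooling, not simple voxel striding.

```python
import numpy as np

# Fake 1 mm isotropic volume (96 voxels per axis; values are arbitrary).
vol_1mm = np.random.RandomState(0).rand(96, 96, 96)

# Keep every 3rd voxel along each axis to mimic a 3 mm isotropic grid.
# This is nearest-neighbor decimation, not a proper resampling.
vol_3mm = vol_1mm[::3, ::3, ::3]

print(vol_3mm.shape)                     # (32, 32, 32)
print(vol_1mm.nbytes // vol_3mm.nbytes)  # 27, i.e. ~27x smaller
```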
I agree with this solution; I'll run some tests to see if it works.
I'm not sure about `flash5.mgz` but will check this as well. May I keep this issue open to gather more information?
Yeah no rush to fix this
So `T1.mgz` in the testing dataset was resampled to 3 mm voxel size while `T1.mgz` in the sample dataset was not. Because of this difference in `T1.mgz`, and whether `flash30.mgz` is used, `mne/commands/tests/test_commands.py::test_flash_bem` and `mne/tests/test_bem.py::test_make_flash_bem` compute different surfaces.
But as you indicated in a previous comment, `test_flash_bem` is decorated as `ultraslowtest` and does not run on CIs; `test_make_flash_bem`, on the other hand, does. I'm not sure if merging the two tests would be OK... Keeping both tests may be fine for now. If that's reasonable, I would close the issue.
It's okay to merge the two tests into one that only uses the `testing` data with the 3 mm resampled MRI. The surfaces to test against can just be ones that you create now from those testing data, knowing/assuming that our current code is correct.
The PR (#7663) is almost ready. After using the resampled `T1.mgz` from `mne-testing-data`, `test_flash_bem` takes ~70 sec locally (the current test takes ~200 sec). Does `test_flash_bem` still count as `ultraslowtest`? Or should I remove that decorator in the PR?
Also, for the PR to work, `mne-testing-data` needs to update the flash bem surfaces as in the commit. Because `test_flash_bem` tests both `convert_flash_mris` and `make_flash_bem`, the `COR*` files in `subjects/sample/mri/T1/` and `subjects/sample/mri/brain/` are no longer needed. After the clean-up, the size of `mne-testing-data/subjects/sample` will go down to 193M (currently 216M).

I'm currently running a thorough local test to make sure nothing is broken by the changes to the testing dataset. Is it possible to run CIs on a different branch of `mne-testing-data` to assure nothing would be broken?
> Does test_flash_bem still count as ultraslowtest? Or should I remove that decorator in the PR?
Yes, that's still too long for something like this, I think. A better option is probably to add an Azure run that only runs the tests marked `ultraslowtest`, but we should do that separately.
> Also, for the PR to work, mne-testing-data needs to update the flash bem surfaces as in the commit.
Let's follow the procedure here:
https://github.com/mne-tools/mne-testing-data#add-or-change-files-in-the-repo-for-use-with-mne
> Is it possible to run CIs on a different branch of mne-testing-data to assure nothing would be broken?
No, but I can run tests locally on your branch before merging the `mne-testing-data` PR. If `pytest mne/` works for you and me, then it should give us some confidence we haven't broken any other tests. Note that `pytest mne/` will run all tests including the ultraslow ones; `make pytest` and CIs run `pytest -m "not ultraslowtest"` IIRC.
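To make that `-m "not ultraslowtest"` selection concrete, here is a hypothetical, self-contained demo: it writes a throwaway test file and invokes pytest on it the way the CIs do. The `ultraslowtest` marker is registered locally here only because the real marker lives in MNE's own pytest configuration.

```python
import pathlib
import subprocess
import sys
import tempfile
import textwrap

# Throwaway test file: one test marked ultraslowtest, one unmarked.
tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / "test_demo.py").write_text(textwrap.dedent("""\
    import pytest

    @pytest.mark.ultraslowtest
    def test_slow():
        assert True

    def test_fast():
        assert True
"""))
# Register the marker so pytest does not warn about an unknown mark.
(tmp / "pytest.ini").write_text(
    "[pytest]\nmarkers =\n    ultraslowtest: ultra slow test\n")

# The CI-style invocation: deselect everything marked ultraslowtest.
out = subprocess.run(
    [sys.executable, "-m", "pytest", "-q", "-m", "not ultraslowtest",
     str(tmp)],
    capture_output=True, text=True).stdout
print(out)  # reports 1 passed, 1 deselected
```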
> No, but

Actually I take that back: you could do a lot of messing with `mne/datasets/utils.py` to get it to test the commit from your branch, but it's probably not worth it.
Since `test_bem.py` kept failing, I figured out that a bunch of BEM files in the testing dataset (e.g., `sample-1280-1280-1280-bem.fif`) would not match the model computed from the newly created surfaces (i.e., using both flash5 and flash30). Even when I passed `-3` to `mne flash_bem` to skip the 30-degree flip angle data, the computed surfaces are still different from the current surfaces in the testing dataset.

Then I found that `flash5.mgz` and `flash5_reg.mgz` are the same in the `sample` dataset and the `testing` dataset; however, the `flash5.mgz` and `flash5_reg.mgz` produced by `convert_flash_mris` from `mef05.mgz` differ from those, which probably causes the discrepancy in the BEM files. It's not clear to me why the `flash5.mgz` produced by `convert_flash_mris` is different; there are too many steps/options in `convert_flash_mris` that could produce a different `flash5.mgz`.

In short, comparing the generated surfaces to the correct surfaces would require too many changes in `mne-testing-data`. I think the test would still be good enough after dropping the comparison of surfaces?
```python
kwargs = dict(rtol=1e-5, atol=1e-5)
for s in ('outer_skin', 'outer_skull', 'inner_skull'):
    rr, tris = read_surface(op.join(subject_path_new, 'bem',
                                    '%s.surf' % s))
    assert_equal(len(tris), 5120)
    assert_equal(tris.min(), 0)
    assert_equal(rr.shape[0], tris.max() + 1)
    # Drop this part:
    # rr_c, tris_c = read_surface(op.join(subjects_dir, 'sample', 'bem',
    #                                     '%s.surf' % s))
    # assert_allclose(rr, rr_c, **kwargs)
    # assert_allclose(tris, tris_c, **kwargs)
```
> I figured that a bunch of bem stuff in testing dataset (e.g., sample-1280-1280-1280-bem.fif) would not match the model computed from the newly created surfaces (i.e., using both flash5 and flash30). Even when I passed -3 to mne flash_bem to skip 30-degree flip angle data, the computed surfaces are still different from the current surfaces in the testing dataset.
It's likely/possible that the 1280-bem files were produced using the 1 mm MRI data, which is fine. We can keep them that way. We don't need everything to be created from the same set of files -- e.g., the 1280-bem's can be created using `sample` data and be in the same folder with `inner_skull.surf` files that came from the 3 mm MRIs in `testing` -- we just need to know and document where they all came from. Make sense?
With that in mind:
> In short, to compare the generated surfaces to the correct surfaces would cause too many changes in the mne-testing-data.
I don't think this has to be the case. The question is: what minimum set of files in `mne-testing-data` needs to be updated so that we can compare surfaces with some meaningful tolerance, such that if in the future we do something wrong / break `mne_flash_bem`, the test will catch it?
Actually, not that many files are influenced by this PR, which is good news. Aside from the flash bem surfaces, only `sample-320-320-320-bem.fif`, `sample-320-bem.fif`, `sample-320-320-320-bem-sol.fif`, and `sample-320-bem-sol.fif` require updates.

Since the surfaces are now reconstructed from both flash5 and flash30, I recreated these files from the new surfaces with `make_bem_model` and `make_bem_solution` (code below). `test_make_bem_model` needs some modification because the testing values are a bit different; that's fixed in this PR.

Local full tests passed, and CI results only showed the expected failure of `test_make_bem_model`. The testing dataset is on branch `test_flash`.

BTW, I did not rebase the commits for now, hoping to keep the changes clear.
```python
import os.path as op

from mne import (make_bem_model, make_bem_solution,
                 write_bem_surfaces, write_bem_solution)

# 3-layer
fname_bem_3 = op.join(subjects_dir, 'sample', 'bem',
                      'sample-320-320-320-bem.fif')
fname_bem_sol_3 = op.join(subjects_dir, 'sample', 'bem',
                          'sample-320-320-320-bem-sol.fif')
fname_bem_1 = op.join(subjects_dir, 'sample', 'bem',
                      'sample-320-bem.fif')
fname_bem_sol_1 = op.join(subjects_dir, 'sample', 'bem',
                          'sample-320-bem-sol.fif')
model_3 = make_bem_model('sample', ico=2, subjects_dir=subjects_dir,
                         conductivity=[0.3, 0.006, 0.3], verbose=True)
write_bem_surfaces(fname_bem_3, model_3)  # was: model (undefined)
solution_3 = make_bem_solution(model_3, verbose=True)
write_bem_solution(fname_bem_sol_3, solution_3)
# 1-layer
model_1 = make_bem_model('sample', ico=2, subjects_dir=subjects_dir,
                         conductivity=[0.3], verbose=True)
write_bem_surfaces(fname_bem_1, model_1)  # was: model (undefined)
solution_1 = make_bem_solution(model_1, verbose=True)
write_bem_solution(fname_bem_sol_1, solution_1)
```
Sounds reasonable to me; feel free to make a PR to `mne-testing-data` with the updated files and then a PR to MNE to use them.
Thanks for the patience! I've made PRs to `mne-testing-data` and `mne-python`.
closed by #7663
**Describe the problem**

`mne.bem.make_flash_bem` was tested in both `mne/commands/test_commands.py` and `mne/tests/test_bem.py`. Since the function is time-consuming, merging them into `test_flash_bem` in `test_commands.py` would reduce testing time (~200 sec).

The flash bem surfaces in the testing dataset are different from the surfaces computed by `mne/commands/test_commands/test_flash_bem`, in that `test_flash_bem` uses both mef05.mgz (FLASH5) and mef30.mgz (FLASH30) to create the flash bem surfaces, while the surfaces in the testing data were created from only flash5.mgz (FLASH5). Is it necessary to assure the generated flash bem surfaces are exactly the same as those in the testing data? The coordinates (`x` from test_flash_bem, `y` from testing data) are different, as reported by pytest:

**Describe your solution**

Maybe testing the shape of surfaces like in `test_watershed_bem` is enough?

**Describe possible alternatives**
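As a rough sketch of what such a shape-only check means, here is a minimal numpy example using a synthetic tetrahedron as a stand-in for `(rr, tris)` returned by `mne.read_surface` (the real test would additionally check `len(tris) == 5120` for the decimated surfaces):

```python
import numpy as np

# Synthetic stand-in for (rr, tris): a tetrahedron with
# 4 vertices and 4 triangular faces.
rr = np.array([[0.0, 0.0, 0.0],
               [1.0, 0.0, 0.0],
               [0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0]])
tris = np.array([[0, 1, 2],
                 [0, 1, 3],
                 [0, 2, 3],
                 [1, 2, 3]])

# Shape-only invariants, independent of exact vertex coordinates:
assert tris.shape[1] == 3              # triangular faces
assert tris.min() == 0                 # vertex indices start at zero
assert rr.shape[0] == tris.max() + 1   # every vertex is referenced
print("shape checks passed")
```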