Closed NickleDave closed 2 years ago
This is also an issue on Mac, I just realized :frowning_face:
So basically trying to contribute blows up in your face when you clone the repo, if you're on any system besides linux.
Time to fix.
I do think it makes sense to just distinguish them "semantically"
So the original issue was about duplicating the tests.
Renaming fixes the clobbering but we still have duplicated tests, and we're not really using the different audio file formats even, as far as I can tell
Correction: we definitely use the different audio file formats; the .phn format reports onsets and offsets in terms of sample number (i.e., index), and we convert to seconds using the sampling rate from the file
I think the 'phn'
format should have the same logic as 'birdsong-recognition-dataset'
, where we provide the option to specify a location where the .wav files can be found, but default to None, and in the case of that default, we look in the parent of the .phn path, i.e. we assume the file has the same name but with .ext .wav or .WAV instead of .phn
There's an extra step here because we have to check for either .wav or .WAV, but overall it's the same idea
Will make a separate issue for this
I can remove duplication of the tests by globbing all the PHN files together, and letting the annotation format itself check for the .wav as I just described in the previous comment
This is best done by having lists of paths declared outside of fixtures so that the globbing can happen programatically.
This makes it possible later to assign subsets of the constants to parametrized fixtures, e.g. if I only want to check 'wav-nist'
format PHNs.
I'm already in the middle of refactoring fixtures this way in the version-4.0 branch.
So I'm going to leave this open but make it an issue for that branch
Fixed in #161
right now the tests and the data for the 'phn' format from the TIMIT dataset are duplicated
there's a good reason for the data to be duplicated: the .wav files from one version use a NIST format that can be parsed by
soundfile
but not byscipy.wavefile
, as described in this README in the test data: https://github.com/NickleDave/crowsetta/blob/main/tests/data_for_tests/audio_WAV_annot_PHN/README.mdso we want to make sure we can parse both
but there's not a very good reason to duplicate the unit tests, literally the only thing that's different is the fixture we pass in
this is one of those cases where it would be convenient to be able to use fixtures to parametrize tests
there's a workaround here: https://docs.pytest.org/en/6.2.x/proposals/parametrize_with_fixtures.html but I have read this multiple times I am in this situation and still am not sure whether it actually works
and I end up just creating a factory function that accepts e.g. the extension as an argument (e.g.,
list_of_phns(format='PHN')
) which works fine