Closed: fabiocat93 closed this 2 weeks ago.
Attention: Patch coverage is 50.85714%, with 344 lines in your changes missing coverage. Please review.
Project coverage is 60.24%. Comparing base (9b7209f) to head (1bfee15). Report is 73 commits behind head on main.
@fabiocat93 is this going to incorporate Nick's features? Am I remembering correctly that he gave us the okay to incorporate them?
they're in there already
Cool😎
@satra Quick update: I ...
@nickcummins41 I’ve added your name throughout the docstrings and general documentation to acknowledge your contributions to the audio feature extraction work. If you have any suggestions for more effective ways to give credit or improvements to the documentation, please let me know!
thanks @fabiocat93 this is great for now. let's get this out and get some feedback on usage. i'm fighting a few fires so won't have time to do an in depth review. perhaps we can do that together sometime in a month or so across the entire codebase.
based on the last commit, i'm curious how running in series is faster than running in parallel at scale. it doesn't mean pydra has to be used, but offhand the change doesn't make sense without some explanation of the bottlenecks (memory, gpu, overhead, etc.).
pydra is still used under the hood, and I confirm that this newer implementation runs faster than the previous one.
also it looks like this focuses on returning a single feature value per feature, independent of the length of the audio. i.e., some features will not make sense over some durations, and some won't below a certain duration. we could say that should be left up to the user, in which case an example of splitting the audio into chunks would be good.
thank you for pointing this out. I agree with that. The way senselab is designed right now, we have some functionalities at the "tasks" level offering all kinds of customization, and this is left to the user. the "workflows" level is one level higher in terms of abstraction and includes (/will include) all the best practices/workarounds/heuristics that you are referring to.
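For illustration, here is a minimal time-based chunking sketch in plain torch/torchaudio. It does not use senselab's own API; the file name and 3-second window are illustrative assumptions.

```python
# Minimal sketch: split an audio file into fixed-length chunks before
# running per-chunk feature extraction. Plain torch/torchaudio, not
# senselab's API; "speech.wav" and the 3 s window are assumptions.
import torch
import torchaudio

waveform, sample_rate = torchaudio.load("speech.wav")  # (channels, samples)

chunk_seconds = 3.0
chunk_samples = int(chunk_seconds * sample_rate)

# torch.split slices along the sample dimension; the last chunk may be
# shorter than chunk_samples and can be padded or dropped as needed.
chunks = torch.split(waveform, chunk_samples, dim=-1)

for i, chunk in enumerate(chunks):
    print(f"chunk {i}: {chunk.shape[-1] / sample_rate:.2f} s")
    # ...extract features per chunk here...
```

Whether short trailing chunks should be kept depends on the minimum duration each feature needs, which is exactly the user-level decision discussed above.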
(relatedly, for speech-like audio, do we have some utility or example for splitting audio based on other targets (e.g., sentences, vad, etc.), rather than time?)
we do, in the relevant sections (vad, time-stamped transcripts, ...)
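As a concrete example of target-based (rather than time-based) splitting, here is a minimal VAD sketch. It assumes the silero-vad model from torch.hub rather than senselab's built-in VAD task; the file name is illustrative.

```python
# Minimal sketch: split audio at detected speech regions with a VAD,
# instead of fixed time windows. Assumes silero-vad via torch.hub (not
# senselab's own VAD task); "speech.wav" is an illustrative file name.
import torch

model, utils = torch.hub.load(repo_or_dir="snakers4/silero-vad", model="silero_vad")
get_speech_timestamps, _, read_audio, _, _ = utils

sampling_rate = 16000
wav = read_audio("speech.wav", sampling_rate=sampling_rate)

# Each entry is a dict with 'start'/'end' sample indices of one speech region.
speech_timestamps = get_speech_timestamps(wav, model, sampling_rate=sampling_rate)

segments = [wav[ts["start"]:ts["end"]] for ts in speech_timestamps]
for i, seg in enumerate(segments):
    print(f"speech segment {i}: {len(seg) / sampling_rate:.2f} s")
    # ...extract features per speech segment here...
```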
@fabiocat93 - great job here. i finally had some time to go through this. i realize there are some inconsistencies in places; instead of trying to be perfect, let's fix the easy ones, file some of the others as issues/discussions, and focus on optimizations in a different PR.
i'm hoping to update b2aiprep this weekend. let me know if you think a version could be released.
This is an attempt to structure audio feature extraction. It would contribute to the human phenotype vector with some low-level acoustic and audio quality descriptors.
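For context, low-level descriptors of this kind are commonly extracted with toolkits such as openSMILE. The sketch below uses the opensmile Python package with the eGeMAPSv02 feature set as one possible example, not necessarily the exact configuration used in this PR.

```python
# Minimal sketch: extract a standard set of low-level acoustic
# descriptors (eGeMAPSv02 functionals) with the opensmile package.
# One common approach, not necessarily this PR's exact configuration;
# "speech.wav" is an illustrative file name.
import opensmile

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)

# Returns a pandas DataFrame with one row of 88 functional features.
features = smile.process_file("speech.wav")
print(features.shape)
```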
Follow-up steps include: