Closed: fabiocat93 closed this 2 weeks ago.
Attention: Patch coverage is 50.85714%, with 344 lines in your changes missing coverage. Please review.
Project coverage is 60.24%. Comparing base (9b7209f) to head (1bfee15). Report is 73 commits behind head on main.
@fabiocat93 is this going to incorporate Nick's features? Am I remembering correctly that he gave us the okay to incorporate them?
they're in there already
Cool😎
@satra Quick update: I ...
@nickcummins41 I’ve added your name throughout the docstrings and general documentation to acknowledge your contributions to the audio feature extraction work. If you have any suggestions for more effective ways to give credit or improvements to the documentation, please let me know!
thanks @fabiocat93 this is great for now. let's get this out and get some feedback on usage. i'm fighting a few fires so won't have time to do an in depth review. perhaps we can do that together sometime in a month or so across the entire codebase.
based on the last commit, i'm curious how running in series is faster than running in parallel at scale. it doesn't mean pydra has to be used, but offhand the change doesn't make sense without some explanation of the bottlenecks (memory, gpu, overhead, etc.).
pydra is still used under the hood, and I confirm that this newer implementation runs faster than the previous one.
also it looks like this focuses on returning a single feature value per feature, independent of the length of the audio. i.e., some features will not make sense over some durations, and some won't below a certain duration. we could say that should be left up to the user, in which case an example of splitting the audio into chunks would be good.
thank you for pointing this out. I agree with that. The way senselab is designed right now, we have some functionalities at the "tasks" level offering all kinds of customization, and this is left to the user. the "workflows" level is one level higher in terms of abstraction and includes (/will include) all the best practices/workarounds/heuristics that you are referring to.
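For illustration, here is a minimal time-based chunking sketch in plain torch/torchaudio. It does not use senselab's own API; the file name and 3-second window are illustrative assumptions.

```python
# Minimal sketch: split an audio file into fixed-length chunks before
# running per-chunk feature extraction. Plain torch/torchaudio, not
# senselab's API; "speech.wav" and the 3 s window are assumptions.
import torch
import torchaudio

waveform, sample_rate = torchaudio.load("speech.wav")  # (channels, samples)

chunk_seconds = 3.0
chunk_samples = int(chunk_seconds * sample_rate)

# torch.split slices along the sample dimension; the last chunk may be
# shorter than chunk_samples and can be padded or dropped as needed.
chunks = torch.split(waveform, chunk_samples, dim=-1)

for i, chunk in enumerate(chunks):
    print(f"chunk {i}: {chunk.shape[-1] / sample_rate:.2f} s")
    # ...extract features per chunk here...
```

Whether short trailing chunks should be kept depends on the minimum duration each feature needs, which is exactly the user-level decision discussed above.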
(relatedly, for speech-like audio, do we have some utility or example for splitting audio based on other targets (e.g., sentences, vad, etc.), rather than time?)
we do, in the relevant sections (vad, time-stamped transcripts, ...)
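As a concrete example of target-based (rather than time-based) splitting, here is a minimal VAD sketch. It assumes the silero-vad model from torch.hub rather than senselab's built-in VAD task; the file name is illustrative.

```python
# Minimal sketch: split audio at detected speech regions with a VAD,
# instead of fixed time windows. Assumes silero-vad via torch.hub (not
# senselab's own VAD task); "speech.wav" is an illustrative file name.
import torch

model, utils = torch.hub.load(repo_or_dir="snakers4/silero-vad", model="silero_vad")
get_speech_timestamps, _, read_audio, _, _ = utils

sampling_rate = 16000
wav = read_audio("speech.wav", sampling_rate=sampling_rate)

# Each entry is a dict with 'start'/'end' sample indices of one speech region.
speech_timestamps = get_speech_timestamps(wav, model, sampling_rate=sampling_rate)

segments = [wav[ts["start"]:ts["end"]] for ts in speech_timestamps]
for i, seg in enumerate(segments):
    print(f"speech segment {i}: {len(seg) / sampling_rate:.2f} s")
    # ...extract features per speech segment here...
```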
@fabiocat93 - great job here. i finally had some time to go through this. i realize there are some inconsistencies in places; instead of trying to be perfect, let's fix the easy ones, file some of the others as issues/discussions, and focus on optimizations in a different PR.
i'm hoping to update b2aiprep this weekend. let me know if you think a version could be released.
This is an attempt to structure audio feature extraction. It would contribute to the human phenotype vector with some low-level acoustic and audio quality descriptors.
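For context, low-level descriptors of this kind are commonly extracted with toolkits such as openSMILE. The sketch below uses the opensmile Python package with the eGeMAPSv02 feature set as one possible example, not necessarily the exact configuration used in this PR.

```python
# Minimal sketch: extract a standard set of low-level acoustic
# descriptors (eGeMAPSv02 functionals) with the opensmile package.
# One common approach, not necessarily this PR's exact configuration;
# "speech.wav" is an illustrative file name.
import opensmile

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)

# Returns a pandas DataFrame with one row of 88 functional features.
features = smile.process_file("speech.wav")
print(features.shape)
```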
Follow-up steps include: