sensein / b2aiprep


[Discuss] QC options for b2aiprep #19

Closed satra closed 5 months ago

satra commented 5 months ago

The description below comes from my request to identify quality control steps and from a discussion between Duncan and Shaeen. We will use this issue to refine the requirements.

The most obvious thing that came out of our discussion is that there is no ready-made set of metrics and cut-off values. We would need time to look at data sets, and possibly to work with whoever is writing the Python code, to develop a QC procedure.

The code would have to detect if a given sample passed a certain threshold based on QC metrics, at which stage there are 4 possible courses of action:

While the first and last of these are straightforward, knowing the impact of different signal modifications is challenging, and deciding which parts of a signal are acceptable to trim for a given user task will also be nuanced. We have to be careful not to accidentally affect the outcomes of analysis by changing audio signals based on recording environment, given that certain groups are more likely to record in particular environments.

In terms of the metrics for assessing the quality of a sample, we split the types of issue into 4 categories. Here are some early ideas for metrics we could use. We have not put threshold numbers on them yet, as those would need to be chosen by looking at data, so we are using "X" and "Y" as placeholders for now.

  1. Signal to Noise Ratio. Compare the RMS of a background noise segment to the RMS of a signal segment.

    • Minimum Signal to Noise Ratio of "X" required to pass
    • Minimum signal amplitude: RMS value of signal must be at least "X"% of the dynamic range (half the bit depth).
    • Hum: take an FFT of the background noise segment to check whether a strongly periodic component with a spectral peak greater than "X" exists.
  2. Clipping. Occurrences of maximum or minimum amplitude values must be less than "X"% of the sample, or must be less than "X" per "Y" milliseconds. The number of consecutive maximum or minimum values must be less than "X".

  3. Bad use of recording equipment by user. This is hard to detect but would probably rely on spectral analysis, for example flagging an unusually strong spectral tilt towards the lower frequencies.

  4. Bad execution of the task by the user. No clear ideas on QC-ing this at this stage.
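
To make categories 1 and 2 a bit more concrete, here is a minimal sketch of the SNR, minimum-amplitude and clipping checks. It is not b2aiprep code. The function names are hypothetical, and it assumes signed integer PCM samples, that noise and signal segments have already been picked out, and that "half the bit depth" means a peak amplitude of 2^(bit_depth - 1). The thresholds are deliberately left as required arguments because "X" and "Y" are still undecided. The hum and spectral-tilt checks are not covered.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a non-empty sequence of samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_db(signal, noise):
    """SNR in dB, comparing the RMS of a signal segment to a noise segment."""
    noise_rms = rms(noise)
    if noise_rms == 0:
        return math.inf  # perfectly silent background
    signal_rms = rms(signal)
    if signal_rms == 0:
        return -math.inf  # no signal at all
    return 20 * math.log10(signal_rms / noise_rms)


def rms_percent_of_full_scale(signal, bit_depth=16):
    """RMS as a percentage of the peak amplitude (assumed 2**(bit_depth - 1))."""
    full_scale = 2 ** (bit_depth - 1)  # 32768 for 16-bit audio
    return 100 * rms(signal) / full_scale


def clipping_stats(samples, bit_depth=16):
    """Return (fraction of samples at a rail, longest consecutive run at a rail).

    A run is counted across both rails, so a maximum followed directly by a
    minimum counts as a run of two.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    hi = 2 ** (bit_depth - 1) - 1
    lo = -(2 ** (bit_depth - 1))
    clipped = longest = run = 0
    for s in samples:
        if s >= hi or s <= lo:
            clipped += 1
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    return clipped / len(samples), longest


def passes_qc(signal, noise, min_snr_db, min_rms_percent,
              max_clip_fraction, max_clip_run, bit_depth=16):
    """True if the sample meets every threshold (all thresholds are placeholders)."""
    clip_fraction, clip_run = clipping_stats(signal, bit_depth)
    return (
        snr_db(signal, noise) >= min_snr_db
        and rms_percent_of_full_scale(signal, bit_depth) >= min_rms_percent
        and clip_fraction < max_clip_fraction
        and clip_run < max_clip_run
    )
```

A call would look like `passes_qc(signal, noise, min_snr_db=20, min_rms_percent=1, max_clip_fraction=0.01, max_clip_run=3)`, where those numbers are purely illustrative and not proposed cut-offs. Returning the individual metrics rather than a single pass/fail flag may turn out to be more useful while we are still looking at data to choose thresholds.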