Closed mpogue2 closed 5 years ago
I assume it is not necessary to adjust each song individually. I use BOOM 2 on my Mac and another sound enhancement app on my Win 10 tablet. They do not just provide an EQ but also other sound effects to enrich the sound. If that could be incorporated into the program that would be fine! If not, then I think you should leave this to 3rd-party tools and avoid the danger of overloading the program.
There's some discussion on Facebook about callers using multi band eq to drop the vocal presence bands down by a few dB (aka "make a smile" in the graphical EQ GUI), to allow the caller's voice to come through a bit more clearly. This is a use case for the graphical EQ that I didn't know a bunch of people were using. With an external mixer, there's no need to do this in the app. But, it's a nice-to-have, if time permits.
We're using libbass_fx for EQ, which has the ability to do 9 or 10 eq bands as easily as the 3 we have now. BOOM 2 is definitely cool, too, although it's US$17 at the moment (and isn't available for Windows or Linux).
Discussion on Facebook continues. Example given for the "Mike Sikorsky smile" is a 4dB drop at 2KHz, with a bandwidth of 1 octave. Furthermore, Mike Sikorsky recommends tweaking using a 30-band graphic equalizer. I think we might be able to get away with either a) the existing 3-band graphical EQ, but add a checkbox to turn on/off a 4dB drop at 2KHz, or b) replace it with an N-band parametric equalizer (which libbass supports). The parametric EQ is much more flexible than a graphical EQ, because the center frequency of each band can be changed. Also, the 3-band EQ settings can easily be translated to the N-band parametric EQ.
For reference, the current 3-band graphical equalizer is: 125Hz, 1000Hz, 8000Hz; all are BW=2.5 octaves; ranges are ±15dB.
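To make the "3-band settings translate to N-band parametric" idea concrete, here is a minimal sketch. The `Band` type and the function name are hypothetical (nothing like this exists in the codebase as far as this thread shows); only the center frequencies, the 2.5-octave bandwidth, and the ±15dB range come from the numbers above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Band:
    """One peaking band of a parametric EQ (hypothetical type for illustration)."""
    center_hz: float
    bandwidth_oct: float
    gain_db: float


# Values quoted above for the current 3-band graphical EQ.
THREE_BAND_CENTERS_HZ = (125.0, 1000.0, 8000.0)
THREE_BAND_BW_OCT = 2.5
GAIN_LIMIT_DB = 15.0


def three_band_to_parametric(bass_db: float, mid_db: float, treble_db: float) -> List[Band]:
    """Map the three slider gains onto an equivalent list of peaking bands."""
    bands = []
    for center, gain in zip(THREE_BAND_CENTERS_HZ, (bass_db, mid_db, treble_db)):
        # Clamp to the slider range so out-of-range input cannot exceed +/-15 dB.
        clamped = max(-GAIN_LIMIT_DB, min(GAIN_LIMIT_DB, gain))
        bands.append(Band(center, THREE_BAND_BW_OCT, clamped))
    return bands
```

With a representation like this, the "smile" would just be one more entry appended to the list, e.g. `Band(2000.0, 1.0, -4.0)`.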
From Mike S: "Somewhere between the exact midpoint and the far right should appear a smile which lessens the intensity of frequencies that are interfering with your clarity of voice. This is usually a lead guitar or piano playing or might be harmony voices or a lead singer. Each piece of music is different so play with them and listen to the track after applying and if you don't like it simply click the undo and make adjustments."
I think this argues more strongly for a parametric EQ, since it has a tunable center frequency. For reference, the midpoint in a 30-band graphical eq is usually around 800Hz.
OK, I take that back. Mike S: "I usually start where he has it decreased on the left but I usually extended further to the right". See attached picture for the EQ curve that Glenn uses. It's more like -6dB at 2KHz with a bandwidth of about 1 octave. Mike appears to prefer it more like 2.5KHz, with a bandwidth of 1.25-1.5 octaves.
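For comparing these settings, it helps to turn "center frequency plus bandwidth in octaves" into band edges. A rough sketch, assuming the bandwidth is symmetric around the center on a log-frequency axis (how libbass defines bandwidth exactly is not confirmed here), and using the common octaves-to-Q conversion:

```python
import math


def band_edges(center_hz: float, bandwidth_oct: float) -> tuple:
    """Lower and upper edge of a band that spans bandwidth_oct octaves around center_hz."""
    half = 2.0 ** (bandwidth_oct / 2.0)
    return center_hz / half, center_hz * half


def octaves_to_q(bandwidth_oct: float) -> float:
    """Commonly used conversion from bandwidth in octaves to filter Q."""
    ratio = 2.0 ** bandwidth_oct
    return math.sqrt(ratio) / (ratio - 1.0)
```

Under that assumption, 2KHz at 1 octave spans roughly 1.4KHz to 2.8KHz, while 2.5KHz at 1.5 octaves spans roughly 1.5KHz to 4.2KHz: about the same left edge, extended further to the right, which matches Mike's description.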
A first attempt at a UX for ParametricEQ is committed to a branch of the same name (e0edef27da88ba311f8351d8380f9a9cced0a808).
I still want to add +/- buttons, context menu, maybe shelf controls, and Q via either drag of the line, or via mouse wheel. It will be a while to get this in, but I think it's going to be a better UX than the three sliders we have today. And, it will be able to do the Mike Sikorsky smile easily, on a per-song basis. (And, I think I can make it compatible with the existing three-band EQ!).
Here's what it looks like right now:
Right now we're using a 3-band peaking EQ: BASS_FX_BFX_PEAKEQ. Libbass also has a fully-parametric EQ: BASS_BFX_BQF, with low and high shelving filters, as well as peaking filters.
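As a way to sanity-check a proposed curve outside the app, here is a small sketch of a single peaking band using the widely published Audio EQ Cookbook biquad formulas. This is an assumption for illustration only: it is not taken from BASS_FX, and libbass may compute its filters differently. The function names are made up.

```python
import cmath
import math


def peaking_coeffs(fs: float, f0: float, gain_db: float, bw_oct: float):
    """Biquad coefficients (b, a) for a peaking EQ band, per the Audio EQ Cookbook."""
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    # Bandwidth-in-octaves form of alpha, including the digital warping correction.
    alpha = math.sin(w0) * math.sinh(math.log(2.0) / 2.0 * bw_oct * w0 / math.sin(w0))
    b = (1.0 + alpha * a_lin, -2.0 * math.cos(w0), 1.0 - alpha * a_lin)
    a = (1.0 + alpha / a_lin, -2.0 * math.cos(w0), 1.0 - alpha / a_lin)
    return b, a


def gain_db_at(b, a, fs: float, f: float) -> float:
    """Magnitude response of the biquad at frequency f, in dB."""
    z1 = cmath.exp(-1j * 2.0 * math.pi * f / fs)  # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20.0 * math.log10(abs(num / den))
```

For example, `peaking_coeffs(44100, 2000, -4, 1)` describes the 4dB drop at 2KHz discussed earlier; evaluating `gain_db_at` across a range of frequencies gives the shape of the dip.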
It might also be nice to have a "default EQ" setting in Preferences, that is used as the default, if the user has not set EQ on a song yet. And, perhaps a "reset to default EQ" that clears the user settings.
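The lookup logic for that could be as simple as the following sketch. `EqStore` and its method names are hypothetical, and the real settings would presumably live wherever per-song settings are stored today; this only shows the fallback and reset behavior described above.

```python
from typing import Dict, Tuple

Eq = Tuple[float, float, float]  # (bass, mid, treble) gains in dB


class EqStore:
    """Per-song EQ with a Preferences-level default (hypothetical sketch)."""

    def __init__(self, default_eq: Eq = (0.0, 0.0, 0.0)) -> None:
        self.default_eq = default_eq
        self._per_song: Dict[str, Eq] = {}

    def set_song_eq(self, song: str, eq: Eq) -> None:
        self._per_song[song] = eq

    def reset_song_eq(self, song: str) -> None:
        # "Reset to default EQ": forget the user's setting for this song.
        self._per_song.pop(song, None)

    def eq_for(self, song: str) -> Eq:
        # Fall back to the default when the user has not set EQ on this song.
        return self._per_song.get(song, self.default_eq)
```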
I did a lot of research on intelligibility over the last couple of days. This is the best reference I found: https://www.dpamicrophones.com/mic-university/facts-about-speech-intelligibility
See the section called "Important Frequencies", here's an excerpt:
"A speech spectrum is either high-pass or low-pass filtered. Using an HP filter at 20 Hz (upper left) leaves the speech 100% understandable. (This is because the complete speech spectrum is there). An HP-filter cutting everything below 500 Hz still leaves the speech signal understandable. Even though most of the speech energy is cut out, the intelligibility is only reduced by 5%. However, applying a higher cut-off makes intelligibility drop.
The other way around, applying an LP-filter makes intelligibility drop very fast. When cutting at 1 kHz, the intelligibility is already less than 40%. It can be seen that the frequency range between 1 kHz and 4 kHz is of high importance for intelligibility."
This suggests that the center frequency for intelligibility is the intersection of the two curves, at about 1600Hz.
And, there's also a lot of research on "auditory masking", e.g. https://en.wikipedia.org/wiki/Auditory_masking
From Figure B, we can see that at higher volumes, the masking curve is asymmetric, meaning that a masking frequency masks higher frequencies more than it masks lower frequencies.
That suggests that if we want to create a Parametric EQ "dip" to "make room" for the vocals, especially when the music is loud, we should extend the notch a bit more in the positive frequency direction, if possible (just as Mike Sikorsky suggests!).
However, the research above also suggests that the optimal center frequency should be somewhat lower than Mike suggests for maximum effectiveness, probably centered around 1.6KHz.
There's also research available on auditory masking by age and by gender and by hearing impairment. Summary: older people and hearing-impaired people tend to have somewhat wider masking curves than do younger people.
This suggests that we should make the bandwidth of a Parametric EQ dip filter have a variable width, so that it can be somewhat wider for an older crowd (which often contains people with hearing impairment as well), and somewhat narrower for a younger crowd.
Based on all the above, I'm inclined to build an "Intelligibility Boost" feature in the GlobalFX tab, that defaults to 1.6KHz, 2 octave width, -3dB, but allows for changing those parameters based on the situation.
If we ever do build in a Global Parametric EQ, the Intelligibility Boost setting can just turn off/on a parametric curve with exactly those parameters, for backward compatibility.
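A minimal sketch of how that setting could map onto a parametric band. The class and method names are hypothetical; only the defaults (1.6KHz, 2 octaves, -3dB) come from the proposal above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class IntelligibilityBoost:
    """Global 'Intelligibility Boost' setting (hypothetical sketch)."""
    enabled: bool = False
    center_hz: float = 1600.0
    bandwidth_oct: float = 2.0
    gain_db: float = -3.0

    def as_parametric_band(self) -> Optional[Tuple[float, float, float]]:
        """Return (center_hz, bandwidth_oct, gain_db), or None when the boost is off."""
        if not self.enabled:
            return None
        return (self.center_hz, self.bandwidth_oct, self.gain_db)
```

If a Global Parametric EQ ever exists, turning the boost on or off would then amount to adding or removing this one band, so old settings keep meaning the same thing.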
One more comment, on gender:
http://resource.isvr.soton.ac.uk/staff/pubs/PubPDFs/BS%20EN%2060268-16.pdf
This is a spec for a waveform to predict speech intelligibility in noise. Table A.3 shows the different octave weights for male vs female speakers. As we can see, for female speakers intelligibility is weighted slightly more toward the higher octaves, but the actual difference is not that large.
So, I am inclined to stick with 1.6KHz, and a knob to move it up a bit if needed for female callers. Maybe a button for "init". I thought about having multiple buttons, or a dropdown for "male caller", "female caller", "older crowd", etc, but I think this is actually more complicated than just providing some knobs, with some textual help.
8c8cfc1d430c6478810985eb9c01c2c35ab42ef3 commits a simple UI for an "Intelligibility Boost", that puts in the "smile" (Mike Sikorsky's term for it). It defaults to 1.6KHz, as per the study I referenced above. Female callers should bump it up by a few tenths of a KHz. That's not much, and it probably doesn't matter all that much, which is why there's no special "gender" UX here -- the 1.6KHz default will probably work well for people with any voice pitch, any gender.
c77c1d416b1ec139c3440d27224059a625abb88f adds the implementation. Seems to work as-designed when I test it, both before a song is loaded and changing parameters dynamically in the Preferences box after a song is loaded (and while it is playing!).
2f9107c15a0e7943f3cb0fd76db50bc58075c55b merged into master.
Right now we're using a 3-band EQ, which works pretty well. Optionally, it would be nice (future) to have a "compatible" 10-band EQ, where compatible means that a caller could switch back and forth, and the settings do something reasonable.