sillsdev / SpeechAnalyzer

SIL Speech Analyzer is a Windows program for acoustic analysis of speech sounds.
https://software.sil.org/speech-analyzer/

Feat/display formant tracks #56

Closed kgorham closed 2 years ago

kgorham commented 2 years ago

1) modified document class so that a spectrogram or snapshot can be returned independently.
2) modified helperwnd class so that the background is not redrawn when the plot window is resized.
3) modified helperwnd class so it doesn't block the message loop.
4) removed unused function declarations.
5) updated graphParameters so that the formant track setting is not stored.
6) changed code to use unique_ptr<> for pointer management in spectrogram code.

Fixes #13

darcywong00 commented 2 years ago

> so we should probably get a review from @darcywong00 as well. After two approving reviews, @darcywong00 can produce a build and @terrygibbs can give it a try and a thumbs up before shipping it out.

I plan to review this on Thursday, though I'm not really familiar with the SA code on handling processes. 😕 Definitely can help make the build later for Terry to try.

darcywong00 commented 2 years ago

We should add an entry in CHANGELOG.md for the next build

# SA - 3.1.1.3 12/18/2021
- Fix something about formant tracks...

terrygibbs commented 2 years ago

Hi,

Maybe name it "Revised Formant Tracking procedure".

Terry

megahirt commented 2 years ago

@kgorham's video demo of the features in this PR, as referenced in https://github.com/sillsdev/SpeechAnalyzer/issues/13#issuecomment-993563999, can be seen here:

https://user-images.githubusercontent.com/10772332/146010251-07f9e7cf-568b-4d2e-bb6e-7ea109ad7418.mp4

kgorham commented 2 years ago

Problem noted by @terrygibbs:

To reproduce what I'm seeing:

1) Install SA.
2) Place this short test file on the desktop: "Lahu Shi 1 A.wav"
3) Place this long test file on the desktop: "Lahu Shi 1-3 A.wav"

Test #1

1) Double-click the desktop icon: "Speech Analyzer"
2) Locate and select the short test file "Lahu Shi 1 A.wav"
3) Select the option: "Waveform, Spectrogram"
4) At the top center of the screen, select the icon: "Formant Tracks"

In 6 seconds, note that the MS Windows RED cancel button, at the top right-hand corner of SA, has just become enabled, indicating that SA has crashed. Continue to wait. After 20 seconds the Formant Tracks are displayed on the screen and the Status Bar info is cleared.

Test #2

1) Close SA, then run SA.
2) Select: "Lahu Shi 1 A.wav"
3) Select: "Waveform, Spectrogram"
4) Select: "Formant Tracks"
5) In 6 seconds, after the RED cancel button is showing, press the "Escape" key on the computer.

The screen will go into a "grayed-out mode", indicating SA had crashed. Continue to wait. After 20 seconds the screen returns to normal, the Formant Tracks are displayed and the Status Bar info is cleared.

When showing Formant Tracks:

- "Lahu Shi 1 A.wav" => 5 sec of data takes 20 sec
- "Lahu Shi 1-3 A.wav" => 25 sec of data takes 1 min 50 sec

kgorham commented 2 years ago

NOTES:

1) When the RED cancel button is displayed, SA will also display 'not responding' in the upper left corner by the filename.
2) This issue does not happen every time.
3) The fact that the application still runs and completes tells me that this isn't really a crash. SA is consuming 100% of my CPU during formant track processing, AND we know that it is blocking a message loop. My thought is that Windows is detecting this and marking the fact.
4) I've tried running the release build under a debugger, but have not been able to reproduce the 'RED X' scenario in that environment.

darcywong00 commented 2 years ago

Before we merge this, I'm making another test build for @terrygibbs to try...

terrygibbs commented 2 years ago

I'm not seeing any changes in the results. I think I am downloading from the place where Darcy indicates to me. What I have tested so far are these versions: SA Tests

I'm not sure what to do.

kgorham commented 2 years ago

From Terry:

I've tested SA and I continue to see the same issue that SA will get hung up after a few seconds.

My machine currently takes 23 seconds to complete then the process displays the Formant Tracks.

After the past 5 versions not being different and this newest version giving the same bad results, I was perplexed.

It then occurred to me to try another computer :)

I just bought my wife a new Dell computer 3 weeks ago which has an 11th Gen CPU.

The good news is that on her computer SA works just fine. However on my computer it fails; as usual. The bad news is I don't know how to figure out why my computer is different :(

kgorham commented 2 years ago

From Terry: Running my computer in 'Win 7 Compatibility' mode seems to 'fix' the Formants issue. This is not needed for my Wife's computer.

Using 'High DPI Override' seems to 'fix' the Screen display issue for both my machine and hers.

kgorham commented 2 years ago

From Terry: I'm still trying to figure out what's going on with my computer.

In the meantime it has occurred to me that there may be a simpler way to deal with the current plan in which we ask the user to use the cursors to locate a region of interest.

My concern is, I'm not sure we can display a subset of a Formant Track series on the Spectrogram.

My idea is to still have the user position the cursors to select a "small" region of the Waveform, but then use code to open a new Window with the appropriate Waveform and Spectrogram and display the Formants there.

It would work in this way: The user has opened a file and for some reason selects 'Spectrogram' to view it. The user does not know that the 2 cursors need to be positioned before showing the "Formant Tracks". But they select the "Formant Tracks" icon to see what happens.

We can determine the space between the 2 cursors and if it's 'too close' or 'too far apart' then we say so. When the user finally selects a 'valid' region and presses the "Formant Tracks" icon, then, using code, we would open a new Window using this sequence: "Ctrl A, Ctrl C, Ctrl N", then we display the Spectrogram and then we select the "Formant Tracks" icon.

By doing this, the user can open many different copies without affecting the original data.

kgorham commented 2 years ago

From Terry: The 'red cancel' button problem still happens in Win10 mode, but not in Win8 or Win7 mode.

NOTE: Terry's PC is an i7 Core @ 1.3 GHz with 16GB of memory running Windows 10 Home. Mine is a Win10 VM running on top of Linux. It has 8GB of memory and a single core. For me, the red 'X' is very infrequent.

kgorham commented 2 years ago

From Terry: I don't see an easy way of generating Formants unless we work with a small data set.

Some background:

For an audio file that's 70 sec. long, which has 10 Words with 3 repetitions per Word, it takes 50 sec to process. With some user delay it basically takes 1 minute to process 30 Formants.

My previous Metric was that the cursors need to be more than 1 second apart and not more than 1 minute apart.

Students here usually record 436 Words and each word is spoken 3 times; so that's 1,308 formants to process. Handheld recorders are typically used to record between 30 and 40 words before a break is provided. At 40 Words per session, times 10 recording events, that's roughly the 436 Wordlist.

Each time, we are faced with a 40 Word file with 3 repetitions of each Word, which gives us 120 Formants to process. That works out to about 4 minutes before SA can show the Formants. So unless they want to wait that long, users would need to press the Esc button. Here is another idea.

==

I think we need to engage the user at 2 levels...

If a user decides that they want to view the Formants of their current Spectrogram, and they press the "Formant Tracks" button, then SA needs to decide what to do.

First: If SA determines that the file is 1 minute or less in Time length and since it takes about 30 seconds or less to process that, then just let SA process the Formants as if nothing is wrong.

Second: But more common for us is if the file is more than 1 minute long, then a message needs to be presented to the user that indicates that they need to place the 2 cursors in a way to indicate what small part of the Spectrogram they want to process.

The message could say this: Since this is a large Spectrogram, it needs to be analyzed in smaller sections. So move the 2 cursors such that they are separated by 60 seconds or less. (Note: The timeline is always shown at the bottom of the Waveform.) Then press the "Ctrl A, Ctrl C, Ctrl N" keys to show a new Window. Next press the "Waveform, Spectrogram" option in the Side Bar. Now press the "Formant Tracks" button in the Tool Bar.
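The two-level decision described above could be sketched roughly as follows. This is only an illustration: the function and enum names are hypothetical and do not come from the SA code base, and the thresholds (more than 1 second, not more than 60 seconds) are the ones mentioned in this comment.

```cpp
#include <cassert>

// Hypothetical thresholds taken from the discussion above.
constexpr double kMinSelectionSec = 1.0;   // cursors must be more than this far apart
constexpr double kMaxSelectionSec = 60.0;  // and not more than this far apart

// Hypothetical outcome of pressing the "Formant Tracks" button.
enum class FormantRequest {
    ProcessWholeFile,   // short file: just process everything
    ProcessSelection,   // long file with a valid cursor selection
    SelectionTooShort,  // long file, cursors too close together
    SelectionTooLong    // long file, cursors too far apart
};

// Decide what to do for a file of fileSec seconds with cursors at
// startSec and stopSec (both measured in seconds from the start of the file).
FormantRequest classifyFormantRequest(double fileSec, double startSec, double stopSec) {
    assert(stopSec >= startSec);
    if (fileSec <= kMaxSelectionSec) {
        return FormantRequest::ProcessWholeFile;
    }
    const double span = stopSec - startSec;
    if (span <= kMinSelectionSec) {
        return FormantRequest::SelectionTooShort;
    }
    if (span > kMaxSelectionSec) {
        return FormantRequest::SelectionTooLong;
    }
    return FormantRequest::ProcessSelection;
}
```

For example, a 70 second file with cursors at 10 s and 40 s would be classified as `ProcessSelection`, while the same file with cursors at 0 s and 65 s would trigger the "too far apart" message.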

kgorham commented 2 years ago

@terrygibbs,

I've looked a little bit at the formant tracker code. there are two things that continue to bother me:

1) all the work for processing the formant tracks is done on the 'paint' call. this is blocking the message loop and causing SA to be unresponsive. if we could make this a background task, this would allow the user to do other work until the results are ready. we could provide a window somewhere to show progress, or maybe add something to the task bar.
2) the processing itself is done using a single thread. I would hope that introducing threading would allow us to improve the performance, i would hope by at least 2-4x. i don't think it would be too hard to put together a prototype that does threading to see if it truly helps. the current logic processes the data in chunks, so this should be doable. the only gotcha is that it writes everything to a file for display at a later time - so that might have to be reworked.
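A threading prototype along the lines of point 2 might look something like the sketch below. It assumes that chunks can be analyzed independently of each other, which has not been confirmed for the real formant tracker. `analyzeChunk` is a hypothetical stand-in (it only computes a mean absolute amplitude so the example stays runnable), and results are collected in memory rather than written to a file.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical stand-in for the per-chunk formant analysis.
// It returns the mean absolute amplitude of samples[begin, end).
double analyzeChunk(const std::vector<short>& samples, std::size_t begin, std::size_t end) {
    double sum = 0.0;
    for (std::size_t i = begin; i < end; ++i) {
        const int v = samples[i];
        sum += (v < 0) ? -v : v;
    }
    return (end > begin) ? sum / static_cast<double>(end - begin) : 0.0;
}

// Split the samples into fixed-size chunks, run each chunk as its own
// asynchronous task, and return the per-chunk results in chunk order.
std::vector<double> analyzeInParallel(const std::vector<short>& samples, std::size_t chunkSize) {
    std::vector<double> results;
    if (chunkSize == 0) {
        return results;  // nothing sensible to do with empty chunks
    }
    std::vector<std::future<double>> tasks;
    for (std::size_t begin = 0; begin < samples.size(); begin += chunkSize) {
        const std::size_t end = std::min(begin + chunkSize, samples.size());
        tasks.push_back(std::async(std::launch::async, [&samples, begin, end] {
            return analyzeChunk(samples, begin, end);
        }));
    }
    results.reserve(tasks.size());
    for (auto& task : tasks) {
        results.push_back(task.get());  // blocks until that chunk is done
    }
    return results;
}
```

This launches one task per chunk, which is fine for a handful of chunks; a real prototype would probably cap the number of concurrent tasks at something like `std::thread::hardware_concurrency()`.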

I understand the solutions you are proposing, and i agree that a smaller window would help processing times. I'm still not too hot on the idea of it being in a separate window though. The keystrokes you mentioned above would copy out the chunk of data the user is interested in and display it in a separate document, which to me would seem to use too much screen real-estate. couldn't we just put the results up into a popup or floating window instead - one that is not in a separate document instance?

I don't know if you saw my proposal for masking the data. my thought was this: the formant tracker receives its input data in chunks. anything that would fall outside of the selected area (the green/red bars) would be zeroed out. we would optimize that path so that processing would be optimal for 'no' data. the real processing would still occur on the section that is selected. so this would give us the performance boost we are seeking. all the rest of the display code can remain the same as it is now. this would be done using the above user interactions you are proposing - sans the part that creates a new window.
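As a rough illustration of the masking idea (not the actual SA implementation), the sketch below zeroes everything outside a selected sample range and provides a cheap check so that an all-zero chunk can skip the expensive analysis. Both function names and the use of sample indices for the cursor positions are assumptions made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Zero every sample outside [selBegin, selEnd) so that only the selected
// region (between the two cursors) still carries data.
void maskOutsideSelection(std::vector<short>& samples, std::size_t selBegin, std::size_t selEnd) {
    selEnd = std::min(selEnd, samples.size());
    selBegin = std::min(selBegin, selEnd);
    const auto first = samples.begin() + static_cast<std::ptrdiff_t>(selBegin);
    const auto last = samples.begin() + static_cast<std::ptrdiff_t>(selEnd);
    std::fill(samples.begin(), first, short{0});
    std::fill(last, samples.end(), short{0});
}

// Fast-path check: returns true when samples[begin, end) contains only zeros,
// in which case the caller could skip the real formant analysis for that chunk.
bool isSilentChunk(const std::vector<short>& samples, std::size_t begin, std::size_t end) {
    end = std::min(end, samples.size());
    begin = std::min(begin, end);
    return std::all_of(samples.begin() + static_cast<std::ptrdiff_t>(begin),
                       samples.begin() + static_cast<std::ptrdiff_t>(end),
                       [](short s) { return s == 0; });
}
```

One caveat with this approach is that genuinely silent audio inside the selection would also take the fast path, which is presumably harmless since there is nothing to track there.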

megahirt commented 2 years ago

I don't actually know what formant tracks are or what they are used for, but given your description, could the program simply chunk the audio into 10 second segments and update the UI after each segment has finished "processing"? Basically, formant track processing is a long running process/task that if able to be run in the background, would give the best UX.

Perhaps formant track audio segments cannot be stitched together, in which case my idea probably won't work.

kgorham commented 2 years ago

@megahirt what is the release cycle? is it quarterly? as-needed? i feel bad that some items have been languishing in the queue, and i'd like to move them out if possible (darcys & my stuff).

kgorham commented 2 years ago

@darcywong00, @megahirt would it be OK if I changed the C++ version on a couple of the projects to C++17 or C++20? there are some threading capabilities I would like to play with.

megahirt commented 2 years ago

> @megahirt what is the release cycle? is it quarterly? as-needed? i feel bad that some items have been languishing in the queue, and i'd like to move them out if possible (darcys & my stuff).

Release as needed, specifically when @darcywong00 has the time to do the release process. I love that SA is being actively developed again! I am willing to work with @darcywong00 when he's on an SA week to do a release. It's not ideal, but I am a team lead on a different project and I don't have time to regularly work on SA.

megahirt commented 2 years ago

> @darcywong00, @megahirt would it be OK if I changed the C++ version on a couple of the projects to C++17 or C++20? there are some threading capabilities I would like to play with.

yes absolutely - change as you desire. If there are implications for local development in VS, let us know.

kgorham commented 2 years ago

@megahirt

> could the program simply chunk the audio into 10 second segments and update the UI after each segment has finished "processing"?

yes, that is the ideal solution, but SA was not originally designed that way.
The original design was driven completely by Windows message loops. It didn't have any threading at all. so that is what is causing the 'pain' now. all the processing is happening on a single Windows message event: WM_PAINT.

That solution would solve the 'not responding' problem, but we also still need to deal with the problem as to how long it takes to process the data. Threading would help there as well, or maybe researching a new algorithm.
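To make the "not responding" half of the problem concrete, here is a minimal sketch of moving the segment processing onto a worker thread, with progress and cancellation exposed through atomics. The `FormantJob` class is hypothetical and is not part of SA. In an actual Windows application the worker would more likely notify the UI thread with something like `PostMessage` and an application-defined message, and the WM_PAINT handler would only draw whatever segments are already finished.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>

// Hypothetical background job: the UI thread calls start(), polls progress()
// while staying responsive, and may call cancel() (for example on Escape).
class FormantJob {
public:
    explicit FormantJob(std::size_t segmentCount) : total_(segmentCount) {}
    ~FormantJob() {
        cancel();
        wait();
    }

    void start() {
        if (worker_.joinable()) {
            return;  // already started
        }
        worker_ = std::thread([this] {
            for (std::size_t i = 0; i < total_ && !cancelled_.load(); ++i) {
                processSegment(i);   // placeholder for the real per-segment analysis
                done_.store(i + 1);  // publish progress for the UI thread
            }
            finished_.store(true);
        });
    }

    void cancel() { cancelled_.store(true); }
    void wait() {
        if (worker_.joinable()) {
            worker_.join();
        }
    }
    std::size_t progress() const { return done_.load(); }
    bool finished() const { return finished_.load(); }

private:
    void processSegment(std::size_t /*index*/) {
        // The real formant analysis for one segment would go here.
    }

    std::size_t total_;
    std::atomic<std::size_t> done_{0};
    std::atomic<bool> cancelled_{false};
    std::atomic<bool> finished_{false};
    std::thread worker_;
};
```

This only addresses responsiveness; the total processing time stays the same unless the segments are also processed in parallel.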

terrygibbs commented 2 years ago

Kent, my suggestions are just ideas that I hope will help with finding a good solution. I'm not holding onto any specific idea, so don't feel obligated to implement an idea I suggest.

You wrote: "Threading would help there as well, or maybe researching a new algorithm." If Threading can be done in SA that would be a good thing to implement.

However, I suggest we keep the 'algorithm' and not look for a new one. A considerable amount of time and effort went into developing the Formant tracking code. We wanted SA to be able to provide PhD linguists with high quality results. We had 2 signal processing engineers work on this for several months and then an outside expert was asked to complete the details. This resulted in the Formant Tracks having a very high degree of accuracy.

megahirt commented 2 years ago

@kgorham I want to apologize that this PR has stalled for several months. That's not how we intend to run things and your contributions are very valuable. I am going to merge this PR today as it has two approving reviews and @terrygibbs has at least confirmed that it works on a modern machine. I prefer to merge and release this code and if there are subsequent issues we can follow up with a separate PR. Thank you!