Questions about the section 3.1 (Stroke Segment Detection) of the paper

funasshi commented 3 years ago

Hello, I've read your paper "Learning to Recognize Handwriting Input with Acoustic Features".

I am currently implementing it, and I have a question about Stroke Segment Detection in Section 3.1.

I would like to ask about the denoising step, i.e. the Hanning window and the Wiener filter. I suspect I have misunderstood part of it, because my program does not separate the stroke segments properly. In my plots, the blue wave is the signal denoised with Wiener filter window sizes of 5, 10, and 15 s, and the orange wave is the original sound. The blue wave does not look denoised at all.

My understanding is sketched in the handwritten image below. Also, I could not find the window size of the Wiener filter anywhere in the paper, so it would be great if you could tell me what it is. Thanks.
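Since the goal of Section 3.1 is to separate stroke segments from background noise, a simple reference detector can help when sanity-checking an implementation. The sketch below is a generic short-time energy detector and an assumption, not the paper's method. It cuts the signal into non-overlapping 0.025 s frames, applies a Hann window, and marks frames whose energy exceeds `threshold_ratio` times the mean energy of the first 0.25 s. The window length and the noise-only duration come from the author's reply in this thread. The function name `detect_segments`, the ratio of 3.0, and the non-overlapping framing are hypothetical choices.

```python
import numpy as np

def detect_segments(x, sr, frame_s=0.025, noise_s=0.25, threshold_ratio=3.0):
    """Hypothetical energy-threshold detector, NOT the paper's method.

    Returns (start_sample, end_sample) pairs for runs of frames whose
    Hann-windowed energy exceeds threshold_ratio times the mean energy of
    the frames inside the leading noise-only segment.
    """
    frame_len = int(round(frame_s * sr))
    n_frames = len(x) // frame_len                      # non-overlapping frames
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.sum((frames * np.hanning(frame_len)) ** 2, axis=1)

    # Frames fully inside the first noise_s seconds act as the noise reference.
    n_noise = max(1, int(round(noise_s * sr)) // frame_len)
    threshold = threshold_ratio * energy[:n_noise].mean()
    active = energy > threshold

    segments, start = [], None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i                                    # a run of active frames begins
        elif not is_active and start is not None:
            segments.append((start * frame_len, i * frame_len))
            start = None
    if start is not None:                                # run reaches the end of the signal
        segments.append((start * frame_len, n_frames * frame_len))
    return segments

sr = 8000
rng = np.random.default_rng(0)
x = 0.01 * rng.standard_normal(sr)                       # 1 s of low-level noise
t = np.arange(4000, 5600) / sr
x[4000:5600] += 0.5 * np.sin(2 * np.pi * 440 * t)        # 0.2 s synthetic "stroke"
print(detect_segments(x, sr))                            # [(4000, 5600)]
```

The threshold ratio trades missed quiet strokes against false detections, so on real recordings it would need tuning.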

xiaopooh commented 3 years ago

Sorry for the late reply. The window size of the Wiener filter is also 0.025 s. When I apply Wiener filtering, I rely on the fact that the first 0.25 s of the recording is pure environmental noise, and I use that segment as the noise reference for denoising.
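The reply describes Wiener filtering with 0.025 s windows and the first 0.25 s used as a noise-only reference. Below is a minimal sketch of one way to implement that idea, assuming a standard spectral Wiener gain G = 1 - N/|X|^2 (the maximum-likelihood a-priori SNR form), Hann windows with 50% overlap, and a gain floor to limit musical noise. The overlap, the gain rule, the floor value, and the name `wiener_denoise` are assumptions, not details taken from the paper or the author's code.

```python
import numpy as np

def wiener_denoise(x, sr, frame_s=0.025, noise_s=0.25, gain_floor=0.05):
    """Sketch of spectral Wiener denoising with a noise reference taken from
    the first noise_s seconds. Overlap, gain rule, and floor are assumptions."""
    n = int(round(frame_s * sr))
    n += n % 2                                    # even frame length
    hop = n // 2                                  # 50% overlap (assumed)
    # Periodic Hann: copies shifted by n/2 sum to exactly 1, so plain
    # overlap-add reconstructs x whenever the gain is 1.
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)

    # Pad so every original sample is covered by exactly two frames.
    tail = hop + (-(len(x) + 2 * hop)) % hop
    xp = np.concatenate([np.zeros(hop), x, np.zeros(tail)])
    n_frames = (len(xp) - n) // hop + 1
    starts = hop * np.arange(n_frames)
    idx = starts[:, None] + np.arange(n)[None, :]
    spec = np.fft.rfft(xp[idx] * window, axis=1)
    power = np.abs(spec) ** 2

    # Noise PSD: average over frames lying entirely in the first noise_s seconds.
    n_noise = int(round(noise_s * sr))
    in_noise = (starts >= hop) & (starts + n <= hop + n_noise)
    if not in_noise.any():
        raise ValueError("noise_s must be at least one frame long")
    noise_psd = power[in_noise].mean(axis=0)

    # Wiener gain G = 1 - N / |X|^2, clipped from below by gain_floor.
    gain = np.maximum(1.0 - noise_psd / np.maximum(power, 1e-12), gain_floor)
    frames_out = np.fft.irfft(spec * gain, n=n, axis=1)

    y = np.zeros(len(xp))
    np.add.at(y, idx, frames_out)                 # overlap-add
    return y[hop:hop + len(x)]
```

If the plots in the question came from `scipy.signal.wiener`, note that its `mysize` argument is a window length in samples rather than seconds, and that by default it estimates the noise power from the average local variance of the input instead of from a noise-only segment. That makes it a different filter from the one described in the reply.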

funasshi commented 3 years ago


Thank you for your reply. I want to ask another question.

I don't fully understand the stPSD step.

Does the stPSD turn the time-domain data of shape (T,) into an array of shape (t, psd)? If so, I think one more dimension is needed: you are using Conv2D, whose input should have the shape (batch, channel, X, Y), but a batch of stPSDs has the shape (batch, t, psd). I suspect I am misunderstanding something, because I can't see how these dimensions relate to each other.

My guess is that the channel dimension corresponds to T, because if time were X or Y, the model couldn't capture the time dependencies. Could you tell me what X and Y correspond to?
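Under one common convention, which is an assumption here and not confirmed by the paper or the author, the stPSD of a clip of shape (T,) is a 2-D array of shape (t, n_freq), and Conv2D treats it as a single-channel image. In that layout X and Y are the time-frame and frequency-bin axes, and the channel dimension has size 1. Because a 2-D convolution kernel slides along both X and Y, putting time on one of those axes still lets the model learn local time dependencies. The plain NumPy sketch below computes a Hann-windowed short-time PSD and then adds the batch and channel axes. The 0.025 s frame follows this thread; the 0.0125 s hop and the name `stpsd` are hypothetical.

```python
import numpy as np

def stpsd(x, sr, frame_s=0.025, hop_s=0.0125):
    """Short-time PSD: one Hann-windowed periodogram per frame.

    Returns an array of shape (n_frames, n_freq_bins). hop_s (50% overlap)
    is an assumed value.
    """
    n = int(round(frame_s * sr))
    hop = int(round(hop_s * sr))
    window = np.hanning(n)
    n_frames = 1 + (len(x) - n) // hop
    idx = hop * np.arange(n_frames)[:, None] + np.arange(n)[None, :]
    spec = np.fft.rfft(x[idx] * window, axis=1)
    psd = np.abs(spec) ** 2 / (sr * np.sum(window ** 2))   # power per Hz
    # One-sided spectrum: double every bin except DC (and Nyquist for even n).
    if n % 2 == 0:
        psd[:, 1:-1] *= 2
    else:
        psd[:, 1:] *= 2
    return psd

sr = 8000
x = np.random.default_rng(0).standard_normal(sr)   # one clip, shape (T,) = (8000,)
p = stpsd(x, sr)
print(p.shape)                                      # (79, 101): (time frames, frequency bins)
image = p[np.newaxis, np.newaxis]                   # (1, 1, 79, 101): (batch, channel, X, Y)
batch = np.stack([stpsd(c, sr) for c in [x, x]])[:, np.newaxis]   # (2, 1, 79, 101)
```

Stacking equal-length clips and inserting a size-1 channel axis gives exactly the (batch, channel, X, Y) layout that a 2-D convolution expects.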

xiaopooh / WritingRecorder, issue #2