qiuqiangkong / panns_inference

MIT License
197 stars 31 forks source link

Can not recognize Silence, less stable than yamnet #6

Open Honghe opened 3 years ago

Honghe commented 3 years ago

Hi qiuqiangkong, thanks for your great work. Recently, I tested panns_inference with the following wav audio: silence.zip. The wav is generated audio consisting of "little noise, silence, little noise". image
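For anyone who wants to reproduce a similar input without the attachment, here is a minimal sketch of how a "little noise, silence, little noise" wav could be generated with the Python standard library. This is not the script used for silence.zip: the 32 kHz sample rate, the segment durations, the noise amplitude, the use of uniform white noise, and the output filename are all assumptions chosen for illustration.

```python
import random
import struct
import wave

SAMPLE_RATE = 32000  # assumption: 32 kHz mono audio


def make_noise_silence_noise(noise_sec=3.0, silence_sec=4.0, amplitude=0.01, seed=0):
    """Return float samples in [-1, 1]: low-level noise, digital silence, low-level noise."""
    rng = random.Random(seed)
    n_noise = int(noise_sec * SAMPLE_RATE)
    n_silence = int(silence_sec * SAMPLE_RATE)

    def noise(n):
        # Uniform white noise at a small amplitude (hypothetical stand-in for "little noise").
        return [rng.uniform(-amplitude, amplitude) for _ in range(n)]

    return noise(n_noise) + [0.0] * n_silence + noise(n_noise)


def write_wav(path, samples):
    """Write float samples as 16-bit mono PCM."""
    pcm = struct.pack("<%dh" % len(samples), *(int(round(s * 32767)) for s in samples))
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 2 bytes per sample = 16-bit
        f.setframerate(SAMPLE_RATE)
        f.writeframes(pcm)


samples = make_noise_silence_noise()
write_wav("noise_silence_noise.wav", samples)  # hypothetical filename
```

The middle segment is exact digital silence (all zeros), which makes it easy to check whether a model's "Silence" probability rises only in that region.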

The panns_inference output is shown below. It cannot recognize Silence, and the gap in Pink noise probability between the head and the tail of the wav is rather large. image

In contrast, the yamnet output is more reasonable, as follows. image

The panns_inference code I used was https://github.com/qiuqiangkong/panns_inference/commit/013c0f6ab617c1be58f3b3564a6f2b17f5e1d2dc

Sincerely!

qiuqiangkong commented 3 years ago

Hi Jack,

Thank you very much for the feedback! I have run the panns inference on the wav you attached and got the following result:

[image: image.png]

The panns_inference version is 0.0.7. Note that the number of frames is 800 here.

For this example, Yamnet performs better than PANNs in detecting silence. Here are two possible reasons:

  1. Yamnet is trained on 1-second segments, while PANNs are trained on 10-second segments with weak labels to obtain better audio tagging performance.

  2. PANNs apply mixup to improve the detection of other sound events, but mixup lowers the performance on silence.

This feedback is very useful for us! We would be very happy to see more comparisons between Yamnet and PANNs if there are any!
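To illustrate one plausible reading of reason 2, here is a toy sketch of mixup on waveforms and labels. It is not the PANNs training code: the two-class label set, the tiny waveforms, and the fixed mixing weight `lam` are assumptions for illustration (in practice the weight is usually sampled at random).

```python
def mixup(wave_a, label_a, wave_b, label_b, lam):
    """Blend two examples: both waveforms and label vectors are mixed with weight lam."""
    mixed_wave = [lam * a + (1.0 - lam) * b for a, b in zip(wave_a, wave_b)]
    mixed_label = [lam * a + (1.0 - lam) * b for a, b in zip(label_a, label_b)]
    return mixed_wave, mixed_label


# Toy label order (hypothetical): [Silence, Speech]
silent_wave = [0.0] * 8
silent_label = [1.0, 0.0]

speech_wave = [0.2, -0.4, 0.6, -0.2, 0.4, -0.6, 0.2, -0.4]
speech_label = [0.0, 1.0]

mixed_wave, mixed_label = mixup(silent_wave, silent_label, speech_wave, speech_label, lam=0.5)

# The mixed waveform is just a quieter copy of the speech clip, not silence,
# yet its training target still assigns 0.5 to "Silence".
print(mixed_wave)
print(mixed_label)
```

In this sketch, every silent clip that gets mixed turns into a non-silent input that still carries a partial "Silence" target, which could blur what the model learns about true silence.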

Best wishes,

Qiuqiang



Honghe commented 3 years ago

@qiuqiangkong Thank you for your reply, but it seems your pic upload failed.

qiuqiangkong commented 3 years ago

@Honghe Sorry! See the prediction figure attached: image