Honghe opened this issue 3 years ago.
Hi Jack,
Thank you very much for the feedback! I ran panns_inference on the wav you
attached and got the following result:
[image: image.png]
The panns_inference version is 0.0.7. Note that the number of frames here
is 800.
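To relate a frame count like 800 to audio duration, here is a minimal sketch. It assumes the PANNs front end uses a 32 kHz sample rate, a hop size of 320 samples, and a centered STFT (one extra frame); these constants are our assumption about the panns_inference defaults, and the helper names are hypothetical.

```python
# Assumed PANNs front-end settings (not confirmed in this thread):
SAMPLE_RATE = 32000  # Hz
HOP_SIZE = 320       # samples between successive STFT frames -> 100 frames/s


def num_frames(num_samples: int, hop_size: int = HOP_SIZE) -> int:
    """Frame count of a centered STFT: one frame per hop, plus one."""
    return num_samples // hop_size + 1


def frames_to_seconds(frames: int, hop_size: int = HOP_SIZE, sr: int = SAMPLE_RATE) -> float:
    """Approximate audio duration covered by a given number of frames."""
    return frames * hop_size / sr


print(num_frames(8 * SAMPLE_RATE))  # 801 frames for exactly 8 s of audio
print(frames_to_seconds(800))       # 8.0
```

Under these assumptions, 800 frames corresponds to roughly 8 seconds of audio at 100 frames per second.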
For this example, Yamnet detects silence better than PANNs. There are two
possible reasons:
1. Yamnet is trained on 1-second segments, whereas PANNs are trained on 10-second segments with weak (clip-level) labels to obtain better audio tagging performance.
2. PANNs apply mixup to improve the detection of other sound events, but mixup lowers the performance on silence.
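A toy illustration of why mixup can hurt the Silence class: mixing a silent clip with a non-silent one produces audio that is not silent, yet the mixed target still assigns weight `lam` to Silence. This is a minimal NumPy sketch assuming standard mixup with `lam ~ Beta(alpha, alpha)` applied to both waveforms and targets; the two-class label vector and the noise stand-in for "speech" are hypothetical, and this is not the PANNs training code.

```python
import numpy as np


def mixup(x1, y1, x2, y2, alpha=1.0, rng=None):
    """Standard mixup: convex combination of two inputs and their targets."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)            # mixing weight in [0, 1]
    x = lam * x1 + (1.0 - lam) * x2         # mixed waveform
    y = lam * y1 + (1.0 - lam) * y2         # mixed (soft) target
    return x, y, lam


rng = np.random.default_rng(0)
silence = np.zeros(32000)                   # 1 s of digital silence at 32 kHz
speech = 0.1 * rng.standard_normal(32000)   # hypothetical non-silent stand-in clip
y_sil = np.array([1.0, 0.0])                # hypothetical labels: [Silence, Speech]
y_sp = np.array([0.0, 1.0])

x, y, lam = mixup(silence, y_sil, speech, y_sp, rng=rng)
rms = np.sqrt(np.mean(x ** 2))
print(f"lam={lam:.2f}  target(Silence)={y[0]:.2f}  RMS of mixed audio={rms:.3f}")
```

Unless `lam` happens to be very close to 1, the mixed waveform is just a scaled copy of the non-silent clip, so the model is trained to output a partial Silence target on audio that is clearly not silent.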
This feedback is very useful to us! We would be very happy to see more comparisons between Yamnet and PANNs, if there are any.
Best wishes,
Qiuqiang
On Thu, 28 Jan 2021 at 11:48, Jack notifications@github.com wrote:
Hi qiuqiangkong, thanks for your great work. Recently, I tested panns_inference with the following wav audio: silence.zip https://github.com/qiuqiangkong/panns_inference/files/5884423/silence.zip The wav is a generated audio clip consisting of "little noise, silence, little noise". [image: image] https://user-images.githubusercontent.com/1092722/106086684-9c799a80-615d-11eb-95ca-efdf54903873.png
The panns_inference output is shown below. It cannot recognize Silence, and the gap in Pink noise probability between the head and the tail of the wav is rather large. [image: image] https://user-images.githubusercontent.com/1092722/106083000-b82d7280-6156-11eb-8109-62adff25b3d3.png
In contrast, the yamnet output, shown below, is more reasonable. [image: image] https://user-images.githubusercontent.com/1092722/106085778-cc27a300-615b-11eb-82ce-19d1b18c7185.png
The panns_inference code I used was at commit 013c0f6 https://github.com/qiuqiangkong/panns_inference/commit/013c0f6ab617c1be58f3b3564a6f2b17f5e1d2dc
Sincerely!
GitHub issue: https://github.com/qiuqiangkong/panns_inference/issues/6
@qiuqiangkong Thank you for your reply, but it seems your picture failed to upload.
@Honghe Sorry! See the attached prediction figure: