[Bug]: Hello, after obtaining audio vectors with towhee's vggish model, what method should I use to compute the similarity between two audio vectors? DTW?

towhee-io / towhee

Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.

https://towhee.io

Apache License 2.0

[Bug]: Hello, after obtaining audio vectors with towhee's vggish model, what method should I use to compute the similarity between two audio vectors? DTW? #2664

Closed wangdabee closed 1 year ago

wangdabee commented 1 year ago

Is there an existing issue for this?

[X] I have searched the existing issues

Current Behavior

As stated in the title.

Expected Behavior

No response

Steps To Reproduce

No response

Environment

- Towhee version(e.g. v0.1.3 or 8b23a93):
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:

Anything else?

No response

zc277584121 commented 1 year ago

@wangdabee Hi, in general you normalize the vectors and then compute the inner product or the L2 distance. For audio usage examples, see https://github.com/towhee-io/examples/tree/main/audio
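
The normalize-then-compare suggestion above can be sketched with NumPy as follows. This is a minimal illustration, not code from towhee: the helper names (`l2_normalize`, `inner_product`, `l2_distance`) are hypothetical, and the random 128-dimensional vectors are placeholders standing in for two embeddings of the same shape.

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    """Scale a vector (or each row of a matrix) to unit L2 norm."""
    v = np.asarray(v, dtype=np.float64)
    norm = np.linalg.norm(v, axis=-1, keepdims=True)
    return v / np.maximum(norm, eps)  # eps guards against division by zero

def inner_product(a, b):
    """Inner product of the normalized vectors (equals cosine similarity)."""
    return float(np.dot(l2_normalize(a), l2_normalize(b)))

def l2_distance(a, b):
    """Euclidean distance between the normalized vectors."""
    return float(np.linalg.norm(l2_normalize(a) - l2_normalize(b)))

# Placeholder data (assumption): two same-shaped 128-dimensional embeddings.
rng = np.random.default_rng(0)
emb_a = rng.normal(size=128)
emb_b = rng.normal(size=128)

ip = inner_product(emb_a, emb_b)
dist = l2_distance(emb_a, emb_b)
```

For unit-length vectors the two measures carry the same information, since the squared L2 distance equals 2 minus twice the inner product, so either one yields the same ranking of candidates.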

wangdabee commented 1 year ago

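@zc277584121 Hello, audio clips usually differ in length. For example, one vector produced by towhee is 30x128 and another is 40x128. The L2 distance can't be computed between those, can it? How should the similarity be computed in that case? Thanks.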

zc277584121 commented 1 year ago

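@wangdabee If you want to find a segment that is repeated across two audio clips, that is audio_fingerprint, and the link above describes a method for it. If what you want is audio classification, that is audio_classification, and in that case the output vectors all have the same shape regardless of the audio length.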

wangdabee commented 1 year ago

@zc277584121 It is audio_fingerprint, but I couldn't find any description of how to compute the distance between two audio vectors.

zc277584121 commented 1 year ago

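@wangdabee https://github.com/towhee-io/examples/blob/main/audio/audio_fingerprint/audio_fingerprint_advanced.ipynb See the introduction to temporal_network in that notebook. It is a method for estimating the repeated windows between two sequences of time-ordered vectors.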
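
To give a rough idea of how two frame sequences of different lengths (such as 30x128 and 40x128) can be compared, here is a simplified sketch. It is not the temporal_network method from the notebook: it only builds a frame-to-frame cosine similarity matrix and scores each constant time offset by the mean similarity along that diagonal, so it assumes the shared segment plays at the same speed in both clips. The function names, the `min_len` parameter, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def cosine_similarity_matrix(x, y, eps=1e-12):
    """Return an (n, m) matrix of cosine similarities between rows of x and y."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    x = x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)
    y = y / np.maximum(np.linalg.norm(y, axis=1, keepdims=True), eps)
    return x @ y.T

def best_diagonal(sim, min_len=5):
    """Find the time offset whose diagonal has the highest mean similarity.

    Offset k pairs frame i of the first clip with frame i + k of the second.
    Diagonals shorter than min_len are skipped.
    Returns (offset, mean_similarity, overlap_length), or None if every
    diagonal is shorter than min_len.
    """
    n, m = sim.shape
    best = None
    for k in range(-(n - 1), m):
        diag = np.diagonal(sim, offset=k)
        if len(diag) < min_len:
            continue
        score = float(diag.mean())
        if best is None or score > best[1]:
            best = (k, score, len(diag))
    return best

# Synthetic data (assumption): clip_b contains a slightly noisy copy of
# clip_a starting at frame 8, surrounded by unrelated frames.
rng = np.random.default_rng(0)
clip_a = rng.normal(size=(30, 128))
clip_b = rng.normal(size=(40, 128))
clip_b[8:38] = clip_a + 0.01 * rng.normal(size=(30, 128))

sim = cosine_similarity_matrix(clip_a, clip_b)
offset, score, length = best_diagonal(sim)
```

On this synthetic input the best offset is 8 with a 30-frame overlap, matching where the copy was placed. Real recordings with noise, partial overlaps, or tempo changes need a more robust alignment, which is what the notebook's approach is meant to address.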

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Close the stale issues and pull requests after 7 days of inactivity. Reopen the issue with /reopen.