Open bmcfee opened 7 years ago
Some additional improvements I'd like to include on the labeling front:
Sounds good. If you create a new PR, could you do it in the 0.1.3-dev branch? Thanks!
I never got around to a PR for this, but I did implement the label ordering fix and eigenvector smoothing in the new repo: https://github.com/bmcfee/lsd_viz .
Ok, this should be ready for v0.1.6 (04ff0dc078867a3bd19803861068b6ad730751ec). Thanks!
PS: your visualization tool is awesome (of course, your algorithm too!).
Hi @urinieto and @bmcfee, I was comparing codes between my own implementation of scluster and the one in the repo. And I find out that the current version in the repo does not iterate over "scluster_k" and using entropy to find optimal number of clusters. Is this correct? At least that is not corresponding to the paper.
mmmh might be a bug? I assume you refer to this part of the code in MSAF: https://github.com/urinieto/msaf/blob/master/msaf/algorithms/scluster/main2.py#L58
Right now scluster_k
is the fixed number of unique segment types (i.e., the number of clusters). I agree that it'd be nicer to learn this parameter. Would you care to create a PR with the fix? I will mark this as an enhancement.
It would be this part here in the original implementation, laplacian_segmentation, where an entropy measurement is used to estimate the best "k". I can put together a merged version between the updated and original implementations.
sounds great, thanks!
On Feb 17, 2018 4:33 PM, "Cheng-i Wang" notifications@github.com wrote:
It would be this part here in the original implementation, https://github.com/bmcfee/laplacian_segmentation/blob/ 94a2c347d841554cf4d1a3b0f39b666f23ecf576/code/segmenter.py#L492-L514 http://url, where the an entropy measurement is used to estimate the best "k". I can put together a merged version between the updated and original implementations.
— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/urinieto/msaf/issues/53#issuecomment-366482596, or mute the thread https://github.com/notifications/unsubscribe-auth/ADhisTIICFfzK0vHZhwr6jSIpspuBa7Iks5tV2-9gaJpZM4KjDdL .
Some experiments on the SPAM dataset have turned up a few quirks in the Laplacian segmentation method that shouldn't be too hard to fix. Problematic tracks include:
One persistent issue that I'm seeing is a tendency on long tracks to concentrate all of the clustering on a small region in the middle of the track, while leaving the vast majority of the remainder labeled as one component. This is probably an error in bandwidth estimation, but it deserves some careful diagnostics.
Another issue, potentially unrelated, is due to some high-frequency noise in the laplacian eigenvectors. I've seen this in a few random tracks, and it's easily fixed by applying a temporal median filter to the eigenvectors before clustering. This is already implemented in the librosa gallery version but I haven't pushed it upstream anywhere yet.
I'll probably send a PR for this some time in the near future, so if you want to collect more suggestions in this thread, I'd appreciate it.