rymc / n2d

A deep clustering algorithm. Code to reproduce results for our paper N2D: (Not Too) Deep Clustering via Clustering the Local Manifold of an Autoencoded Embedding.
GNU General Public License v3.0

Do y'all intend on converting this to a library? #3

Closed · josephsdavid closed this issue 4 years ago

josephsdavid commented 5 years ago

I love this research, and would love to see it in a more portable and applicable form as a library. I have started an object-oriented framework for this here, https://github.com/josephsdavid/N2D-OOP, and would love to turn it into a full library so it can be widely used :)

Regardless, keep up the great work!

rymc commented 5 years ago

This sounds like a good idea! Let me know if you need anything.

josephsdavid commented 5 years ago

Great! I’ll probably bug you with questions, but I’m just happy to have permission to help!!

josephsdavid commented 5 years ago

Do you have a preference for a license? I do not.

rymc commented 5 years ago

I personally like (in no particular order) BSD, MIT, and GPL.

josephsdavid commented 5 years ago

MIT it is! Version 0.0.1 should be up on PyPI :)

josephsdavid commented 5 years ago

I am currently writing the documentation for the package (or an early version of it), and I am not totally comfortable putting my name down as the author. Can/should I include you?

Also, I have been thinking about ways to improve the library. I have experimented a bit with density-based clustering within it, but I also think it would be interesting and potentially useful to implement denoising and sparse autoencoders (this is also more along the lines of what I am good at), so that on simpler/smaller datasets the autoencoder can keep learning for longer. For example, with pendigits the loss quickly hits about 5e-5 and then kind of bounces around, which I believe a denoising autoencoder would fix. What do you think would be most useful to work on for future researchers and for your own work?
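
For concreteness, a denoising autoencoder roughly along these lines is what I have in mind (a Keras sketch only; the layer sizes, noise level, and function name are placeholders rather than anything from the current repo):

```python
from tensorflow.keras.layers import Input, Dense, GaussianNoise
from tensorflow.keras.models import Model

def build_denoising_autoencoder(input_dim, latent_dim=10, noise_stddev=0.1):
    # Placeholder architecture: corrupt the input with Gaussian noise during
    # training only, then reconstruct the clean input from the corrupted one.
    x = Input(shape=(input_dim,))
    noisy = GaussianNoise(noise_stddev)(x)  # active only while training
    h = Dense(500, activation='relu')(noisy)
    h = Dense(500, activation='relu')(h)
    h = Dense(2000, activation='relu')(h)
    z = Dense(latent_dim, name='embedding')(h)
    h = Dense(2000, activation='relu')(z)
    h = Dense(500, activation='relu')(h)
    h = Dense(500, activation='relu')(h)
    x_hat = Dense(input_dim)(h)

    autoencoder = Model(inputs=x, outputs=x_hat)
    encoder = Model(inputs=x, outputs=z)
    autoencoder.compile(optimizer='adam', loss='mse')
    return autoencoder, encoder

# Usage sketch: train on (data, data) pairs, then cluster the manifold of
# encoder.predict(data) as usual.
# autoencoder, encoder = build_denoising_autoencoder(input_dim=16)
# autoencoder.fit(data, data, batch_size=256, epochs=100)
```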

rymc commented 5 years ago

With regard to authorship, I think citing the paper is enough.

In terms of making the library most useful...

I agree that other types of autoencoders would be interesting. In addition to what you mentioned, I think convolutional and recurrent autoencoders would be useful.
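
A convolutional version could look roughly like this for image data (again just a sketch; filter counts, kernel sizes, and the builder name are illustrative, not code from the repo):

```python
from tensorflow.keras.layers import (Input, Conv2D, Conv2DTranspose,
                                     Flatten, Dense, Reshape)
from tensorflow.keras.models import Model

def build_conv_autoencoder(side=28, channels=1, latent_dim=10):
    # Downsample twice with strided convolutions, embed, then mirror back up.
    # Assumes side is divisible by 4 (e.g. 28x28 MNIST-style images).
    x = Input(shape=(side, side, channels))
    h = Conv2D(32, 3, strides=2, padding='same', activation='relu')(x)   # side/2
    h = Conv2D(64, 3, strides=2, padding='same', activation='relu')(h)   # side/4
    h = Flatten()(h)
    z = Dense(latent_dim, name='embedding')(h)

    h = Dense((side // 4) * (side // 4) * 64, activation='relu')(z)
    h = Reshape((side // 4, side // 4, 64))(h)
    h = Conv2DTranspose(32, 3, strides=2, padding='same', activation='relu')(h)
    x_hat = Conv2DTranspose(channels, 3, strides=2, padding='same')(h)

    autoencoder = Model(x, x_hat)
    encoder = Model(x, z)
    autoencoder.compile(optimizer='adam', loss='mse')
    return autoencoder, encoder
```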

Adding data augmentation should also improve performance.
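
For the augmentation, something like this generator could feed the autoencoder, assuming image data stored as flat vectors (the function name, rotation/shift ranges, and the choice to reconstruct the augmented image itself are illustrative, not the paper's exact settings):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def augmented_pairs(x_flat, side=28, batch_size=256,
                    rotation_range=10, shift=0.1):
    # Yield (input, target) batches where both are the same randomly
    # rotated/shifted images, flattened back to vectors.
    images = x_flat.reshape(-1, side, side, 1)
    datagen = ImageDataGenerator(rotation_range=rotation_range,
                                 width_shift_range=shift,
                                 height_shift_range=shift)
    for batch in datagen.flow(images, batch_size=batch_size, shuffle=True):
        flat = batch.reshape(batch.shape[0], -1)
        yield flat, flat

# Usage sketch (the generator is infinite, so set steps_per_epoch):
# autoencoder.fit(augmented_pairs(x), steps_per_epoch=len(x) // 256, epochs=100)
```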

josephsdavid commented 5 years ago

Awesome, I can certainly do that! I remember the paper mentions data augmentation as well; good idea!

Thanks again!