Open fsicre opened 4 years ago
Are you asking for an example on how to use BM? Could you provide more info on the case, I am not sure I understand the 128x128 locations problem really well.
Thank you!
Are you asking for an example on how to use BM?
yes, exactly, on the use case of a Self Organizing Map of binary vectors.
Usually, SOM are used on integers or float vectors as you can see on these to pages : https://en.wikipedia.org/wiki/Self-organizing_map https://www.google.com/search?q=self+organizing+map&tbm=isch
In my case I need to do this kind of classification on sparse binary vectors.
BitMagic seems to be a very very good candidate for that.
The naïve SOM algorithm is very simple :
The map will progressively evolved and finally each vector to classify will have a best location identified and all similar vectors in the ~ 10 millions will also be associated to locations nearby on the map.
Now imaging you have ~ 10 millions of extremely sparse binary vectors (same size ~ 1Mbits) and you want to place them on a 2D map of 128 x 128 locations. Hamming Distance can be used. The map will be a 2D matrix of binary vectors of ~ 1Mbits. The algorithm on the picture below will be used in order to "hack" the locations' vectors.
As you can see, a lot of vectors will be loaded in RAM, a lot of BV comparisons will be performed, a lot of hamming distance will be calculated, a lot of bit modification will be done, so the question is
how to correctly use BitMagic for best / maximum speed ? (32/64/128Go RAM available)
https://www.semanticscholar.org/paper/An-alternative-approach-for-binary-and-categorical-Santana-Morais/f1f079640218b72822911ef3dc7394765703d62a/figure/0 https://d3i71xaburhd42.cloudfront.net/f1f079640218b72822911ef3dc7394765703d62a/3-Figure1-1.png
Yes, i understand the problem. The closest example publicly available now is: https://github.com/tlk00/BitMagic/tree/master/samples/xsample07a
It creates binary fingerprints and implements primitive (but relatively fast) clusterization using similarity measure (COUNT_AND). Not quite well documented yet, but the code should be instructive as a use case for how to compute binary distances.
I am planning to create a separate example on how to do a more efficient distance computations for Hamming, Jacquard distances using cache locking and SIMD. Current distance compute is fast but can be even faster actually to facilitate binary SOMs.
Awesome, many thanks for the hint.
xsample07a is very interesting, although it's difficult to understand exactly why you've choosen some technics. never mind.
this project is a side project for me.
once I have written something (in few weeks) would you mind my showing to you ? in order to have your advices concrete bitmagic usage.
Surest thing. I would certainly appreciate your example, please share it.
Hi,
First I want to thank you for this awesome library.
I'm experimenting on binary NLP technics and I was wondering if you could publish a bitmagic version of SOM (self organizing map) using about ~ 10 millions of extremely sparse (< 5 / 100 000) binary vectors of same size ~ 1Mbits. The map could be for instance 128 x 128 locations.
It could be a huge step froward for me to see a correct use of bitmagic on this particular subject.
Thanks for your help.