peterwittek / somoclu

Massively parallel self-organizing maps: accelerate training on multicore CPUs, GPUs, and clusters
https://peterwittek.github.io/somoclu/
MIT License
267 stars 70 forks

Is there a way to get the quantization and topographical error? #99

Open fciannella opened 6 years ago

fciannella commented 6 years ago

How can we verify that the training process was good enough after it has completed? Is there a way to get the topographical and quantization error per epoch, to see how well the training is progressing?

peterwittek commented 6 years ago

Which interface are you using? Can you define topographical and quantization error?

fciannella commented 6 years ago

I am using the python interface.

Here are the definitions from the Matlab toolbox:

qe: Average distance between each data vector and its BMU. Measures map resolution.
te: Topographic error, the proportion of all data vectors for which the first and second BMUs are not adjacent units. Measures topology preservation.
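For reference, both quantities can be computed directly from a trained codebook with NumPy. This is only a sketch: the function names are mine, it assumes a rectangular grid with the codebook laid out as (n_rows, n_cols, dim) (which I believe matches what somoclu exposes as som.codebook), and it counts diagonal neighbours as adjacent (Chebyshev distance 1); swap in a 4-neighbourhood test if you prefer.

```python
import numpy as np

def quantization_error(codebook, data):
    """Average Euclidean distance between each sample and its BMU.

    codebook: (n_rows, n_cols, dim) array of map weights
    data:     (n_samples, dim) array
    """
    flat = codebook.reshape(-1, codebook.shape[-1])               # (n_units, dim)
    d = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=2)
    return d.min(axis=1).mean()

def topographic_error(codebook, data):
    """Fraction of samples whose first and second BMUs are not
    adjacent on the rectangular grid (diagonals count as adjacent)."""
    n_rows, n_cols, dim = codebook.shape
    flat = codebook.reshape(-1, dim)
    d = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=2)
    best2 = np.argsort(d, axis=1)[:, :2]                          # 1st and 2nd BMU indices
    r, c = np.divmod(best2, n_cols)                               # grid coordinates
    not_adjacent = np.maximum(np.abs(r[:, 0] - r[:, 1]),
                              np.abs(c[:, 0] - c[:, 1])) > 1
    return not_adjacent.mean()
```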

I was running sompy.py and getting a large quantization error (around 15) in my setup, so I wanted to check whether the performance is the same with somoclu.

BTW great job, the interface is well documented and easy to use!

peterwittek commented 6 years ago

Related questions keep popping up, but there is no function implemented to calculate these two quantities. If you want to give it a shot, I would be very happy to merge them.

It is a notch harder to monitor them across epochs. All calculations are pushed down to the low-level C++/CUDA code, and the Python interface only gets the results back once all requested epochs have finished. One way around this is to request a single epoch at a time and adjust the learning rate and the radius manually between calls.
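To illustrate the shape of that workaround: the loop below trains one epoch at a time, decays the radius and learning rate by hand between calls, and logs the quantization error after each epoch. Since this is not wired into somoclu here, a minimal NumPy batch-SOM update stands in for the one-epoch train() call; only the loop structure is the point.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((200, 3))                          # toy data set
n_rows, n_cols = 8, 8
codebook = rng.random((n_rows * n_cols, 3))          # flat (n_units, dim) codebook
grid = np.array([(r, c) for r in range(n_rows)       # grid coordinates of each unit
                 for c in range(n_cols)], dtype=float)
unit_dists = np.linalg.norm(grid[:, None, :] - grid[None, :, :], axis=2)

def one_epoch(codebook, data, radius, rate):
    """One batch-SOM epoch; returns the updated codebook and the
    quantization error measured before the update."""
    d = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
    bmus = d.argmin(axis=1)
    h = np.exp(-unit_dists[bmus] ** 2 / (2 * radius ** 2))  # neighbourhood weights
    target = (h.T @ data) / h.sum(axis=0)[:, None]          # weighted data means
    return codebook + rate * (target - codebook), d.min(axis=1).mean()

epochs, history = 10, []
for epoch in range(epochs):
    radius = max(1.0, (n_cols / 2) * (1 - epoch / epochs))  # linear cooling
    rate = 0.1 + 0.4 * (1 - epoch / epochs)
    codebook, qe = one_epoch(codebook, data, radius, rate)
    history.append(qe)
    print(f"epoch {epoch}: radius={radius:.2f} rate={rate:.2f} QE={qe:.4f}")
```

With somoclu itself the body of the loop would instead call train() once per iteration with the current radius and scale; the cooling schedules above are arbitrary choices for the sketch, not somoclu's defaults.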

fciannella commented 6 years ago

Thanks, I will see if I have time; it depends on how much I end up using it and needing the functionality. For now I am just using it for visualization in some research I am doing, but I might use it more in the future.

But given that we don’t have those indices, how do you measure the performance of the training phase?

peterwittek commented 6 years ago

I also use it mainly for visualization, so the scientific way of evaluating the result has always been eyeballing it. Occasionally, I was interested in clustering, and then I measured cluster consistency.

fciannella commented 6 years ago

If not per epoch, I could eventually add a measure of these errors at the end of training, so that we have some indication of how well the training performed.

Thanks a lot for your replies and help.

peterwittek commented 6 years ago

That would be great, thanks.

akol67 commented 2 years ago

> I also use it mainly for visualization, so the scientific way of evaluating the result has always been eyeballing it. Occasionally, I was interested in clustering, and then I measured cluster consistency.

Which metric are you using? The WB index? Calinski-Harabasz? Another one?
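For what it's worth, the Calinski-Harabasz index is straightforward to compute by hand with NumPy; here is a sketch (the helper name is mine). If you already have scikit-learn around, I believe sklearn.metrics.calinski_harabasz_score gives the same value.

```python
import numpy as np

def calinski_harabasz(X, labels):
    """Calinski-Harabasz index: between-cluster dispersion over
    within-cluster dispersion, scaled by (n - k) / (k - 1).
    Higher means better-separated, more compact clusters."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n, k = len(X), len(np.unique(labels))
    overall = X.mean(axis=0)
    between = within = 0.0
    for lab in np.unique(labels):
        members = X[labels == lab]
        centre = members.mean(axis=0)
        between += len(members) * np.sum((centre - overall) ** 2)
        within += np.sum((members - centre) ** 2)
    return (between / (k - 1)) / (within / (n - k))
```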