Open xiaogu-space opened 5 years ago
Hello, I haven't implemented this in other languages, but I'm sure there are ways to reproduce the methods. For example, Python must have some modules for hierarchical clustering or for the distance computation behind nearest neighbors. I've always done this kind of thing alongside data visualization, so I use R for it.
OK, I will keep following this project.
Hi, I really like what you did! One of the tasks in my thesis involves spatial clustering with equal sizes, and I am doing something similar to what is described here: https://statistical-research.com/index.php/2013/11/04/spatial-clustering-with-equal-sizes/
However, I am creating sampling areas in each city of Texas, and some cities are very large, so the computation is time-consuming. I tried to use your approach, but a 100k × 100k matrix is intractable. Worse, one of the cities would produce a 1m × 1m matrix, because it has 1m households.
My question is the following: have you thought of another way of calculating/storing distances that consumes less memory? Thank you
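To put rough numbers on the memory problem, here is a small back-of-the-envelope sketch in Python. It assumes 8-byte floating-point distances and, by default, storage of only the lower triangle (n(n-1)/2 values, which is how R's `dist` objects are laid out). The function name `dense_distance_bytes` is made up for this illustration.

```python
def dense_distance_bytes(n_points, bytes_per_value=8, condensed=True):
    """Bytes needed to hold all pairwise distances for n_points.

    condensed=True counts only the lower triangle, n * (n - 1) / 2 values;
    condensed=False counts the full n * n square matrix.
    """
    if condensed:
        n_values = n_points * (n_points - 1) // 2
    else:
        n_values = n_points * n_points
    return n_values * bytes_per_value


for n in (100_000, 1_000_000):
    gb = dense_distance_bytes(n) / 1e9  # decimal gigabytes
    print(f"{n:>9,} points: about {gb:,.0f} GB")
```

Under these assumptions, 100k points need about 40 GB even in condensed form, and 1m points need about 4 TB, so any approach that materializes every pairwise distance is out of reach on an ordinary machine.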
Nice! Good to know it might be useful to someone else.
I don't know much about better ways to store distances. Maybe R or other languages can use a disk-stored rather than a memory-stored distance object, but I've never used one. That might help with your memory issue, but the computation might still take too long.
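For reference, a rough Python sketch of the disk-backed idea, assuming NumPy is available. It uses `numpy.memmap` to keep the matrix in a file and fills it one block of rows at a time. The function name `pairwise_distances_on_disk`, the `block_rows` parameter, and the random example points are all made up for illustration.

```python
import os
import tempfile

import numpy as np


def pairwise_distances_on_disk(points, path, block_rows=256):
    """Write the full Euclidean distance matrix to a disk-backed array.

    Only `block_rows` rows of the matrix are computed in memory at a time;
    the (n, n) result lives in the file at `path`.
    """
    n = points.shape[0]
    dist = np.memmap(path, dtype=np.float32, mode="w+", shape=(n, n))
    for start in range(0, n, block_rows):
        stop = min(start + block_rows, n)
        # Differences between this block of points and all points:
        # shape (block, n, n_dims).
        diff = points[start:stop, None, :] - points[None, :, :]
        dist[start:stop] = np.sqrt((diff ** 2).sum(axis=2))
    dist.flush()  # make sure everything is written to the file
    return dist


# Tiny made-up example: 5 random 2-D points, processed 2 rows at a time.
rng = np.random.default_rng(0)
pts = rng.random((5, 2))
path = os.path.join(tempfile.mkdtemp(), "dist.dat")
d = pairwise_distances_on_disk(pts, path, block_rows=2)
print(d.shape)
```

This only moves the storage problem from RAM to disk: the file still holds n × n values, and the number of distance computations is unchanged, so the run time concern above still applies.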
A more feasible solution for large datasets could be the k-means or kNN approach, since neither necessarily requires computing all the pairwise distances. The three easiest things I would try:
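On the k-means route specifically, here is a minimal Python sketch of the general idea, assuming NumPy. It is not the code from the post. It is a k-means variant in which each centre accepts at most ceil(n / k) points, so only an n × k table of point-to-centre distances is ever computed. The function name `equal_size_kmeans`, the greedy assignment order, and the parameters are choices made for this illustration.

```python
import numpy as np


def equal_size_kmeans(points, k, n_iter=10, seed=0):
    """Cluster points into k groups, none larger than ceil(n / k).

    Only an (n, k) point-to-centre distance table is computed,
    never the (n, n) point-to-point matrix.
    """
    n = points.shape[0]
    rng = np.random.default_rng(seed)
    # Start from k distinct points picked at random.
    centres = points[rng.choice(n, size=k, replace=False)].astype(float)
    capacity = -(-n // k)  # ceil(n / k)
    labels = np.full(n, -1)
    for _ in range(n_iter):
        # (n, k) distances from every point to every centre.
        d = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        # Assign first the points with the biggest gap between their
        # nearest and farthest centre (they have the most to lose).
        order = np.argsort(d.min(axis=1) - d.max(axis=1))
        counts = np.zeros(k, dtype=int)
        labels = np.full(n, -1)
        for i in order:
            # Nearest centre that still has room.
            for c in np.argsort(d[i]):
                if counts[c] < capacity:
                    labels[i] = c
                    counts[c] += 1
                    break
        # Move each centre to the mean of its current members.
        for c in range(k):
            members = labels == c
            if members.any():
                centres[c] = points[members].mean(axis=0)
    return labels


# Made-up example: 60 random 2-D points split into 3 clusters.
rng = np.random.default_rng(1)
pts = rng.random((60, 2))
labels = equal_size_kmeans(pts, k=3)
print(np.bincount(labels, minlength=3))  # each of the 3 clusters gets 20 points
```

When n is a multiple of k, every cluster ends up with exactly n / k points; otherwise the sizes are only capped at ceil(n / k). The per-point Python loop is slow for very large inputs, so this is meant to show the memory saving rather than to be fast.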
Thanks a lot! I find your suggestions very useful. I will try to implement some of them and will let you know if it works :)
@jmonlong Thanks a lot for the ideas. I am running a program that uses the second idea you suggested, and it is working perfectly. If you are interested, I can share the results with you and tell you more about the thesis.
Hi, I'm interested in ClusterEqualSize https://github.com/jmonlong/Hippocamplus/blob/master/content/post/2018-06-09-ClusterEqualSize.Rmd but I won't be using R. Is there an implementation in another language, such as Python, Java, or Node?