xinleizhao closed 8 years ago
Problem solved by using R libraries, still with k-means: http://stackoverflow.com/questions/21095643/approaches-for-spatial-geodesic-latitude-longitude-clustering-in-r-with-geodesic
Due to a memory limit, only the last 17871 entries are used to form clusters. The next steps are to find the centroids of these clusters and then assign a cluster to each entry.
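The centroid and assignment steps could be sketched roughly as below. This is only an illustration in plain Python, not the code from the commit: the helper names (`haversine_km`, `centroid`, `assign_to_nearest`), the sample coordinates, and the choice of a spherical mean for the centroid are all assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def centroid(points):
    """Spherical mean: average the 3D unit vectors, then convert back to degrees."""
    x = y = z = 0.0
    for lat, lon in points:
        la, lo = math.radians(lat), math.radians(lon)
        x += math.cos(la) * math.cos(lo)
        y += math.cos(la) * math.sin(lo)
        z += math.sin(la)
    return (math.degrees(math.atan2(z, math.hypot(x, y))),
            math.degrees(math.atan2(y, x)))


def assign_to_nearest(points, centroids):
    """Return, for each point, the label of the closest centroid."""
    return [min(centroids, key=lambda label: haversine_km(p, centroids[label]))
            for p in points]


# Hypothetical clusters formed from the subset (label -> member points).
clusters = {
    0: [(40.7128, -74.0060), (40.7130, -74.0055)],
    1: [(34.0522, -118.2437), (34.0525, -118.2440)],
}
centroids = {label: centroid(pts) for label, pts in clusters.items()}

# Entries that were left out of the clustering step get the nearest cluster.
remaining = [(40.71, -74.00), (34.05, -118.25)]
labels = assign_to_nearest(remaining, centroids)
```

Assigning by nearest centroid only needs one distance per entry per cluster, so it should scale to the full data set even when the clustering itself does not fit in memory.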
Fixed in commit 9.
Currently trying to increase the tree size (leaf_size) of the ball tree algorithm. When it is larger, the kernel lasts longer before it dies.
Ran for ~7 hrs on this case: db = DBSCAN(algorithm='ball_tree', metric='haversine', leaf_size=500).fit(coordinates)
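One thing that may be worth checking, offered as a guess rather than a diagnosis: scikit-learn's haversine metric expects [latitude, longitude] in radians, and `eps` is then also an angle in radians. The call above leaves `eps` at its default of 0.5, which on the haversine metric corresponds to roughly 3,190 km on the Earth's surface, so almost every point could end up a neighbor of every other point. A minimal sketch with an explicit radius follows; the sample coordinates, `eps_km`, and `min_samples=3` are made-up values for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

# Hypothetical sample: two tight groups of (lat, lon) points in degrees.
coords_deg = np.array([
    [40.7128, -74.0060],
    [40.7130, -74.0055],
    [40.7125, -74.0065],
    [34.0522, -118.2437],
    [34.0525, -118.2440],
    [34.0520, -118.2435],
])

# The haversine metric works on radians, so convert the input first.
coords_rad = np.radians(coords_deg)

# Assumed neighborhood radius of 1 km, expressed as an angle in radians.
eps_km = 1.0
db = DBSCAN(
    eps=eps_km / EARTH_RADIUS_KM,
    min_samples=3,
    algorithm='ball_tree',
    metric='haversine',
    leaf_size=500,
).fit(coords_rad)

labels = db.labels_  # one cluster label per point, -1 would mean noise
```

With a small `eps`, each radius query returns far fewer neighbors, which may reduce memory use considerably regardless of the leaf size.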
Will try changing the leaf size to 50000 to see whether the kernel still dies. If it does, will keep increasing the leaf size and move this step to AWS.