xinleizhao / EarthquakeSequential

Sequential data mining for earthquake and possible related factors

Applying DBSCAN and great circle distance model to redo clustering #2

Closed by xinleizhao 8 years ago

xinleizhao commented 8 years ago

Currently trying to increase the leaf size of the ball tree algorithm. With a larger leaf size, the kernel lasts longer before it dies.

Ran for ~7 hours on this case: db = DBSCAN(algorithm='ball_tree', metric='haversine', leaf_size=500).fit(coordinates)
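One thing worth checking with this call: scikit-learn's haversine metric expects [latitude, longitude] pairs in radians and measures distance in radians, so eps has to be an angle too. The default eps=0.5 would then correspond to roughly 3185 km, which may make almost every point a neighbour of every other and could contribute to the memory problem. Below is a minimal sketch of the unit conversion, assuming the coordinates are (lat, lon) pairs in degrees. The sample points, the 50 km radius, and the helper names are hypothetical and not from this repo.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)

def haversine_rad(p, q):
    """Great-circle distance in radians between two (lat, lon) points in radians."""
    lat1, lon1 = p
    lat2, lon2 = q
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to guard against tiny floating-point overshoot above 1.0
    return 2 * math.asin(min(1.0, math.sqrt(a)))

def to_radians(coords_deg):
    """Convert (lat, lon) pairs from degrees to radians."""
    return [(math.radians(lat), math.radians(lon)) for lat, lon in coords_deg]

def km_to_eps(km):
    """Convert a neighbourhood radius in km to an angle in radians."""
    return km / EARTH_RADIUS_KM

# Hypothetical sample points in degrees, not taken from the earthquake data
coords_deg = [(35.0, 139.0), (35.1, 139.1), (-33.4, -70.6)]
coords_rad = to_radians(coords_deg)
eps = km_to_eps(50.0)  # hypothetical 50 km neighbourhood

near = haversine_rad(coords_rad[0], coords_rad[1])
far = haversine_rad(coords_rad[0], coords_rad[2])
print(near <= eps, far <= eps)
```

With these inputs the DBSCAN call would look like DBSCAN(eps=eps, algorithm='ball_tree', metric='haversine').fit(coords_rad). A suitable eps and min_samples for this dataset still have to be chosen.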

Now changing the leaf size to 50000 to see whether the kernel still dies. If it does, I will keep increasing the leaf size and move this step to AWS.

xinleizhao commented 8 years ago

Problem solved by using R libraries, still with k-means: http://stackoverflow.com/questions/21095643/approaches-for-spatial-geodesic-latitude-longitude-clustering-in-r-with-geodesic

Due to the memory limit, only the last 17871 entries are used to form clusters. The next steps are to find the centroids of these clusters and then assign a cluster to each entry.
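The centroid and assignment steps could be sketched roughly as follows. This is an illustration in Python, not the R code used for the fix. It assumes each cluster is a list of (lat, lon) pairs in degrees, takes the centroid as the mean of the points as 3D unit vectors, and assigns each remaining entry to the centroid with the smallest great circle distance. The sample clusters and function names are hypothetical.

```python
import math

def haversine_km(p, q, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))

def spherical_centroid(points):
    """Average the points as 3D unit vectors, then project back to (lat, lon)."""
    x = y = z = 0.0
    for lat, lon in points:
        la, lo = math.radians(lat), math.radians(lon)
        x += math.cos(la) * math.cos(lo)
        y += math.cos(la) * math.sin(lo)
        z += math.sin(la)
    lat_c = math.atan2(z, math.hypot(x, y))
    lon_c = math.atan2(y, x)
    return (math.degrees(lat_c), math.degrees(lon_c))

def assign_to_nearest(entries, centroids):
    """Return, for each entry, the label of the closest centroid."""
    return [min(centroids, key=lambda label: haversine_km(e, centroids[label]))
            for e in entries]

# Hypothetical clusters formed from a subset of the data
clustered = {
    0: [(35.0, 139.0), (35.2, 139.4)],
    1: [(-33.4, -70.6), (-33.0, -71.0)],
}
centroids = {label: spherical_centroid(pts) for label, pts in clustered.items()}

# Hypothetical entries that were left out of the clustering run
remaining = [(36.0, 140.0), (-34.0, -70.0)]
labels = assign_to_nearest(remaining, centroids)
print(labels)
```

Averaging unit vectors avoids the wrap-around problem that a plain mean of longitudes has near the ±180° meridian.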

xinleizhao commented 8 years ago

fixed on commit 9