michaelaye / planet4

Analysis software for the PlanetFour citizen science project.
www.planetfour.org
ISC License

Possibly split up data with 100 classifications #11

Closed michaelaye closed 9 years ago

michaelaye commented 10 years ago

DBSCAN cannot cluster data sets well when their densities differ widely, since no single minPts-ε combination can then be chosen appropriately for all clusters. This might mean that I get better results by determining a clustering parameter set for 30 classifications, then splitting tiles that have more than that into 2 or 3 subsets of comparable density, and afterwards somehow producing a mean result from them.
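A minimal sketch of the splitting step, not code from this repository. It assumes each marking is a hypothetical `(classification_id, x, y)` tuple and that whole classifications, rather than individual markings, are assigned to subsets, so every subset ends up with roughly `target` classifications and therefore a comparable marking density. The function name, the `target=30` default, and the round-robin assignment are all illustrative assumptions.

```python
import random
from collections import defaultdict


def split_by_classification(markings, target=30, seed=0):
    """Split one tile's markings into subsets of roughly `target` classifications.

    `markings` is a list of (classification_id, x, y) tuples (assumed layout).
    Returns a list of subsets, each a list of (x, y) points.
    """
    # Group the points by the classification they came from.
    by_class = defaultdict(list)
    for cid, x, y in markings:
        by_class[cid].append((x, y))

    ids = sorted(by_class)
    # Number of subsets so that each holds about `target` classifications.
    n_subsets = max(1, round(len(ids) / target))

    # Shuffle reproducibly so subsets are not biased by classification order.
    random.Random(seed).shuffle(ids)

    subsets = []
    for i in range(n_subsets):
        chunk = ids[i::n_subsets]  # round-robin assignment of classifications
        subsets.append([pt for cid in chunk for pt in by_class[cid]])
    return subsets
```

Each subset could then be clustered separately with the parameter set tuned for 30 classifications, for example with scikit-learn's `DBSCAN(eps=..., min_samples=...)`. How to merge the per-subset clusters into a mean result is left open here, as it is in the comment above.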

michaelaye commented 9 years ago

This is being dealt with in #12, so I'm closing this in favor of it. It's better to deal with it in proportion to classifications than to artificially split the data up.
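One way to read "in proportion to classifications" is to scale DBSCAN's minPts with the number of classifications a tile received. The sketch below is only an illustration of that idea; it is not taken from #12, and the function name, the `fraction` value, and the `floor` value are hypothetical.

```python
def scaled_min_samples(n_classifications, fraction=0.1, floor=3):
    """Return a minPts value proportional to the classification count.

    `fraction` and `floor` are illustrative assumptions, not tuned values.
    """
    # Never go below `floor`, so sparsely classified tiles can still cluster.
    return max(floor, round(fraction * n_classifications))
```

With these assumed defaults, a tile with 30 classifications gets a minPts of 3 and a tile with 100 classifications gets 10, so no data has to be split.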