gospodarka-przestrzenna / taksonomia-wroclawska

Multiple entries with same data in dimension column cause errors #7

Closed mk45 closed 4 months ago

mk45 commented 1 year ago

Several separate problems need to be discussed and taken into account:

  1. The first problem, as far as I understand, is that in the described situation the 'springs' between the considered points (data entries, i.e. rows) are already extremely contracted, so flattening or reshaping the space in those regions is like dividing by 0.

    • A possible solution is to introduce an abstract point that represents all original points sharing the same coordinates. After the algorithm has run, we can map the result back to the original data, so that those original points end up at the same location in the result.
    • Such a solution alters the Sammon mapping algorithm.
  2. A completely separate problem is the basic assumption (for the dendryt part) that no two distances can be the same. The algorithm must build a tree, and that tree uses a particular set of edges. If two edges have the same length/cost, we may choose a different set of edges and still get valid output, a feasible solution of dendryt-wrocławski. Such behavior must result in one of the following:

    • The result is non-deterministic (each run may give a different result).
    • (already applied) The data must not contain ambiguous edges, that is, edges of equal length where we cannot tell which one is the better choice. A set of points merged into a single point (see 1.) definitely falls into this category: the connections between them all share a common length (0), so we cannot tell which one to use.

  3. We may delete data based on certain criteria (this is dangerous).
  4. Maybe we can modify the idea of the dendryt-wrocławski algorithm.
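The idea from point 1 (represent identical rows by one abstract point, then assign the result back to every original row) could look roughly like the sketch below. It is only an illustration under assumptions: the helper names `collapse_duplicates` and `expand_embedding` are hypothetical and not part of this repository, and the "mapping" step is a placeholder standing in for the real Sammon mapping.

```python
import numpy as np


def collapse_duplicates(data):
    """Merge identical rows into one representative row each.

    Returns (unique_rows, inverse) such that unique_rows[inverse]
    reproduces the original data row by row.
    """
    data = np.asarray(data, dtype=float)
    unique_rows, inverse = np.unique(data, axis=0, return_inverse=True)
    # Flatten the index array so it is 1-D regardless of NumPy version.
    return unique_rows, np.reshape(inverse, -1)


def expand_embedding(unique_embedding, inverse):
    """Give every original row the result computed for its representative."""
    return np.asarray(unique_embedding)[inverse]


# Rows 0 and 2 are identical, so they collapse into one abstract point.
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0],
                 [1.0, 2.0, 3.0]])
unique_rows, inverse = collapse_duplicates(data)

# Placeholder for the real dimensionality reduction (assumption):
# here we simply keep the first two columns of the unique rows.
placeholder_embedding = unique_rows[:, :2]

embedding = expand_embedding(placeholder_embedding, inverse)
```

With this approach the mapping step never sees a zero distance, while duplicated rows still receive identical output coordinates. Note that this only covers exact duplicates; it does nothing about other pairs of edges that happen to have equal non-zero length (point 2).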

mk45 commented 4 months ago

Resolved: we now show an error message about zero distance in high-dimensional space.
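For readers who want to check their data up front, a zero-distance check of this kind might be sketched as follows. This is not the project's actual implementation: the function names, the Euclidean metric, and the wording of the error message are all assumptions made for illustration.

```python
import numpy as np


def find_zero_distance_pairs(data, tol=0.0):
    """Return index pairs (i, j), i < j, whose Euclidean distance is <= tol."""
    data = np.asarray(data, dtype=float)
    # Pairwise differences via broadcasting: shape (n, n, dims).
    diff = data[:, None, :] - data[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Look only at the upper triangle so each pair is reported once.
    i, j = np.triu_indices(len(data), k=1)
    mask = dist[i, j] <= tol
    return list(zip(i[mask].tolist(), j[mask].tolist()))


def check_no_zero_distances(data):
    """Raise ValueError if any two rows coincide (hypothetical message)."""
    pairs = find_zero_distance_pairs(data)
    if pairs:
        raise ValueError(f"zero distance between rows: {pairs}")
```

Calling `check_no_zero_distances([[0, 0], [1, 1], [0, 0]])` would raise, because rows 0 and 2 coincide. The pairwise matrix uses O(n²) memory, so this form is only suitable for small inputs.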