probcomp / cgpm

Library of composable generative population models which serve as the modeling and inference backend of BayesDB.
Apache License 2.0
25 stars 11 forks source link

Potentially incorrect sorting of data into clusters in render_utils.viz_view_raw #166

Closed fsaad closed 7 years ago

fsaad commented 8 years ago

https://github.com/probcomp/cgpm/blob/20161006-leocasarsa-fixing-logpdfmultirow-crosscat/src/utils/render_utils.py#L94-L95

    # Sort data into clusters
94      assignments = np.array(view.Zr().values())
95      row_indexes = np.argsort(assignments, kind='mergesort')

It is worth checking whether view.Zr().keys() is sorted ascending, since its an OrderedDict which only maintains insertion order not key order. If the keys are not sorted ascending, then numpy argsort is returning indices which do not match their original assignments -- the third item in view.Zr()[3] is not guaranteed to correspond to view.Zr().values()[3], for example.

One fix is something along the lines of row_indexes = sorted(view.Zr(), key=view.Zr).

leocasarsa commented 7 years ago

Indeed, thanks for catching it.

fsaad commented 7 years ago

The ordering should be fixed:

https://github.com/probcomp/cgpm/blob/05be712c61576c733a6d02974299b9c589e57c58/src/utils/render.py#L140-L143