synthesized-io / insight

🧿 Metrics & Monitoring of Datasets
BSD 3-Clause "New" or "Revised" License

Complicated matrix metric return type. #36

Open simonhkswan opened 3 years ago

simonhkswan commented 3 years ago

The return type here looks pretty complicated. I suppose it's because sometimes we return a DataFrame of values and sometimes a DataFrame of p_values as well, right?

It would be nice if we could keep the return types as just a single DataFrame.

Maybe using something like a NumPy structured dtype could help?


>>> import pandas as pd
>>> import numpy as np
>>> values = np.array([1, 1.5, 2])
>>> p_values = np.array([0.5, 0.1, 0.01])

>>> print(values, values.dtype)
[1.  1.5 2. ] float64

>>> print(p_values, p_values.dtype)
[0.5  0.1  0.01] float64

>>> metric = np.dtype([
...     ('value', 'f4'),
...     ('p_value', 'f4')
... ])
>>> combined = np.array(list(zip(values, p_values)), dtype=metric)

>>> print(combined, combined.dtype)
[(1. , 0.5 ) (1.5, 0.1 ) (2. , 0.01)] [('value', '<f4'), ('p_value', '<f4')]

>>> print(combined['value'])
[1.  1.5 2. ]

>>> print(combined['p_value'])
[0.5  0.1  0.01]
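
To tie this back to the single-DataFrame goal, here is a rough sketch of two ways the combined data could end up in one DataFrame. It is only an illustration, not what the library currently does. The first part reuses the toy arrays from the session above. The second part assumes a square matrix metric; the column names `a` and `b` and all of its numbers are made up for the example.

```python
import numpy as np
import pandas as pd

# Same toy numbers as in the session above.
values = np.array([1, 1.5, 2])
p_values = np.array([0.5, 0.1, 0.01])

metric = np.dtype([("value", "f4"), ("p_value", "f4")])
combined = np.array(list(zip(values, p_values)), dtype=metric)

# A 1-D structured array converts to a DataFrame with one column per field.
df = pd.DataFrame(combined)
print(list(df.columns))  # ['value', 'p_value']

# Alternative for a matrix metric: one DataFrame with two-level columns.
# The names "a" and "b" and these numbers are hypothetical.
cols = ["a", "b"]
value_df = pd.DataFrame([[1.0, 0.3], [0.3, 1.0]], index=cols, columns=cols)
p_value_df = pd.DataFrame([[0.0, 0.04], [0.04, 0.0]], index=cols, columns=cols)

# The dict keys become the outer level of the column MultiIndex.
matrix = pd.concat({"value": value_df, "p_value": p_value_df}, axis=1)

# Each part is still reachable as an ordinary square DataFrame.
print(matrix["value"])
print(matrix["p_value"].loc["a", "b"])
```

Either way the caller gets one object back. The two-level columns keep the regular pandas float dtype, while the structured dtype keeps each value and its p_value together in one element.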

_Originally posted by @simonhkswan in https://github.com/synthesized-io/insight/pull/32#discussion_r709151501_