mindsdb / mindsdb_native

Machine Learning in one line of code
http://mindsdb.com
GNU General Public License v3.0

JSON un-nesting #420

Closed George3d6 closed 3 years ago

George3d6 commented 3 years ago

How do we handle nested JSONs as values for columns?

TL;DR Unnest them, create columns for each of the unnested keys, proceed as usual

Training interface

Use pandas.json_normalize (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.json_normalize.html) on the column.
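A minimal sketch of what this could look like at training time. The helper name `unnest_json_column` and the sample frame are made up for illustration, and it assumes every value in the column is already a parsed dict (not a JSON string or null):

```python
import pandas as pd


def unnest_json_column(df, column):
    """Replace `column` (holding dicts) with one column per unnested key.

    Hypothetical helper, not part of mindsdb_native.
    Assumes every value in `column` is a dict.
    """
    # json_normalize flattens nested dicts and joins key paths with `sep`
    flat = pd.json_normalize(df[column].tolist(), sep=".")
    # Keep the original row index so the concat lines up row by row
    flat.index = df.index
    # Prefix with the source column name: b -> a.b, c.d -> a.c.d
    flat = flat.add_prefix(column + ".")
    return pd.concat([df.drop(columns=[column]), flat], axis=1)


df = pd.DataFrame({
    "id": [1, 2],
    "a": [
        {"b": 43, "c": {"d": "fsddsd", "e": 55}},
        {"b": 7, "c": {"d": "xyz", "e": 1}},
    ],
})
flat_df = unnest_json_column(df, "a")
print(sorted(flat_df.columns))
```

After this step the rest of the pipeline only sees ordinary flat columns, so it can proceed as usual.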


Predict interface

We use dot notation for the unnested columns, so a column a = { b: 43, c: { d: 'fsddsd', e: 55 } } becomes a.b, a.c.d, a.c.e.
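The naming rule can be sketched in plain Python for a single value. The function name `flatten` is illustrative only, and this is just one possible way to produce the names, not necessarily how the predictor would do it:

```python
def flatten(value, prefix):
    """Flatten a nested dict into {dotted.path: leaf_value}.

    Illustrative helper: `prefix` is the original column name.
    """
    # A non-dict value is a leaf, stored under the path built so far
    if not isinstance(value, dict):
        return {prefix: value}
    out = {}
    for key, child in value.items():
        # Extend the path with the current key and recurse
        out.update(flatten(child, f"{prefix}.{key}"))
    return out


flat = flatten({"b": 43, "c": {"d": "fsddsd", "e": 55}}, "a")
print(flat)
```

Note that in this sketch an empty nested dict produces no columns at all, which may or may not be the desired behaviour.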

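This will allow us to query the predictor from the mongo client using the unnested columns, since mongo has . notation. E.g. the two statements below are both valid mongo syntax and address the same nested field (strictly speaking, the second form is an exact match on the whole embedded document, so the results only coincide when size has no other keys):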

db.inventory.find( { "size.uom": "cm" } )
db.inventory.find( { size: {  uom: "cm" } } )

So we should be able to query the predictor from mongo by passing the original a column as JSON, which will then get unnested. We should also be able to use the first notation above and specify the unnested columns directly, since our column names use . just like mongo's . notation.

Similarly, with the HTTP and Native interfaces, we'll be able to use either the raw JSON, which will get unnested, or the unnested columns directly.
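One way to accept both forms at predict time is to normalize the input before it reaches the model. The sketch below is an assumption about how that could work; `normalize_when` and `_unnest` are hypothetical names, and the input is assumed to be a plain dict of column name to value:

```python
def _unnest(value, prefix):
    """Flatten a nested dict into {dotted.path: leaf_value}."""
    if not isinstance(value, dict):
        return {prefix: value}
    out = {}
    for key, child in value.items():
        out.update(_unnest(child, f"{prefix}.{key}"))
    return out


def normalize_when(when):
    """Return predict conditions keyed only by flat (dotted) column names.

    Hypothetical helper: dict values are treated as raw JSON and unnested,
    everything else (including already-dotted names) passes through as-is.
    """
    flat = {}
    for name, value in when.items():
        if isinstance(value, dict):
            flat.update(_unnest(value, name))
        else:
            flat[name] = value
    return flat


raw = normalize_when({"a": {"b": 43, "c": {"d": "fsddsd", "e": 55}}, "z": 1})
dotted = normalize_when({"a.b": 43, "a.c.d": "fsddsd", "a.c.e": 55, "z": 1})
print(raw == dotted)
```

With this kind of normalization, both the raw JSON and the dotted column names end up as the same set of columns.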

In Scout and in the SQL API, which both require the column list, we'll only be able to use the . notation.

Where do we put this code?

Probably in the DataExtractor. I'm not sure, but I think that's the best place.