zolekode / json-to-tables

Convert any JSON File to Normalised tables
GNU General Public License v3.0
15 stars 2 forks source link

json-to-tables

This is a simple package to help you convert any dynamic json file into SQL-like csv tables.

Installation

Just clone the project or merge the core package into your own project.

Usage

Assume you want to convert the JSON file below to tables.

    {
        "name": "truck",
        "brand": "BMW",
        "num_wheels": 4,
        "engine": {
            "brand": "RR",
            "date_of_production": {
                "day": 3,
                "month": "Feb",
                "year": 1990
            },
            "creators": ["Sandy", "Leslie", "Kane"]
        }
    },
    {
        "name": "bike",
        "num_wheels": 2,
        "top_speed": "100Km/hr",
        "engine": {
            "brand": "Audi",
            "date_of_production": {
                "day": 2,
                "month": "Sep",
                "year": 2002
            },
            "creators": ["Anabel", {"GreenMotors": {"CEO": "Charles Green"}}]
        }
    },
]

First, you load the JSON string.

# automobiles = json.dumps(automobiles) uncomment this line if you directly copy the JSON file above in your script for testing

automobiles = json.loads(automobiles)

Then run the following code:

import json
from core.extent_table import ExtentTable
from core.table_maker import TableMaker

# Creates an extent table object that manages all the tables
extent_table = ExtentTable()

# Pass the extent table object to the table maker
table_maker = TableMaker(extent_table) 

# Below is the name of the objects you are trying to convert. In our case, we are dealing with automobiles hence the "root" name will be automobiles
root_table_name = "automobiles" 

table_maker.convert_json_objects_to_tables(automobiles, root_table_name)

# num_elements is the max number of elements to show when printing the tables
table_maker.show_tables(num_elements=5)

Here are the result obtained after running the script.

SHOWING TABLES :D

automobiles
   ID   name brand num_wheels engine top_speed
0   0  truck   BMW          4      0      None
1   1   bike  None          2      1  100Km/hr
2   2   None  None       None   None      None
____________________________________________________

Note: The truck has no top_speed attribute, so its value is None. The engine attribute contains reference keys to the engine table. Note that For the the bike, the engine reference key is 1. Looking at the table engine, you notive that ID=1 has the brand=Audi. Which was the engine brand for the bike in the JSON file.

engine
   ID brand date_of_production
0   0    RR                  0
1   1  Audi                  1
2   2  None               None
____________________________________________________

Note: The same thing goes for the date_of_production of the engine. Where the ID=0, the brand=RR and the date_of_production=0. This 0 is the reference key to the table date_of_production. Again looking at the date of production table where the ID=0 you notice that the month=Feb which is the exact month for the BMW in the original JSON file.

date_of_production
   ID   day month  year
0   0     3   Feb  1990
1   1     2   Sep  2002
2   2  None  None  None
____________________________________________________

engine_?_creators
  ID PARENT_ID is_scalar  scalar
0  0         0      True   Sandy
1  1         0      True  Leslie
2  2         0      True    Kane
3  3         1      True  Anabel
4  4         1     False    None
____________________________________________________

Note: For Multivalued attributes (like lists) it is a little trickier. There are 3 attributes you need to consider.

GreenMotors
   ID            CEO
0   0  Charles Green
1   1           None
____________________________________________________

engine_?_creators_$_GreenMotors
   ID GreenMotors PARENT_ID
0   0           0         4
1   1        None      None
____________________________________________________

Note: You might now wonder how the value={"GreenMotors": {"CEO": "Charles Green"}} was handled. Well, the two tables above do the job. First, note that since the value is complex, we have to create a new table for it. The way it is done here is, the keys in the complex object are sorted and the first key is used as table name. Since there was only one key=GreenMotors, we obtain a table named engine_?_creators_$_GreenMotors. You can break this name into two parts. engine_?_creators and GreenMotors. Aanalog to what we saw before, the _$_ character here is used to tell us that GreenMotors if an object that was contained in a list of creators in an object engine.

Conclusion

I know it might be alot to digest but just take your time to read and test it out yourself. I included an example script you can run.

Tip

Try to use different names for objects that have different meanings in your JSON file. Things could get confusing if not.