lidatong / dataclasses-json

Easily serialize Data Classes to and from JSON
MIT License

Improve subclass restoration and serialization time by adding two options to global_config #521

Open democrazyx opened 4 months ago

democrazyx commented 4 months ago

This PR makes two contributions:

  1. Add class info to the serialized output so that subclasses can be restored from JSON; enable by setting global_config.include_class_info = True
  2. Cache the results of type checking to save time; enable by setting global_config.enable_cache = True

For detailed usage and a comparison, open the Jupyter notebook file.

The following code is derived from the .ipynb file:

# %% [markdown]
# # 1. include class info in the json result

# %%
from dataclasses import dataclass, field

from dataclasses_json import dataclass_json, global_config

@dataclass_json
@dataclass
class Animal:
    id: int = 0
    health: int = 100

@dataclass_json
@dataclass
class Cat(Animal):
    age: int = 1

@dataclass_json
@dataclass
class Dog(Animal):
    age: int = 1

@dataclass_json
@dataclass
class PetCat(Cat):
    name: str = ''

@dataclass_json
@dataclass
class Person:
    name: str = 'zyx'
    animals: list[Animal] = field(default_factory=list)

# %%
p1 = Person(animals=[Animal(), Cat(), PetCat()])
p1.to_dict()

# %%
p2 = Person.from_dict(p1.to_dict())
p2.to_dict()

# %% [markdown]
# Some fields are missing!
# 
# To fix this, we need to include class info in the result

# %%
global_config.include_class_info = True
p1.to_dict()

# %%
p2 = Person.from_dict(p1.to_dict())
global_config.include_class_info = False
p2.to_dict()

# %% [markdown]
# Now all the fields are restored!
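
Outside the notebook, the general idea behind include_class_info can be sketched with a type tag plus a class registry. The `__type__` key and `register`/`dump`/`load` helpers below are hypothetical illustrations of the technique, not the PR's actual field name or implementation:

```python
from dataclasses import dataclass, fields

_REGISTRY = {}

def register(cls):
    # Remember each class by name so load() can find the concrete type later.
    _REGISTRY[cls.__name__] = cls
    return cls

@register
@dataclass
class Animal:
    id: int = 0
    health: int = 100

@register
@dataclass
class Cat(Animal):
    age: int = 1

def dump(obj):
    # Serialize all runtime fields and embed the concrete class name.
    d = {f.name: getattr(obj, f.name) for f in fields(obj)}
    d["__type__"] = type(obj).__name__
    return d

def load(d):
    # Pop the tag and construct the concrete class, not the declared base.
    d = dict(d)
    cls = _REGISTRY[d.pop("__type__")]
    return cls(**d)

restored = load(dump(Cat(age=3)))
assert type(restored) is Cat and restored.age == 3
```

Without the tag, a loader given only the declared type `Animal` has no way to know the dict `{"id": 0, "health": 100, "age": 3}` came from a `Cat`, which is exactly the field loss shown above.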

# %% [markdown]
# # 2. use cache to save time

# %% [markdown]
# If there are thousands of objects to serialize, the code wastes a lot of time retrieving dataclass type info, even though that info does not change during serialization
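
The caching idea can be sketched with functools.lru_cache. The `is_optional` helper below is a hypothetical stand-in for the library's type checks, not its internal API; the point is that type inspection is a pure function of the type, so its result can be memoized:

```python
import typing
from functools import lru_cache

@lru_cache(maxsize=None)
def is_optional(tp) -> bool:
    """True if `tp` is Optional[...], i.e. a Union that includes None."""
    return (typing.get_origin(tp) is typing.Union
            and type(None) in typing.get_args(tp))

# The first call inspects the type; repeated calls for the same type
# (one per object during serialization) become cache lookups.
assert is_optional(typing.Optional[int])
assert not is_optional(int)
```

Since a dataclass has a fixed set of field types, the cache is small and stays valid for the whole serialization run.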

# %%
import cProfile
import pstats
global_config.enable_cache = False
p3 = Person(animals=[Animal() for _ in range(100000)])

pr = cProfile.Profile()
pr.enable()
result_without_cache = p3.to_json()
pr.disable()
pr.dump_stats('profile_stats1')
stats = pstats.Stats('profile_stats1')
stats.sort_stats('cumulative')
stats.print_stats()

# %%
import cProfile
import pstats
global_config.enable_cache = True
p3 = Person(animals=[Animal() for _ in range(100000)])

pr = cProfile.Profile()
pr.enable()
result_with_cache = p3.to_json()
pr.disable()
pr.dump_stats('profile_stats2')
stats = pstats.Stats('profile_stats2')
stats.sort_stats('cumulative')
stats.print_stats()

# %% [markdown]
# The speed improvement is substantial: from 6.6 s to 2.5 s on my laptop
# 
# Now let's check that the results are the same

# %%
result_with_cache == result_without_cache