jacebrowning / datafiles

A file-based ORM for Python dataclasses.
https://datafiles.readthedocs.io
MIT License

default_factory on field called every time the datafile class changes. #322

Closed: hemna closed this issue 5 months ago

hemna commented 7 months ago

This seems like very inefficient and unexpected behavior. I have a datafile class with a member declared as field(default_factory=_somefunction).

Pretty much every time I modify an object of my datafile class, the default_factory is called again, twice.

from dataclasses import field

from datafiles import datafile
from rich.console import Console

cs = Console()


def _df():
    # Default factory; prints so that every invocation shows up in the output.
    cs.print("_df called")
    return 0  # an int, to match the `memory: int` annotation


@datafile("./test.yml")
class Stats:
    memory: int = field(default_factory=_df)
    memory_peak: int = field(default=0)


current = 2
peak = 12
cs.print(f"Current Memory: {current} Peak Memory: {peak}")
s = Stats(
    memory=current,
)
cs.print("Stats created")

s.memory_peak = peak  # setting an attribute triggers a save to test.yml
cs.print("peak reset")
s.memory = current
cs.print("memory reset")
cs.print(s)

Here is the output. You can see that _df is called multiple times:

└─> python df.py
Current Memory: 2 Peak Memory: 12
_df called
Stats created
_df called
_df called
peak reset
_df called
_df called
memory reset
Stats(memory=2, memory_peak=12)
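For comparison, here is a minimal sketch of the same class as a plain `dataclasses` dataclass, with no datafiles and no file I/O. It assumes a module-level counter in place of the `rich` printing so the calls can be checked. A plain dataclass only invokes `default_factory` when the argument is omitted, so `Stats(memory=2)` and ordinary attribute assignments make no calls; the extra calls in the output above therefore presumably come from datafiles' load and save steps rather than from dataclass construction itself.

```python
from dataclasses import dataclass, field

# Call counter standing in for the original `cs.print("_df called")`.
calls = 0


def _df() -> int:
    """Default factory that records each invocation."""
    global calls
    calls += 1
    return 0


@dataclass
class Stats:
    memory: int = field(default_factory=_df)
    memory_peak: int = field(default=0)


s = Stats(memory=2)   # argument supplied: factory is NOT called
s.memory_peak = 12    # plain attribute assignment: factory is NOT called
s.memory = 2
print(calls)          # 0

Stats()               # argument omitted: factory is called once
print(calls)          # 1
```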
jacebrowning commented 7 months ago

Thanks for raising this! If we enable logging by adding these two lines to your example:

import logging
logging.basicConfig(level=logging.INFO)

Then we get this output:

Current Memory: 2 Peak Memory: 12
INFO:datafiles.mapper:Loading 'Stats' object from 'test.yml'
_df called
Stats created
INFO:datafiles.mapper:Saving 'Stats' object to 'test.yml'
_df called
_df called
peak reset
INFO:datafiles.mapper:Saving 'Stats' object to 'test.yml'
_df called
_df called
memory reset
Stats(memory=2, memory_peak=12)

I think one of those default factory calls is expected, because the object is reconstructed during the round trip to the filesystem, but I can look into further optimizations here.
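One plausible way a save/load round trip can trigger a default factory is sketched below. This is not datafiles' actual code: the `roundtrip` and `default_of` helpers are hypothetical, and dropping fields that equal their defaults is an assumption about how a file-based store might keep its files small. The point is that learning a factory's default requires calling the factory, and rebuilding an object from data that omits a field calls it again.

```python
from dataclasses import MISSING, Field, asdict, dataclass, field, fields
from typing import Any

calls = 0


def _df() -> int:
    global calls
    calls += 1
    return 0


@dataclass
class Stats:
    memory: int = field(default_factory=_df)
    memory_peak: int = field(default=0)


def default_of(f: Field) -> Any:
    """Hypothetical helper: compute a field's default value."""
    if f.default_factory is not MISSING:
        return f.default_factory()  # the only way to learn a factory's default
    return f.default


def roundtrip(obj: Any) -> Any:
    """Hypothetical save-then-load cycle, with no real file I/O."""
    data = asdict(obj)
    # Assumed "save": keep only values that differ from their defaults.
    saved = {f.name: data[f.name] for f in fields(obj) if data[f.name] != default_of(f)}
    # Assumed "load": rebuild the object; omitted fields fall back to defaults.
    return type(obj)(**saved)


s = Stats(memory=2, memory_peak=12)  # no factory call
restored = roundtrip(s)              # 1 call: computing memory's default
print(restored, calls)               # Stats(memory=2, memory_peak=12) 1

z = Stats(memory=0)                  # no factory call
roundtrip(z)                         # 2 calls: default check + rebuild
print(calls)                         # 3
```

Under these assumptions a round trip costs one factory call when the value differs from the default and two when it matches, which is why reducing such calls is an implementation detail of the save/load path.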

jacebrowning commented 5 months ago

This version should make fewer calls to the default factory: https://pypi.org/project/datafiles/2.2.3/
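Independently of the datafiles version, a general dataclasses-level workaround (not tested against datafiles itself) is to skip the factory when the default is immutable: `memory: int = 0` needs no function call at all. When a factory is genuinely needed and its result is immutable, memoizing it with `functools.cache` (Python 3.9+) keeps repeated invocations cheap. The `_expensive_default` function and `work_done` counter below are hypothetical. Do not cache a factory that returns a mutable object such as a list, since every instance would then share the same object.

```python
import functools
from dataclasses import dataclass, field

# Hypothetical counter for how often the costly body actually runs.
work_done = 0


@functools.cache
def _expensive_default() -> int:
    """Hypothetical costly default; cached, so the body runs only once."""
    global work_done
    work_done += 1
    return 0


@dataclass
class Stats:
    # The factory is still invoked per omitted argument, but the cached body runs once.
    memory: int = field(default_factory=_expensive_default)
    # Immutable default: no factory, nothing to call.
    memory_peak: int = 0


for _ in range(3):
    Stats()
print(work_done)  # 1
```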