pydantic / bump-pydantic

Convert Pydantic from V1 to V2 â™»
MIT License
302 stars 24 forks source link

Non-deterministic (and sometimes incorrect) finding of BaseModel classes #117

Open slafs opened 1 year ago

slafs commented 1 year ago

When trying to run BP001 (add default None) on our project, I've noticed something interesting. The part when the tool is "Looking for Pydantic Models" with ClassDefVisitor yields different results when running more than once. I think I've managed to boil it down to a minimal example.

Setup

Lets consider 4 simple files: a.py, common.py, utils.py and z.py like so: ==> common.py <==

from pydantic import BaseModel

class NewBaseModel(BaseModel):
    foo: str | None

==> utils.py <==

from common import NewBaseModel

class CustomBaseModel(NewBaseModel):
    bar: str | None

==> a.py <==

from utils import CustomBaseModel

class AModel(CustomBaseModel):
    a: str | None

==> z.py <==

from utils import CustomBaseModel

class ZModel(CustomBaseModel):
    z: str | None

(apologies for terrible and potentially unintuitive naming 😅)

Requirements

There are 2 important things to note here:

Expected outcome

Running bump-pydantic . on such "project" should add None as a default value for fields in all 4 modules.

Observed outcome

But it turns out it does that in (more or less) half of the runs. I've run the tool 100 times and only 47 of them ended up in modifying all four files, 15 times it modified 3 of them (common.py, utils.py and z.py) and 38 times only 2 of them (common.py and utils.py).

Problem research

I see two issues here:

  1. non-deterministic behaviour and
  2. incorrect finding of BaseModel classes.

Looks like the problem occurs because of the way the queue is being constructed and the way files are being added to it later in https://github.com/pydantic/bump-pydantic/blob/393dc89dd85f0c9171234e6b8ca5fa841b3292b8/bump_pydantic/main.py#L116-L118 since set by nature is unsorted, files will be added to the queue in a "random" order. Although a.py will be processed first every time. Fixing that non-determinism should be relatively simple. For example, we could make the queue initially the same as files, appendleft files coming from visitor.next_file and get rid of the missing_files bit.

Now, once we fix that, AModel will never get marked as a pydantic model, at least with the way I've fixed the first issue. Which I haven't figured out why, yet. I mean, it does get marked correctly ~50% of the times currently, so... 🤷. I haven't fully grokked the ClassDefVisitor code yet. Seems like there is some consideration for those cases there already.

For solutions I was considering either going through all of the files twice or allowing callers to specify fqns of custom base models.

I can open a PR to better demonstrate what I mean here. This is what I mean about that non-determinism fix: 13bfe61c4d5df9e31559d7e6a11d9b2e66e4657d (from https://github.com/pydantic/bump-pydantic/pull/118).

slafs commented 1 year ago

I haven't fully grokked the ClassDefVisitor code yet. Seems like there is some consideration for those cases there already.

OK. I think I figured it out. It looks like the code responsible for disambiguation in https://github.com/pydantic/bump-pydantic/blob/dda0b55ee0671b6941a2bab900c6a8d63b26886f/bump_pydantic/codemods/class_def_visitor.py#L61-L85 only works one level "up". So it manages to resolve something like:

# -- bar.py
from package.foo import Foo

class Bar(Foo):
    b: str

# -- foo.py
from pydantic import BaseModel

class Foo(BaseModel):
    a: str

but once you add one (or more!) levels of inheritance e.g.:

# -- baz.py
from package.bar import Bar

class Baz(Bar):
    c: str

then at the end CLS_CONTEXT_KEY entry will have one lingering entry with: {'package.bar.Bar': {'package.baz.Baz'}}.

It seems like some recursive disambiguation is needed at the end of the whole process (or at the end each visit).

Kludex commented 1 year ago

Closed by https://github.com/pydantic/bump-pydantic/pull/118 ?

slafs commented 1 year ago

Closed by #118 ?

Nope. #118 is only for 1. from:

I see two issues here:

  1. non-deterministic behaviour and
  2. incorrect finding of BaseModel classes.

Will try to propose something for number 2. later.

It might get tricky with testing as I see there's no unit tests for ClassDefVisitor :/.