pylint-dev / pylint

It's not just a linter that annoys you!
https://pylint.readthedocs.io/en/latest/
GNU General Public License v2.0
5.31k stars 1.13k forks source link

Add support for PEP 681: ``dataclass_transform`` #7437

Open mjhaye opened 2 years ago

mjhaye commented 2 years ago

Bug description

When building my own dataclass decorator based on the stdlib dataclass, the type of the fields initialized via a field call is inferred incorrectly. This triggers errors every time the field is accessed (and thus this cannot be silenced locally).

# pylint: disable=missing-docstring,unnecessary-lambda,too-few-public-methods
from dataclasses import dataclass, field

def mydataclass(cls):
    cls.myfun = lambda self: print(self)
    return dataclass(cls)

@mydataclass
class TestDC:
    a_dict: dict[int, int] = field(default_factory=dict)

dc = TestDC()
dc.a_dict[0] = 0

I did some research in the issues here and at astroid and it appears to be related to the difficulty to determine if an object, TestDC in this case, is a dataclass. It seems to me, as a laymen, that in brains_dataclasses.py it is more or less hard-coded if something is a dataclass (stdlib, pydantic, and marshmallow_dataclass are supported).

A suggestion: Stdlib provides the is_dataclass() function that checks for a __dataclass_fields__ attribute. Maybe this could be a "generic" way of checking if a class is a dataclass. It would work for stdlib, pydantic and marshmallow_dataclass, but also on all homebrew dataclass adoptations that are derived from that.

Configuration

No response

Command used

pylint dataclass_pylint.py

Pylint output

************* Module dataclass_pylint
dataclass_pylint.py:13:0: E1137: 'dc.a_dict' does not support item assignment (unsupported-assignment-operation)

Expected behavior

No error.

Pylint version

pylint 2.15.2
astroid 2.12.9
Python 3.10.7 (tags/v3.10.7:6cc6b13, Sep  5 2022, 14:08:36) [MSC v.1933 64 bit (AMD64)]

OS / Environment

W10

Additional dependencies

No response

DanielNoord commented 2 years ago

The issue with using __dataclass_fields__ is that we don't have access to it when we determine what is and what isn't a dataclass.

This recognition happens when we look at the definition of a class and check its decorators. However, there is no way to determine that mydataclass sets the __dataclass_fields__ attribute.

This problem is solved by https://docs.python.org/3.11/whatsnew/3.11.html#pep-681-data-class-transforms in Python 3.11. We should probably open an issue on the astroid checker about support for such transforms.

Pierre-Sassoulas commented 2 years ago

We should probably open an issue on the astroid checker about support for such transforms.

Should we move this issue to astroid ?

mjhaye commented 2 years ago

Ah, I see. Good that 3.11 has a robust solution for this.

Maybe another solution that can work now already: I guess you can detect that a class variable is initialised via the field function of the stdlib dataclass module. Although this clearly does not catch all dataclasses, it does catch the ones that suffer from above false positives and thus could resolve this issue.

Is this a viable suggestion?

DanielNoord commented 2 years ago

Should we move this issue to astroid ?

We might need to open one there.

Ah, I see. Good that 3.11 has a robust solution for this.

Maybe another solution that can work now already: I guess you can detect that a class variable is initialised via the field function of the stdlib dataclass module. Although this clearly does not catch all dataclasses, it does catch the ones that suffer from above false positives and thus could resolve this issue.

Is this a viable suggestion?

I'm not sure. That seems like a workaround for something that doesn't happen too often. It might incur too much performance loss for the cases where it does happen. In 3.11 this is fixed anyway so I think it might be better to focus on supporting that syntax and then potentially think about such heuristics for lower versions.

mjhaye commented 2 years ago

@DanielNoord I tried to implement a work-around locally via a pylint plugin. Within the plugin I modified the contents of DATACLASS_MODULES in brain_dataclasses.py to include my own decorator. However, this seems to have no effect. Do you have any suggestions if there is a fix I can do locally (without modifying astroid / pylint themselves)?

DanielNoord commented 2 years ago

Could you show your changes? Something like that should work..

mjhaye commented 2 years ago

The plugin code looks like this:

from typing import TYPE_CHECKING
import astroid.brain.brain_dataclasses as brain_dataclasses

brain_dataclasses.DATACLASS_MODULES = brain_dataclasses.DATACLASS_MODULES | {"mymodule"}

if TYPE_CHECKING:
    from pylint.lint import PyLinter

def register(linter: "PyLinter") -> None:
    pass

Some debugging shows that the transforms in brain_dataclasses are a being applied before the plugin is imported and thus the patch is applied too late.

Currently my work-around (just in the code, not a plugin) is a bit too hacky for my liking:

mod = ModuleType("marshmallow_dataclass")
setattr(mod, "dataclass", mydataclass)
sys.modules["marshmallow_dataclass"] = mod
from marshmallow_dataclass import dataclass 
Pierre-Sassoulas commented 2 years ago

Reopening so the latest discussion is not lost.

Pierre-Sassoulas commented 1 year ago

Here's an useful example from @apxdono in #8833:

Bug description

Custom dataclass-like decorators (marked with dataclass_transform) exhibit issues similar to ones described in #4899, https://github.com/pylint-dev/pylint/issues/7884 etc.

Whenever a dataclass "container" property is declared as field(default_factory=list|dict|etc...), pylint argues that particular methods and operations aren't applicable/available for this property.

Below example will produce 4 distinct pylint errors. The only way to resolve them is to remove = field(...) declarations.

# issue.py

import dataclasses
from dataclasses import field
from typing import Dict, List, Optional, TypeVar, dataclass_transform

_C = TypeVar("_C", bound=type)

@dataclass_transform(field_specifiers=(dataclasses.field,), kw_only_default=True)
def real_class(cls: _C) -> _C:
    data_cls = dataclasses.dataclass(cls, kw_only=True, slots=True)  # type: ignore
    return data_cls

@real_class
class Vehicle:
    model: str
    wheel_count: int = field(default=4)

@real_class
class Parking:
    parked_vehicles: List[Vehicle] = field(default_factory=list)
    spots: Dict[int, Optional[Vehicle]] = field(default_factory=dict)

    def park_vehicle(self, vehicle: Vehicle, spot: int):
        self.parked_vehicles.append(vehicle)
        self.spots[spot] = vehicle

def main():
    parking = Parking()
    parking.park_vehicle(Vehicle(model="lamborghini"), 5)
    parking.park_vehicle(Vehicle(model="prius"), 3)
    parking.park_vehicle(Vehicle(model="volkswagen"), 22)

    for spot, veh in parking.spots.items():
        if veh and veh in parking.parked_vehicles:
            print(f"Spot #{spot}: {veh.model}")

if __name__ == "__main__":
    main()

Command used

$ pylint issue.py

Pylint output

************* Module attrcheck.issue
issue.py:26:8: E1101: Instance of 'Field' has no 'append' member (no-member)
issue.py:27:8: E1137: 'self.spots' does not support item assignment (unsupported-assignment-operation)
issue.py:36:21: E1101: Instance of 'Field' has no 'items' member (no-member)
issue.py:37:26: E1135: Value 'parking.parked_vehicles' doesn't support membership test (unsupported-membership-test)

------------------------------------------------------------------
Your code has been rated at 2.31/10 (previous run: 2.31/10, +0.00)

Expected behavior

$ pylint issue.py
--------------------------------------------------------------------
Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)
mjhaye commented 1 year ago

Has this now become the formal ticket for request for PEP681 support (I could not find another one)? If yes, maybe it is better to change the name, or open a new one.