openedx / open-edx-proposals

Proposals for Open edX architecture, best practices and processes
http://open-edx-proposals.readthedocs.io/
Other
44 stars 31 forks source link

Fix examples of enumerations and attrs in OEP-49 App Patterns: data.py #471

Closed bradenmacdonald closed 1 year ago

bradenmacdonald commented 1 year ago

In OEP-49, it gives this example of how to use attrs in data.py:

from enum import Enum

import attr

def ProgramStatus(Enum):
    ACTIVE = "active"
    RETIRED = "retired"

@attr.attrs(frozen=True)
class ProgramData:
    uuid: str = attr.attrib(
        validator=[attr.validators.instance_of(str)]
    )
    title: str = attr.attrib(
        validator=[attr.validators.instance_of(str)]
    )
    status: str = attr.attrib(
        validator=[
            attr.validators.instance_of(str),
            attr.validators.in_(ProgramStatus)
        ]
    )

There are two problems with this:

  1. It should not be using def but rather class to define the enum. As shown, the declaration is accepted but the enum won't work.
  2. The status field of ProgramData will not actually accept any value, because enum members are not strings and strings are not enum numbers, so the validators conflict:
>>> ProgramData(uuid="", title="", status=ProgramStatus.ACTIVE)
...
TypeError: ("'status' must be <class 'str'> (got <ProgramStatus.ACTIVE: 'active'> that is a <enum 'ProgramStatus'>).", ...)
>>> ProgramData(uuid="", title="", status=ProgramStatus.ACTIVE.value)
...
ValueError: ("'status' must be in <enum 'ProgramStatus'> (got 'active')", ...)

I believe that a better, working, and modern example would be this, which is what I've put in the PR:

from enum import Enum

from attrs import field, frozen, validators

class ProgramStatus(Enum):
    ACTIVE = "active"
    RETIRED = "retired"

@frozen
class ProgramData:
    uuid: str = field(validator=validators.instance_of(str))
    title: str = field(validator=validators.instance_of(str))
    status: ProgramStatus = field(validator=validators.in_(ProgramStatus), converter=ProgramStatus)

This works, is more concise, uses standard python enums, and also uses the more modern attrs syntax which they recommend for all new code.

Putting converter=ProgramStatus allows this new type to accept three different forms of the enum value, but coerces them all to a standard form for consistency:

>>> ProgramData(uuid="", title="", status=ProgramStatus.ACTIVE)
ProgramData(uuid='', title='', status=<ProgramStatus.ACTIVE: 'active'>)
>>> ProgramData(uuid="", title="", status=ProgramStatus.ACTIVE.value)
ProgramData(uuid='', title='', status=<ProgramStatus.ACTIVE: 'active'>)
>>> ProgramData(uuid="", title="", status="active")
ProgramData(uuid='', title='', status=<ProgramStatus.ACTIVE: 'active'>)

>>> ProgramData(uuid="", title="", status="invalid")
ValueError: 'invalid' is not a valid ProgramStatus

Alternative 1

Alternately, it could be suggested to not use converter=ProgramStatus, though that can be potentially confusing as it's very strict about what it accepts:

>>> ProgramData(uuid="", title="", status=ProgramStatus.ACTIVE)
ProgramData(uuid='', title='', status=<ProgramStatus.ACTIVE: 'active'>)
>>> ProgramData(uuid="", title="", status=ProgramStatus.ACTIVE.value)
...
ValueError: ("'status' must be in <enum 'ProgramStatus'> (got 'active')", ...)
>>> ProgramData(uuid="", title="", status="active")
...
ValueError: ("'status' must be in <enum 'ProgramStatus'> (got 'active')", ...)

Alternative 2

It could be nice to use Django's models.TextChoices etc. instead of a stdlib python enum, because they support internationalized names and can be used directly with Django model choices. However, this goes against the recommendation in OEP-49 which says "This file should not import anything other than stdlib modules". (Though it clearly recommends attrs for the same file which is not a stdlib module.)

from attrs import field, frozen, validators
from django.db import models
from django.utils.translation import gettext_lazy as _

class ProgramStatus(models.TextChoices):
    ACTIVE = "active", _("Active")
    RETIRED = "retired", _("Retired")

@frozen
class ProgramData:
    uuid: str = field(validator=validators.instance_of(str))
    title: str = field(validator=validators.instance_of(str))
    status: ProgramStatus = field(validator=validators.in_(ProgramStatus), converter=ProgramStatus)

When using the Django enums, converter=ProgramStatus is necessary as there is otherwise some auto-conversion going on which makes it accept an enum member value or plain string value, resulting in equivalent but inconsistently typed values being stored in the data structure. So that potential footgun could be a reason to prefer regular enums in situations where localization is not useful.

Alternative 3

A nice option could be python stdlib's StrEnum but it's only in Python 3.11, so we can't use it yet. It avoids most of the data type complexities that I've found (see below) though it doesn't support internationalizing the labels like Django enums do.

openedx-webhooks commented 1 year ago

Thanks for the pull request, @bradenmacdonald! Please note that it may take us up to several weeks or months to complete a review and merge your PR.

Feel free to add as much of the following information to the ticket as you can:

All technical communication about the code itself will be done via the GitHub pull request interface. As a reminder, our process documentation is here.

Please let us know once your PR is ready for our review and all tests are green.

bradenmacdonald commented 1 year ago

Actually I have found some problems using the standard python enums.

If you define a Django field like this:

class Program(models.Model):
    status = models.CharField(max_length=20, choices=[(s.value, s.value) for s in ProgramStatus])

and then use DRF to serialize it with:

class ProgramSerializer(serializers.ModelSerializer):
    class Meta:
        model = Program
        fields = [..., 'status', ...]

you get a TypeError: Object of type ProgramStatus is not JSON serializable

In addition, it becomes very tricky to write your ORM queries because it's easy to make them subtly wrong and you won't see any warning:

# Using standard python enum values in the model choices
In [4]: models.StagedContent.objects.all()[0].status
Out[4]: 'expired'
In [5]: models.StagedContent.objects.all()[0].status == data.StagedContentStatus.EXPIRED
Out[5]: False
In [6]: models.StagedContent.objects.filter(status=data.StagedContentStatus.EXPIRED).count()
Out[6]: 0
In [7]: models.StagedContent.objects.filter(status=data.StagedContentStatus.EXPIRED.value).count()
Out[7]: 7

Using from django.db.models import TextChoices avoids these problems: it works with DRF ModelSerializer automatically, and it has no footguns for the ORM:

# Using TextChoices
In [2]: models.StagedContent.objects.all()[0].status
Out[2]: 'expired'
In [3]: models.StagedContent.objects.all()[0].status == data.StagedContentStatus.EXPIRED
Out[3]: True
In [4]: models.StagedContent.objects.filter(status=data.StagedContentStatus.EXPIRED).count()
Out[4]: 7
In [5]: models.StagedContent.objects.filter(status=data.StagedContentStatus.EXPIRED.value).count()
Out[5]: 7

Thus, my recommendation is to go with Django TextChoices for now until Python 3.11 and then switch to StrEnum.

openedx-webhooks commented 1 year ago

@bradenmacdonald 🎉 Your pull request was merged! Please take a moment to answer a two question survey so we can improve your experience in the future.

itsjeyd commented 1 year ago

label: core contributor