One of the awesome things about pydantic is its ability to convert inputs whose type differs from the field type. For example, if my field is of type datetime and I pass in a float/int, it is treated as a Unix epoch timestamp and converted to a datetime. Similarly, if a timestamp string such as 2023-09-22T12:30:01Z is passed to this field, it is also converted to a datetime.
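For comparison, here is a small standard-library sketch of the same two conversions done by hand. It only illustrates the work that pydantic's coercion saves the caller; it is not pydantic's actual implementation, and the variable names are made up for this example.

```python
from datetime import datetime, timezone

# The same instant expressed two ways: a Unix epoch and an ISO 8601 string.
epoch_seconds = 1695385801
timestamp_text = "2023-09-22T12:30:01Z"

# Manual conversion from the epoch, interpreted as UTC.
from_epoch = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

# Manual conversion from the string. The trailing "Z" is rewritten because
# datetime.fromisoformat only accepts it on newer Python versions.
from_text = datetime.fromisoformat(timestamp_text.replace("Z", "+00:00"))
```

Both values describe the same timezone-aware instant, which is the kind of normalisation I would like MacAddress to offer for byte inputs.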
In practice, MAC addresses are often represented as a sequence of 6 bytes in packet headers. In my case, I've written some code to unpack the header of a Layer 2 Ethernet frame (https://en.wikipedia.org/wiki/Ethernet_frame#Structure). However, I must first convert mac_destination and mac_source to strings before constructing my model:

import struct
from enum import IntEnum
from typing import Final

from pydantic import BaseModel
from pydantic_extra_types.mac_address import MacAddress


class EtherType(IntEnum):
    IPV4 = 0x0800
    IPV6 = 0x86DD


class Layer2EthernetHeader(BaseModel):
    mac_dst: MacAddress
    mac_src: MacAddress
    ethertype: EtherType
    size_t: Final[int] = 14  # header length in bytes


def mac_str(mac_bytes: bytes) -> str:
    # Format raw bytes as colon-separated hex, e.g. 00:11:22:33:44:55.
    return ":".join(f"{b:02x}" for b in mac_bytes)


def decode_layer2_ethernet_header(data: bytes, index: int = 0) -> Layer2EthernetHeader:
    # Read 6 + 6 + 2 bytes (big-endian) starting at offset index.
    mac_destination, mac_source, ethertype = struct.unpack_from(">6s6sH", data, index)
    return Layer2EthernetHeader(
        mac_dst=mac_str(mac_destination),
        mac_src=mac_str(mac_source),
        ethertype=ethertype,
    )

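To make the extra conversion step concrete without needing pydantic installed, here is a standard-library-only sketch that unpacks a hand-built header. The frame bytes are invented for illustration, and struct.unpack_from is used so that trailing payload bytes are ignored.

```python
import struct

# Hypothetical Ethernet II frame: destination MAC, source MAC, EtherType, payload.
frame = bytes.fromhex("ffffffffffff" "001122334455" "0800") + b"payload"

# unpack_from reads 14 bytes at the given offset and ignores what follows,
# whereas struct.unpack needs a buffer of exactly 14 bytes.
mac_dst_raw, mac_src_raw, ethertype = struct.unpack_from(">6s6sH", frame, 0)


def mac_str(mac_bytes: bytes) -> str:
    """Format raw MAC bytes as colon-separated lowercase hex."""
    return ":".join(f"{b:02x}" for b in mac_bytes)


# This is the manual step the proposal would make unnecessary.
mac_dst = mac_str(mac_dst_raw)
mac_src = mac_str(mac_src_raw)
```

With bytes accepted directly by MacAddress, mac_dst_raw and mac_src_raw could be passed to the model as they are.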
I propose that during validation, MacAddress checks for non-str input types and handles them accordingly, specifically iterable types of length 6 (bytes, bytearray, List[int], NDArray[int], ...). Below is some code that could accomplish this (I have also added a conversion from an int, but this might not be an appropriate representation of a MAC address):
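
# Proposed replacement for MacAddress._validate (belongs inside the MacAddress class).
# Requires: from typing import Any, Sequence, Union
@classmethod
def _validate(cls, __input_value: Union[str, int, Sequence[int]], _: Any) -> str:
    if isinstance(__input_value, int):
        # Split the int into 6 bytes, least significant byte first.
        __input_value = [0xFF & (__input_value >> (i * 8)) for i in range(6)]
    if not isinstance(__input_value, str) and len(__input_value) == 6:
        # Any sized iterable of 6 ints becomes a colon-separated hex string.
        __input_value = ":".join(f"{b:02x}" for b in __input_value)
    elif isinstance(__input_value, str):
        pass  # strings go straight to the existing validation
    else:
        raise TypeError(
            f"Input must be str of length 14, or Sequence[int] of length 6. Got: {__input_value}."
        )
    return cls.validate_mac_address(__input_value.encode())
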
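The same idea can be tried outside the library with a standalone helper. This is only a sketch: coerce_mac is a hypothetical name, not part of pydantic-extra-types, and it makes two assumptions that differ from the snippet above. It reads an int most significant byte first, and it range-checks every octet. Because it uses operator.index, it should also accept integer-like values such as NumPy integers.

```python
import operator
from typing import Iterable, Union


def coerce_mac(value: Union[str, int, Iterable[int]]) -> str:
    """Hypothetical helper: turn supported inputs into 'aa:bb:cc:dd:ee:ff'."""
    if isinstance(value, str):
        return value  # left to the existing string validation
    if isinstance(value, int):
        # Assumption: most significant byte first (network order).
        if not 0 <= value < (1 << 48):
            raise ValueError("int MAC address must fit in 48 bits")
        value = value.to_bytes(6, "big")
    # operator.index accepts integer-like objects and rejects floats and strs.
    octets = [operator.index(o) for o in value]
    if len(octets) != 6 or any(not 0 <= o <= 0xFF for o in octets):
        raise ValueError("expected 6 octets in the range 0..255")
    return ":".join(f"{o:02x}" for o in octets)
```

The returned string could then be handed to the existing validate_mac_address check unchanged, so the string validation rules stay in one place.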
Furthermore, for IP addresses pydantic uses the standard library's IP address implementation, which stores the IP address as an int internally but presents the human-readable format via the __str__ method. Would it make sense to store MAC addresses in a Sequence[int] format behind the scenes, and implement the human-readable, colon-separated bytes as __str__?
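To illustrate the question, here is a rough sketch of what a bytes-backed type might look like next to ipaddress.IPv4Address. MacAddressBytes and from_string are hypothetical names invented for this example. The sketch only parses colon-separated input and stores the octets as immutable bytes, which is one possible reading of "Sequence[int] behind the scenes".

```python
import ipaddress


class MacAddressBytes:
    """Hypothetical sketch: stores six octets, renders text only on demand."""

    __slots__ = ("_octets",)

    def __init__(self, octets) -> None:
        if len(octets) != 6:
            raise ValueError("a MAC address has exactly 6 octets")
        self._octets = bytes(octets)  # raises ValueError for values > 255

    @classmethod
    def from_string(cls, text: str) -> "MacAddressBytes":
        # Assumption: only the colon-separated form is handled here.
        return cls([int(part, 16) for part in text.split(":")])

    def __bytes__(self) -> bytes:
        return self._octets

    def __int__(self) -> int:
        return int.from_bytes(self._octets, "big")

    def __str__(self) -> str:
        return ":".join(f"{o:02x}" for o in self._octets)


# The standard library's IP type already behaves this way.
ip = ipaddress.IPv4Address("192.168.0.1")
mac = MacAddressBytes.from_string("00:11:22:33:44:55")
```

Here int(ip) and str(ip) give the numeric and dotted forms of the same address, and bytes(mac), int(mac) and str(mac) would do the equivalent for a MAC address.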
Support Casting/Conversion of More Input Types
Currently the MacAddress._validate class method only supports inputs of type str with length 14: https://github.com/pydantic/pydantic-extra-types/blob/843b753e9e8cb74e83cac55598719b39a4d5ef1f/pydantic_extra_types/mac_address.py#L41-L42

Edits: Typos.