symfony / symfony

The Symfony PHP framework
https://symfony.com
MIT License

Validating invalid UTF8 string with the UniqueEntityValidator leads to SQL errors #57817

Open glaubinix opened 1 month ago

glaubinix commented 1 month ago

Symfony version(s) affected

6.4.9

Description

Validating invalid UTF-8 strings with the UniqueEntityValidator leads to SQL errors when using PostgreSQL (and likely other SQL flavors too), because the validator passes the value straight to the ORM.

Uncaught PHP Exception Doctrine\DBAL\Exception\DriverException: "An exception occurred while executing a query: SQLSTATE[22021]: Character not in repertoire: 7 ERROR: invalid byte sequence for encoding "UTF8": 0xad"

How to reproduce

Define an entity with constraints, for instance the entity below:

<?php declare(strict_types=1);

namespace App\Entity;

use Doctrine\ORM\Mapping as ORM;
use Symfony\Bridge\Doctrine\Validator\Constraints\UniqueEntity;
use Symfony\Component\Validator\Constraints as Assert;

#[ORM\Entity]
#[ORM\Table]
#[ORM\UniqueConstraint(name: 'some_unique_field', columns: ['name'])]
#[UniqueEntity(fields: ['name'], message: 'The name must be unique')]
class TestEntity
{
    #[ORM\Id]
    #[ORM\Column]
    #[ORM\GeneratedValue(strategy: 'AUTO')]
    private int $id;

    #[Assert\NotBlank]
    #[Assert\Length(min: 1, max: 60)]
    #[ORM\Column()]
    public string $name;
}

Setting the name to any invalid UTF-8 string, as in the example below, and then validating the object can result in an SQL error.

$object = new TestEntity();
// chr(198) is the lone byte 0xC6, which is not a valid UTF-8 sequence on its own
$object->name = 'Name' . chr(198);

$validator->validate($object);

With PostgreSQL (and likely other SQL flavors too), this causes the error below:

Uncaught PHP Exception Doctrine\DBAL\Exception\DriverException: "An exception occurred while executing a query: SQLSTATE[22021]: Character not in repertoire: 7 ERROR: invalid byte sequence for encoding "UTF8": 0xad"

The error is raised when the UniqueEntityValidator calls findBy to check whether a matching entity is already stored in the database.

Possible Solution

It would be really helpful if the UniqueEntityValidator automatically rejected any string that does not match the expected charset, or if there were a Charset constraint that could run right before UniqueEntity.

Additional Context

No response

n0rbyt3 commented 1 month ago

Symfony 7.1 introduces a new Charset constraint: https://symfony.com/doc/current/reference/constraints/Charset.html

If you need to stay on 6.4, you can write a custom validator that uses mb_detect_encoding($value, $encodings, $strict) internally.
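
As a rough illustration of the check such a custom validator might perform, here is a minimal sketch in plain PHP. The function name isValidUtf8 is hypothetical and not part of Symfony, and the sketch assumes the mbstring extension is loaded. It only shows the encoding test itself; wiring it into a Constraint/ConstraintValidator pair, and making it run before UniqueEntity, is left out.

```php
<?php declare(strict_types=1);

/**
 * Hypothetical helper: returns true when $value is a well-formed UTF-8 string.
 *
 * With $strict = true, mb_detect_encoding() returns false if the string
 * is not valid in any of the candidate encodings (here only UTF-8).
 */
function isValidUtf8(string $value): bool
{
    return mb_detect_encoding($value, ['UTF-8'], true) !== false;
}

// The value from the report: 'Name' followed by the lone byte 0xC6.
var_dump(isValidUtf8('Name' . chr(198)));
// A plain ASCII string, which is also valid UTF-8.
var_dump(isValidUtf8('Name'));
```

A custom validator could call a helper like this and add a violation when it returns false, so the invalid value is rejected before any query is sent to the database. mb_check_encoding($value, 'UTF-8') is a more direct alternative for the same test.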