php-kafka / php-avro-schema-generator

PHP avro subschema merger and experimental PHP Class avro schema generator
BSD 3-Clause "New" or "Revised" License
5 stars 5 forks source link

Please allow types re-definition (custom mapping) #33

Open Hubbitus opened 2 years ago

Hubbitus commented 2 years ago

Now most interesting for me to define optional (nullable fields), so, instead of just:

"type": "string",

Allow to set something like:

"type" : [ "null", "string" ],
"default": null

But there are also more interesting cases when e.g. instead of just int I would like to use logicalType date.

Please allow providing some functional interface to override fixed mechanism by providing an array of mappers, or just callable.

nick-zh commented 2 years ago

Thanks for the input, I'll look into it ✌️ Do you need this in the merger or generator context?

nick-zh commented 2 years ago

So the generator is still a bit experimental, so i think the following things are missing right now (you pointed out most of them):

IMHO it would make sense to have doc comments to be able to define them, what do you think? I would propose:

In terms of types, i think unions are not properly supported yet, so i will enhance that. I think i will do it with #26, since it will make some of these things easier

Additionally, i agree that it would be nice if one could use an own Parser that complies with a newly introduced ParserInterface

Hubbitus commented 2 years ago

Do you need this in the merger or generator context?

I primarily use generators.

And honestly, I need generation schema by a particular object, not for all classes.

Hubbitus commented 2 years ago

IMHO it would make sense to have doc comments to be able to define them, what do you think? I would propose:

Good proposal. That may be useful. Meantime my intention was to provide some schema generation configuration, instead of changing classes.

In my use case, I need to generate AVRO schema for PHP generated classes (PIMCore). So, generally, I have no control over classes themselves but want to change generated schema. E.g. take PHP doc field comment into AVRO doc.

nick-zh commented 2 years ago

Thx for the insight, i will factor this in :v:

nick-zh commented 2 years ago

So i've created a pre-release for this new functionality. I still need to do some more testing, but if you want you can already check it out. Basically you are able to write your own ClassPropertyParserInterface which i think is what you want. Additionally you need your own AppContainer and console file, so you can register your own property parser. If you do it directly in PHP, you can check the readme or the example folder :v:

Hubbitus commented 2 years ago

@nick-zh, thank you very much! It looks very promising!

Hubbitus commented 2 years ago

The concept itself looks good. Thank you! But could you please make some enhancements:

  1. In ClassPropertyParser::parseProperty($property) provide some context. E.g. class file, or just pass the reference to ClassParser caller?
  2. Allow skipping such property in the scheme (some hidden, implementation details, transient). For example by throwing something like SkipPropertyException, or simply return null or false?
Hubbitus commented 2 years ago
  1. Why PhpClassPropertyInterface::NO_DEFAULT = 'there-was-no-default-set' has so strange string value, used then in PhpClassProperty constructor by default?

I would like to suggest making the default value null by default.

Hubbitus commented 2 years ago
  1. Why PhpClassConverter::getConvertedProperties silently skips properties which type is not string?? What if I already return there array like ['null', 'string'] from an one of the ClassPropertyParser::parseProperty($property) descendant implementation? Instead of skipping I would like to just return it as is. What you think?
Hubbitus commented 2 years ago
  1. PhpClassConverter is one or the converter implementation and intended to be sub-classed, is not? Why most methods has private modifier and not at least protected?
Hubbitus commented 2 years ago
  1. PhpClassConverter::getConvertedUnionType logic looks strange:
        if (0 !== count($convertedUnionType) && [] !== $arrayType) {
            $convertedUnionType[] = $arrayType;
        } elseif (0 === count($convertedUnionType) && [] !== $arrayType) {
            return $arrayType;
        }

    Why if $arrayType is empty array we return it? Looks like a mistake. And all else branch must be ommited.

Hubbitus commented 2 years ago
  1. You hardcode types to skip in PhpClassConverter::typesToSkip and include there also type null. But allow provide null as default value. That will lead to generating in AVRO scheme fields like:
    {"name":"fullName","type":["string"],"default":null}

    said by @var null|string or ?string. But that is totally incorrect by AVRO specification, citing:

    Union types are denoted as union { typeA, typeB, typeC, ... }. For example, this record contains a string field that is optional (unioned with null).

Also there in spec note what if there present default value, that must match the first type:

(Note that when a default value is specified for a record field whose type is a union, the type of the default value must match the first element of the union. Thus, for unions containing "null", the "null" is usually listed first, since the default value of such unions is typically null.)

So, that is logically incorrect to exclude null-type as that is legitime in AVRO.

I would like to propose:

  1. Exclude null type from skipped. Or at least allow overriding that array (e.g. parameter in constructor and/or setter)
  2. On generation phase check that on manner:

    diff --git a/src/Generator/SchemaGenerator.php b/src/Generator/SchemaGenerator.php
    index 9e99bc6..81fbaee 100644
    --- a/src/Generator/SchemaGenerator.php
    +++ b/src/Generator/SchemaGenerator.php
    @@ -5,6 +5,7 @@ declare(strict_types=1);
    namespace PhpKafka\PhpAvroSchemaGenerator\Generator;
    
    use PhpKafka\PhpAvroSchemaGenerator\Avro\Avro;
    +use PhpKafka\PhpAvroSchemaGenerator\Exception\SchemaGeneratorException;
    use PhpKafka\PhpAvroSchemaGenerator\PhpClass\PhpClassInterface;
    use PhpKafka\PhpAvroSchemaGenerator\PhpClass\PhpClassPropertyInterface;
    use PhpKafka\PhpAvroSchemaGenerator\Registry\ClassRegistryInterface;
    @@ -106,6 +107,9 @@ final class SchemaGenerator implements SchemaGeneratorInterface
    
         if (PhpClassPropertyInterface::NO_DEFAULT !== $property->getPropertyDefault()) {
             $field['default'] = $property->getPropertyDefault();
    +            if (null === $field['default'] && (!is_array($field['type']) or empty($field['type']) or 'null' != @$field['type'][0])){
    +                throw new SchemaGeneratorException('Provided default value "null", but that type is not provided as first possible in union (see https://avro.apache.org/docs/current/spec.html#Unions)!');
    +            }
         }
    
         if (null !== $property->getPropertyDoc() && '' !== $property->getPropertyDoc()) {
nick-zh commented 2 years ago

Hey @Hubbitus Thanks for your answers and requests, i'll try to answer:

  1. This was left open on purpose and only my implementation checks for PhpParser\Node\Stmt\Property so if you want to create your own class and stick to the interface, you are not forced to use nikic/php-parser like i do in my implementation
  2. I am not sure if we need to add this in my base implementation, since you can also easily create your own ClassParser that can skip properties. Could you elaborate maybe more what kind of properties and why you would want to skip?
  3. The problem is also a bit with #43 it is hard to represent an empty string in annotations and since i also need to support null itself at somepoint, that probably won't do
  4. Due to the fact that i am offering interfaces to give the user to override certain parts i also need some safeguards for my classes that for cases that i did not implement. While i myself return always string for PhpClassPropertyInterface::getPropertyType other people might want to maybe return different things. I could make the the interface more strict, to only allow string return, then i would not need to skip other types than string.
  5. I was a bit torn, normally maintainers often make classes final to prevent extending, so you don't get issues that have nothing to do with implementation. I agree, i should either make the class final or add protected, i am still a bit undecided.
  6. This logic is a bit more complex, agreed, the first if would add an array type { type: array, items: string } to other basic types (int, string, etc) if present. If the type is only an array union type { type: array, items: int,string } we would return it in the elseif. I hope that makes sense
  7. Thank you very much, total oversight on my end, null should indeed not be skipped and needs to be first type in that case if we have a union type :smile:

Many thanks for having a look at the new implementation and challenging it :pray: i appreciate this a lot

nick-zh commented 2 years ago

FYI: Null is now allowed (alpha3), i will add the order check separately. I am using a different lib for schema check / deployment, but i agree, it should be a part of this project as well

Hubbitus commented 2 years ago
  1. Unfortunately problem there what that method declared in the public interface \PhpKafka\PhpAvroSchemaGenerator\Parser\ClassPropertyParserInterface, and although you are right, there passed property of \PhpParser\Node\Stmt\Property (that extends \PhpParser\Node\Stmt <- \PhpParser\NodeAbstract). That implements \PhpParser\Node. I see there I am able to use file start and end positions, but can't even understand in which file! Have I looked badly? I think there will be very good to have a parent node specification in the nodes hierarchy. The main purpose there - I want to see other sources of properties descriptions and typing. For example:

    • To find getter end setter and look on them in reflection (or by parsing)
    • In my case (I have already mentioned I have classes generated by PIMCore) I have also in file a prolog comment which looks like this:

      /**
      * Inheritance: no
      * Variants: no
      
      Fields Summary:
      - code [input]
      - name [input]
      - fullName [input]
      - masterSystem [multiselect]
      - roleSubject [select]
      - roleObject [multiselect]
      - topic [multiselect]
      - description [textarea]
      */

      I've placed one example.

  2. ?
  3. Honestly it is not so important for me but looks very ugly. What is the problem to provide @avro-default '', @avro-default null, @avro-default 'null'??
  4. As I see you made PhpClassConverter class final. Is it unintentional? It looks very convenient to just extend it and override said single method parseProperty($property). Why not? Because it is a single-provided implementation you can't threat that as non-public internal details. Do you call it just "demo-implementation" and provide only API on interfaces?
  5. I would like to introduce there stricter type interface like PhpClassPropertyTypeInterface. Please look at the pull request https://github.com/php-kafka/php-avro-schema-generator/pull/47/

In that PR I also adjust most of the autotests, but not all. Some of them failed because of the step to reflection which we discussed separately (https://github.com/php-kafka/php-avro-schema-generator/issues/38). But I could continue work if you like the overall approach. Please look.

P.S. And @nick-zh, really big thank you for the fast responses and spent time! I've appreciate that.

nick-zh commented 2 years ago

To find getter end setter and look on them in reflection

This seems very specific and i don't think it is something the default implementation should cover or be able to extend upon. I think in this case it would be better for you to copy the exsiting ClassParser and extend it's functionality in your project

I have also in file a prolog comment which looks like this

This seems again very specific

The spirit of the generator is, you have some properties and some comments / types defined for these properties, and we generate an avro schema for that. Any logic beyond that will be hard to cover in a general way, since projects differ a lot in the real world and how people tend to write their classes. If there is anything i can do to make extension easier though, i am totally open to that as long as it is not too specific, e.g. see proposal in #49

I moved the other questions to discussions, it is getting a bit overwhelming otherwise and these are good points, but more on a general level for this project :smile:

Hubbitus commented 2 years ago

Sure such logic is very project-specific!!! I did not ask to add parsing types by seeing in getters in core implementation! I've asked to just pass ClassParser in the interface! Please look idea in https://github.com/php-kafka/php-avro-schema-generator/pull/47/commits/1cc81b7a42e4506f584334429ab3fb5e8fa610b7

One note, please, src/Parser/AvroClassPropertyParser.php is just like demonstration how that may be used for override. That is not intended to be part of real PR.

Hubbitus commented 2 years ago

@nick-zh, please look I've fixed almost all tests, but there still 2 failed like:

PhpKafka\PhpAvroSchemaGenerator\Tests\Integration\Parser\ClassParserTest::testGetProperties
ParseError: syntax error, unexpected '|', expecting variable (T_VARIABLE)

/var/www/html/example/classes/SomeTestClass.php:83
/var/www/html/src/Parser/ClassParser.php:221
/var/www/html/src/Parser/ClassParser.php:158
/var/www/html/tests/Integration/Parser/ClassParserTest.php:56
phpvfscomposer:///var/www/html/vendor/phpunit/phpunit/phpunit:97

But that related to the \PhpKafka\PhpAvroSchemaGenerator\Example\SomeTestClass::$blaaaaaaaa declaration:

public int|string $blaaaaaaaa;

That use Union type declaration which is available since PHP 8.0.0. So they failed expectedly on PHP container based on version 7.4. Should I upgrade the container to version 8.0.0 or do I just need to disable 2-3 tests? FYI all tests passed on my machine locally with PHP 8.0.2.