kaliop-uk / ezmigrationbundle

This bundle makes it easy to handle eZPlatform / eZPublish5 content upgrades/migrations
GNU General Public License v2.0
53 stars 81 forks source link

Unable to set translations in a single step if an untranslatable field is involved #247

Open korsarNek opened 3 years ago

korsarNek commented 3 years ago

Hello,

when I'm trying to execute a migration step like this:

-
    type: content
    mode: create
    content_type: my_type
    parent_location: some_location
    attributes:
        translatable_field:
             eng-GB: some text
             ger-DE: another text
        not_translatable_field: not translatable value

I get an error depending on the type of my other attributes, something like Notice: Undefined index: destinationContentId in file /var/www/html/project/ezplatform/vendor/ezsystems/ezpublish-kernel/eZ/Publish/Core/FieldType/ImageAsset/Type.php line 199 for image or Warning: trim() expects parameter 1 to be string, array given in file /var/www/html/project/ezplatform/vendor/ezsystems/ezpublish-kernel/eZ/Publish/Core/FieldType/TextLine/Type.php line 165 for a text line.

With XDebug I found out that the problem is, that the translations don't get resolved, meaning the value that's given to a TextLine\Type object for example is an array that contains the original yml fields eng-GB and ger-DE as keys.

On the other hand, if I try to provide the translation too, like here:

-
    type: content
    mode: create
    content_type: my_type
    parent_location: some_location
    attributes:
        translatable_field:
             eng-GB: some text
             ger-DE: another text
        not_translatable_field:
             eng-GB: not translatable value
             ger-DE:  not translatable value

I get this error A value is set for non translatable field definition 'not_translatable_field' with language 'eng-GB' .... With some debugging, it seems because kaliop/ezmigrationbundle/Core/Executor/ContentManager.php hasLanguageCodesAsKeys function returns false if any key doesn't have a translation set.

For now my workaround is to create the content object in a single language and then use another migration step to add translations.

gggeek commented 3 years ago

That's correct. I think it is even documented: in a migration step either all fields must be multilingual, or none. Which means that your usecase of creating/updating in a single pass many languages of a content which has some fields set as untranslatable is not supported - but can be achieved with 2 migration steps.

The problem is that it is hard for the mig engine to decide if the data for a field is just a single-language value (ie. an array of strings in your case), or an array of values, one for each language.

Besides analyzing which fieldType correspond to the current field and making a decision based on that, I see no smart algorithm that would give enough assurance in deciding which is which. And analyzing field types is prone to breackage, with custom fieldtypes being available from the community or per-project.

One could push instead the languages to be top-level elements inside 'attributes, eg. attributes/eng-gb/field-one, but even in that case it would be hard to tell if 'eng-gb' is a lang name or field name.

ps: what about this? It seems like a config which it is safe to assume to be multilang, and could probably work

-
    type: content
    mode: create
    content_type: my_type
    parent_location: some_location
    attributes:
        translatable_field:
             eng-GB: some text
             ger-DE: another text
        not_translatable_field:
             ger-DE:  not translatable value
korsarNek commented 3 years ago

The lower example would suffice for me. At the moment it is not possible to specify the language code for a field at all that is not translatable, if I don't want to split the creation step. So, specifying only a single language currently doesn't work, I also tried that version.

I wasn't aware that a check on the field type is necessary. I guess I also used the wrong word, when I said field, I actually meant the attribute on the content_type, not the field type that an attribute can have. The errors on the field types only showed up when I didn't use a translation key everywhere.

The configuration on whether an attribute is translatable or not can be done on the attribute level. So when I create a new content type (e.g. via Kaliop), I can specify on the attribute, whether it should be translatable or not, which could be checked when executing a content migration.

I guess this could be extended to make it possible to use the default language if the key can't be identified as a language key. That would mean not having to specify the language key for every attribute, if multiple language should be used at all. But I don't know what the original motivation was to restrict it to all attributes or none.

If I have attributes that are the same in multiple languages, with that change, I might be able to specify them like this:

-
    type: content
    mode: create
    ....
     attributes:
          same_in_all_languages: some value
          different_per_language:
              eng-GB: an English title
              ger-DE: a German title

It would not matter whether same_in_all_languages is translatable or not. At least when writing the yml, it would probably be a separate case in the code.

This would create problems in case the name of a sub field of an attribute is the same as the name of a language, but I guess this edge case already exists anyway.

Just as an additional suggestion, the original suggestion to write a single language code for the untranslatable fields would suffice. It would create an inconsitency though, as the value specified is actually language independent, so connecting it to a language seems weird.

gggeek commented 3 years ago

Note: I am sticking to ez5 nomenclature for clarity, so no 'attribute' anywhere in this post please, even if the yaml says attributes :-) There are Content, Field, ContentType, FieldType.

So when I create a new content type (e.g. via Kaliop), I can specify on the attribute, whether it should be translatable or not, which could be checked when executing a content migration.

Ok, that seems a better idea than checking the FieldType.

But I don't know what the original motivation was to restrict it to all attributes or none.

I tried to explain it in my comment above: it is hard to tell if the data for a field is meant to be a single-language value or a multiple-languages value. Checking all the values defined in the yaml seemed to be the most 'safe' approach. I think that it is the possible problem that you found out by yourself and described as:

This would create problems in case the name of a sub field of an attribute is the same as the name of a language, but I guess this edge case already exists anyway.

(the answer being that atm this edge case is taken into account and is not a problem, unless every single field in your content accepts array values with keys which match language name format)

I agree that, depending on your mental model, and the design of the ez api (I don't have it in front of my eyes atm), either one of these might make more sense:

    attributes:
        translatable_field:
             eng-GB: 'some text'
             ger-DE: 'another text'
        not_translatable_field:
             ger-DE:  'not translatable value'

or

    attributes:
        translatable_field:
             eng-GB: 'some text'
             ger-DE: 'another text'
        not_translatable_field: 'not translatable value'

It seems that you'd favor the 2nd one. I agree that it seems clean and elegant. However, I do see many potential pitfalls here:

  1. how does the mig bundle know that you want to create content in 2 languages, compared to passing a single array vale to 'translatable_field' ?
  2. what should happen if translatable_field is in fact defined as non-translatable ? the bundle will try to use an array as value, which will probably make the ez api throw an error with a message not easy to decode by the developer...
  3. what should happen if translatable_field is in fact defined as translatable ? should it have the same value in all languages?
  4. what about BC considerations ?

I think that the first example is better in that it makes it easier to answer to question 1. The fact that you have to specify a language even for a non-translatable field can be mentally modeled with the fact that every content version still gets created with a default language anyway... It seems still far from perfect though - I will try to think about it a bit more and see if there is a better alternative. Maybe we could add a separate key: multilanguage: true, outside of attributes...

korsarNek commented 3 years ago
  1. how does the mig bundle know that you want to create content in 2 languages, compared to passing a single array vale to 'translatable_field' ?

that case would be handled similarly to now, if all the sub fields of a field have the name of a language code, it gets interpreted as a translation. It's just reduced to a single field, not all fields. So this combination would not be allowed:

attributes:
    translatable_field:
         my default language value
         eng-GB: a value for a specific language

Which wouldn't be valid yaml anyway. The following case could still be intepreted as just two sub fields with the names sub_field and eng-GB which happens to have another sub field of the name sub_field:

attributes:
    translatable_field:
        sub_field: my default language value
        eng-GB:
             sub_field: a value for a specific language
  1. what should happen if translatable_field is in fact defined as non-translatable ? the bundle will try to use an array as value, which will probably make the ez api throw an error with a message not easy to decode by the developer...

I guess the same thing would happen as now? A value is set for non translatable field definition 'not_translatable_field' with language 'eng-GB' ... Of couse a custom error message would be welcome too.

  1. what should happen if translatable_field is in fact defined as translatable ? should it have the same value in all languages?

Well, in the example, multiple different translations were given, so they would apply. In case only a single translation is provided, it would behave like now. If I don't provide a language key for some field, but for another field, just nothing gets written there. I guess the same behavior should be kept.

attributes:
    translatable_1:
        ger-DE: some German text
     translatable_2:
        eng-GB: some English text

translatable_1 would only be set in the german version, in the english version, no value would be set. For translatable_2 the other way around. If one of them was required, you get an error, like now.

Kaliop currently doesn't enforce that all fields have the same language keys, only that they have a language key.

  1. what about BC considerations ?

This could be considered a breaking change. But only in the case someone already had fields with sub fields that are only named like language keys. That person would've had already problems though if in a migration they only have that kind of field, then it currently already does not what would probably be expected in this case. After this, the edge case would be extended to be a problem when someone has any field whose sub fields are all equal to language keys. A problematic case where sub fields have the same name as a language key already exists.

Maybe we could add a separate key: multilanguage: true, outside of attributes...

What exactly would the new multilanguage field mean?


The extension to not having to write all translations for translatable fields without having to duplicate the value could be like this:

attributes:
   field_1: everywhere the same
   field_2:
      ger-DE: only German value
      eng-GB: only English value

which would be equal to this case with the current system:

attributes:
   field_1:
      ger-DE: everywhere the same
      eng-GB: everywhere the same
   field_2:
      ger-DE: only German value
      eng-GB: only English value

It would then have to select all the language keys that are provided in this step and extend the field to apply to all of them, if the field is translatable at least.

The good thing is, the upper example is currently not valid, so making it valid should be fine. You can still write it the old way if something more specific is wanted. The only breaking change is the case where a field has sub fields that all have the name of a language key. Edit: I mean, it would be valid, if you actually have a field who's sub fields are only language keys, it's a weird edge case ...

korsarNek commented 3 years ago

Well, I think what really prevents this enhancement from working out is that the language code and the sub fields live in the same namespace. So it can be difficult to separate what has been meant.

While this is already a problem in the already existing implementation, this new one would extend that edge case.

Maybe a special prefix can be used, to identify it as a sub field in case it could be ambiguous? With the default being, if it looks like a language key, it is language key. Would be a breaking change too, though. The default could also be that sub fields get priority but there is a prefix for the case of a language code. Breaking change, too.

gggeek commented 3 years ago

Well, I think what really prevents this enhancement from working out is that the language code and the sub fields live in the same namespace. So it can be difficult to separate what has been meant. While this is already a problem in the already existing implementation, this new one would extend that edge case.

That's the thing...

korsarNek commented 3 years ago

I don't think it is optimal to specify a single language key for an untranslatable field, too. Would it have to be the default language? Any language?

Introducing an additional flag to enable this, while possible, seems like it enables a deviation from the normal semantics that the yml file would have. So the fields change their meaning. Sounds like opening a can of worms, even if it may be backwards compatible.

I guess it is better to skip on the enhancement in this area then taking a sub optimal one. Maybe one day if in a new major version the namespace problem gets adressed, it could be back on the table.

Do you think any of the namespacing suggestions could become a candidate?

gggeek commented 3 years ago

Let me sit on this for a couple of days... (atm I am busy with work, as well as the port of mig bundle to ezplatform 3...)