plone / plone.supermodel

Provides XML import and export for schema interfaces based on zope.schema fields
5 stars 8 forks source link

utils.mergedTaggedValueList returns duplicates #41

Closed gbastien closed 2 years ago

gbastien commented 2 years ago

Hi @mauritsvanrees @ale-rt @jensens

it seems that utils.mergedTaggedValueList returns duplicates. This is due to schema.__iro__ having the generated schema interface and other interfaces, so fields are found in several different interfaces. This was causing a bug in https://github.com/collective/collective.dexteritytextindexer/issues/38 This will be fixed in collective.dexteritytextindexer but it is a workaround, probably the problem should be fixed in utils.mergedTaggedValueList that should check if value not already in list before adding it.

What do you think?

It seems that this does not happen on Plone4 (probably but would need to check again) but this is happening with Plone 6.0.0a2.

Does that make sense that I propose a test and a PR? I do not know if it is supposed to holds several times same values, I do not know this package enough.

Thank you for all you do in Plone and happy new year 2022!

Gauthier

jensens commented 2 years ago

Usually it is not that a good idea to have two interfaces (generated or not) with the same fieldname. IIRC mergedTaggedValueList takes inherited/overridden fields into account. But - if it does not yet exist - a test would be good to proof it.

mauritsvanrees commented 2 years ago

But - if it does not yet exist - a test would be good to proof it.

Agreed.

gbastien commented 2 years ago

Hi @jensens @mauritsvanrees

thank you for feedback, I will add a failing test exposing this problem.

Moreover to be clear, when iterating over __iro__, mergedTaggedValueList will iterate the schema generated interface (so something like <InterfaceClass plone.dexterity.schema.generated.Plone_5_1640858254_2_9089332_0_MyContent>) that holds fields with tagged values and the original interface into which fields are defined, for exemple IMyContent that also holds the fields with tagged values. So it is not a problem of having various interface with same field names, it is that finally generated interface will "duplicate" original FS interface fields. So self.context.portal_types['MyPortalType'].lookupSchema().__iro__ will have following result: ``(,

, ...)`` And so fields of IMyContent are in generated schema as well. I'll come back to you soon with a test showing that. Gauthier
wesleybl commented 2 years ago

@gbastien could this problem be making performance worse?

gbastien commented 2 years ago

Hi @wesleybl

we had the problem with collective.dexteritytextindexer making it index fields 2 times and resulting in a SearchableText index having values duplicated so yes this leads to performances problems.

I some huge application, we use collective.solr because the SearchableText index is a bottleneck so yes indexing "2 times" will make it worse... The problem is fixed in collective.dexteritytextindexer (using a workaround) but the origin of the problem seems to be into mergedTaggedValueList, I will had a test today try to show that.

Thank you for follow up Gauthier

gbastien commented 2 years ago

Hi @jensens @mauritsvanrees

I added https://github.com/plone/plone.supermodel/commit/527a8dd6ada010467a348c58747fc32ee7a9d8be to show this.

Finally I suppose this is "normal" and every callers should pay attention that there may be duplicates.

To me the "biggest" problem is that there will always be duplicates when passing the FTI schema as this is a "copy" of the fields of every interfaces/behaviors/...

I do not know if needed to go further, the problem is fixed in collective.dexteritytextindexer, and the way mergedTaggedValueList could probably not be changed.

Thank you for follow up.

Gauthier

jensens commented 2 years ago

I agree. I had a further look: the doc-string states the nature of the result.

[...] Return a list that consists of
all elements from all interfaces and base interfaces, with values from
more-specific interfaces appearing at the end of the list.

So, mergedTaggedValueList behaves as described and I do not considers this a bug. And as the name suggests it merges all of them.

If you need different behavior, like getting the most recent or combined values, it's upon the caller to do so.

I think this need to be solved in the collective.dexteritytextindexer, and it could be mergedTaggedValueList is the wrong tool here. It is probably easier to iterate the schemas like with https://github.com/plone/plone.dexterity/blob/e43f08b68a46286ec12423991237c60d102af4fe/plone/dexterity/utils.py#L62 and then for each look in its IRO ischema = reversed(schema.__iro__) for the tagged value in question with ischema.queryDirectTaggedValue(KEY). If a field was already processed, skip it, as the first wins (or does the last? - not sure here) .