Wrong parsing of repeated structured properties

googleapis / python-ndb

Apache License 2.0

150 stars 66 forks source link

from google.cloud import ndb client = ndb.Client() class ModelD(ndb.Model): field_d = ndb.TextProperty() class ModelC(ndb.Model): field_c_1 = ndb.StringProperty() field_c_2 = ndb.TextProperty() field_c_3 = ndb.TextProperty() field_c_4 = ndb.TextProperty() class ModelB(ndb.Model): field_b_1 = ndb.StringProperty() field_b_2 = ndb.StructuredProperty(ModelC) class ModelA(ndb.Model): field_a_1 = ndb.StringProperty() field_a_2 = ndb.StructuredProperty(ModelD) field_a_3 = ndb.StructuredProperty(ModelB, repeated=True) def load_data(): with client.context(): model_Cs = [ ModelC( field_c_1='value_c_1_1', field_c_2='value_c_2_1', field_c_3='value_c_3_1', field_c_4='value_c_4_1' ), ModelC( field_c_1='value_c_1_2', field_c_2='value_c_2_2', field_c_3='value_c_3_2', field_c_4='value_c_4_2' ) ] model_Bs = [ ModelB(field_b_1='b1', field_b_2=model_Cs[0]), ModelB(field_b_1='b2'), # NULL VALUE FOR field_b_2, This is the issue! ModelB(field_b_1='b3', field_b_2=model_Cs[0]) ] model_a = ModelA(field_a_1='a1', field_a_2=ModelD(field_d='d1'), field_a_3=model_Bs) model_a.put() return model_a.key.id() def assert_bug(): id_ = load_data() number_of_assertions_errors = 0 number_of_kind_errors = 0 for i in range(30): try: with client.context(): result = ModelA.get_by_id(id_, use_cache=False, use_global_cache=False) result1 = ModelA.get_by_id(id_, use_cache=False, use_global_cache=False) assert result == result1 except AssertionError: number_of_assertions_errors += 1 except ndb.KindError: number_of_kind_errors += 1 print(f'Number of assertions errors {number_of_assertions_errors}') print(f'Number of kind errors {number_of_kind_errors}') if __name__ == '__main__': assert_bug()

I have run into this issue too with a similarly complex entity with structured properties. I first ran into it due to the KindError, as well. I patched _entity_from_ds_entity to sort the fields of the dt_entity before iterating over them, and that seemed to prevent the KindError. (the datastore Entity type is based on dict, and in Python 2.7 dict is not order preserving, implying there are some odd cases/bad assumptions within _entity_from_ds_entity or the code it calls).

However, there is a deeper problem with the legacy data format in the case discussed here. In the example above, model_Bs[1] has field_b_2 set to None. This causes the old dot-notation representation to lose information about the correct order of the deeply nested (and repeated) fields.

Taking part of the construction of the example above:

 model_Bs = [  # stored as a repeated StructuredProperty
            ModelB(field_b_1='b1', field_b_2=model_Cs[0]),
            ModelB(field_b_1='b2'),  # NULL VALUE FOR field_b_2, This is the issue!
            ModelB(field_b_1='b3', field_b_2=model_Cs[1])
        ]

When the entity is re-fetch/get later, these will instead be loaded as:

 model_Bs = [  # stored as a repeated StructuredProperty
            ModelB(field_b_1='b1', field_b_2=model_Cs[0]),
            ModelB(field_b_1='b2', field_b_2=model_Cs[1])
            ModelB(field_b_1='b3', field_b_2=None),  # There were only 2 values in the various `field_b_2.*` fields
        ]

I believe this issue existed in the legacy NDB as well.

So far, my only plan to work around this is to reconstruct the original data, but with legacy_data=False, so that NDB will store the StructuredProperties as embeded entities, instead of trying to flatten them into the dot-notation lists: https://cloud.google.com/datastore/docs/concepts/entities#embedded_entity

googleapis / python-ndb

Wrong parsing of repeated structured properties #757