INTERLIS Import: Data in re_tables is ducplicated when re-importing

sjib commented 2 months ago

Describe the bug A clear and concise description of what the bug is.

When re-importing the same dataset or another dataset in a existing database with data it does not check if there is already the same data in the re_tables. This leads to duplicates in re_maintenance_event_wastewater_structure and re_building_group_disposal.

To Reproduce Re-import a dataset with re_data a second time and check your re_table in postgres

Expected behavior A clear and concise description of what you expected to happen.

On import it should be checked as with all other tables if there is already a dataset with the same combination fk_1 / fk_2. The function create_or_update should be enhanced or an new function should be written that checks on the two fk_* attributes of a n:m re_table instead of the obj_id if there is already the same dataset in there.

Screenshots / data If applicable, add screenshots or data to help explain your problem.

Desktop (please complete the following information):

TWW version https://github.com/teksi/wastewater/pull/357
QGIS Version 3.38.4
OS Windows 10

Additional context Add any other context about the problem here.

Tests with create_re_class_entrydo not work as expected - moved to this new issue

[ ] How to get the classname out of variable cls? teksi_wastewater.utils:Not supported n:m relation class: - please open an issue to add this to interlis_importer_to_intermediate_schema configuration

So it does not find the cls

[ ] How to make this function more flexible in not having to hardcode the supported re_* classes?

    def create_re_class_entry(self, cls, **kwargs):
        """
        Checks if an existing instance (if fk_1, fk_2 combination is found) or creates an instance of the provided re_class
        with given kwargs, and returns it.
        """
        instance = None

        # We try to get the instance from the session/database
        # does not work as we can then not run instance = cls(**kwargs) to create object at the end
        # fk_1_value = kwargs.get("fk_1", None)
        # fk_2_value = kwargs.get("fk_2", None)

        logger.info(
                    f"Not supported n:m relation class: {cls} - please open an issue to add this to interlis_importer_to_intermediate_schema configuration"
                )

        if cls == "re_maintenance_event_wastewater_structure":
            fk_1_value = kwargs.get("fk_wastewater_structure", None)
            fk_2_value = kwargs.get("fk_maintenance_event", None)
        elif cls == "re_building_group_disposal":
            fk_1_value = kwargs.get("fk_disposal", None)
            fk_2_value = kwargs.get("fk_building_group", None)
        else:
            fk_1_value = None
            fk_2_value = None

            logger.warning(
                f"Not supported n:m relation class: {cls} - please open an issue to add this to interlis_importer_to_intermediate_schema configuration"
            )

        if fk_1_value and fk_2_value:
            # instance = self.session_tww.query(cls).get(kwargs.get("obj_id", None))
            # instance = self.session_tww.query(cls).get(kwargs.get("fk_1", "fk_2", None))

            # try with filter
            # filter(or_(db.users.name=='Ryan', db.users.country=='England'))
            # filter should be and not or see https://docs.sqlalchemy.org/en/14/orm/query.html#sqlalchemy.orm.Query.filter_by

            if cls == "re_maintenance_event_wastewater_structure":
                instance = self.session_tww.query(cls).filter_by(
                    fk_wastewater_structure=fk_1_value, fk_maintenance_event=fk_2_value
                )

            elif cls == "re_building_group_disposal":
                instance = self.session_tww.query(cls).filter_by(
                    fk_disposal=fk_1_value, fk_building_group=fk_2_value
                )

            else:
                logger.warning(
                    f"Not supported n:m relation class: {cls} - please open an issue to add this to interlis_importer_to_intermediate_schema configuration"
                )

        if instance is None:

            # We found it -> skip
            # instance.__dict__.update(kwargs)
            # flag_dirty(instance)  # we flag it as dirty so it stays in the session
            # else:
            # We didn't find it -> create
            instance = cls(**kwargs)

        return instance

and function call

    def _import_erhaltungsereignis_abwasserbauwerkassoc(self):
        for row in self.session_interlis.query(
            self.model_classes_interlis.erhaltungsereignis_abwasserbauwerkassoc
        ):
            # test with create_re_class_entry

            re_maintenance_event_wastewater_structure = self.create_re_class_entry(
                self.model_classes_tww_od.re_maintenance_event_wastewater_structure,
                # this class does not inherit base_commmon
                # **self.base_common(row),
                # --- re_maintenance_event_wastewater_structure ---
                # does not work
                # fk_1=self.get_pk(row.erhaltungsereignis_abwasserbauwerkassocref__REL),
                # fk_2=self.get_pk(row.abwasserbauwerkref__REL),
                fk_maintenance_event=self.get_pk(
                    row.erhaltungsereignis_abwasserbauwerkassocref__REL
                ),
                fk_wastewater_structure=self.get_pk(row.abwasserbauwerkref__REL),
            )

            self.session_tww.add(re_maintenance_event_wastewater_structure)
            print(".", end="")

Originally posted by @sjib in https://github.com/teksi/wastewater/issues/357#issuecomment-2273560604

sjib commented 2 months ago

eg. Line 2 and line 8 are the same data 20240807_re_me_wws_duplicate_data

sjib commented 2 months ago

[ ] Add unique constraint fk_1 / fk2 to re tables to improve data integrity

sjib commented 2 months ago

@ponceta Can we ask opengis to code this new create_re_class_entry ?

sjib commented 2 months ago

unique constraint fk_1 / fk2 to re tables

@cymed

Should we do for tww_od.re_building_group_disposal

[ ] create an index as with identifier, fk_dataowner: CREATE UNIQUE INDEX in_od_organisation_identifier ON tww_od.organisation USING btree (identifier ASC NULLS LAST, fk_dataowner ASC NULL LAST);

this would then be

CREATE UNIQUE INDEX in_re_building_group_disposal_fks ON tww_od.re_building_group_disposal USING btree (fk_building_group ASC NULLS LAST, fk_disposal ASC NULLS LAST);

20240807_UNIQUE INDEX in_re_building_group_disposal_fks

or

[ ] create a UNIQUE constraint: ALTER TABLE tww_od.re_building_group_disposal ADD CONSTRAINT unique_fk_combination UNIQUE (fk_building_group, fk_disposal);

20240707_re_building_group_disposal_unique_fk_combination

@cymed @ponceta Do you know what is the difference? Pro / Con?

sjib commented 2 months ago

Do we also want to implement the UNIQUE constraint of class organisation?


    UNIQUE 
        !!@comment = "!! Neben UNIQUE OID zusätzlich auch Kombination Bezeichnung, Organisationstyp, UID (Wegleitung GEP-Daten 2020)"
        Bezeichnung, Organisationstyp, UID; 
    END Organisation;

domi4484 commented 3 weeks ago

Data in re_tables is ducplicated when re-importing

This should not happen anymore since #414 was merged.

teksi / wastewater

INTERLIS Import: Data in re_tables is ducplicated when re-importing #365