After an inventory, two reasons were identified why records are not deleted from the published ERIC tables:
1: The return statement in _get_production_ids (publisher.py):
return {row["id"] for row in rows if row.get("national_node", "") == node.code}
never returns any records, because row.get("national_node", "") returns a dictionary rather than a string, so the comparison with node.code never matches. As a result, none of the deleted records can be identified. This should solve the issue:
return {row["id"] for row in rows if row.get("national_node", {}).get("id", "") == node.code}
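For illustration, a minimal standalone sketch (with made-up rows) of why the original filter never matches: the national_node reference comes back as a dictionary, so comparing it to the node code string always fails, while the fixed filter compares the nested id.

rows = [
    {"id": "bbmri-eric:ID:NO_OUS", "national_node": {"id": "NO"}},
    {"id": "bbmri-eric:ID:SE_KI", "national_node": {"id": "SE"}},
]
node_code = "NO"

# Old filter: a dict is never equal to the string "NO", so the set stays empty.
old = {row["id"] for row in rows if row.get("national_node", "") == node_code}
# Fixed filter: compare the nested national_node id with the node code.
new = {row["id"] for row in rows if row.get("national_node", {}).get("id", "") == node_code}

print(old)  # set()
print(new)  # {'bbmri-eric:ID:NO_OUS'}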
2: The current get_quality_info function (bbmri_client.py) returns the IDs of ALL biobanks and collections, including the ones without quality information. Biobanks and collections with quality information are not deleted, but because the function returns all biobanks and collections, no biobank or collection will ever be deleted: they all appear to have quality information, which of course is not true.
def get_quality_info(self) -> QualityInfo:
    """
    Retrieves the quality information identifiers for biobanks and collections.
    :return: a QualityInfo object
    """
    biobank_qualities = self.get(
        "eu_bbmri_eric_biobanks", batch_size=10000, attributes="id,quality"
    )
    collection_qualities = self.get(
        "eu_bbmri_eric_collections", batch_size=10000, attributes="id,quality"
    )
    biobanks = utils.to_upload_format(biobank_qualities)
    collections = utils.to_upload_format(collection_qualities)
    return QualityInfo(
        biobanks={row["id"]: row["quality"] for row in biobanks},
        collections={row["id"]: row["quality"] for row in collections},
    )
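A made-up example of the consequence, assuming the deletion logic checks membership in the returned mapping: every biobank ID ends up as a key, whether or not it has quality information, so the check always succeeds.

biobanks = [
    {"id": "bbmri-eric:ID:NO_OUS", "quality": ["qual_1"]},
    {"id": "bbmri-eric:ID:SE_KI", "quality": []},  # no quality information
]
quality_map = {row["id"]: row["quality"] for row in biobanks}

# Both biobanks are present as keys, so both "seem to have a quality".
print("bbmri-eric:ID:SE_KI" in quality_map)  # True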
Only the rows that actually have quality information should be returned. Therefore, do not select from eu_bbmri_eric_biobanks and eu_bbmri_eric_collections, but from eu_bbmri_eric_bio_qual_info and eu_bbmri_eric_col_qual_info instead:
def get_quality_info(self) -> QualityInfo:
    """
    Retrieves the quality information identifiers for biobanks and collections.
    :return: a QualityInfo object
    """
    biobank_qualities = self.get(
        "eu_bbmri_eric_bio_qual_info", batch_size=10000, attributes="id,biobank"
    )
    collection_qualities = self.get(
        "eu_bbmri_eric_col_qual_info", batch_size=10000, attributes="id,collection"
    )
    biobanks = utils.to_upload_format(biobank_qualities)
    collections = utils.to_upload_format(collection_qualities)
    bb_qual = {}
    for row in biobanks:
        bb_qual.setdefault(row["biobank"], []).append(row["id"])
    coll_qual = {}
    for row in collections:
        coll_qual.setdefault(row["collection"], []).append(row["id"])
    return QualityInfo(
        biobanks=bb_qual,
        collections=coll_qual,
    )
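To show the difference, a small hypothetical example of what the proposed version builds from the *_qual_info rows: only entities that actually have a quality record appear as keys, so biobanks and collections without one can again be recognised.

biobank_rows = [
    {"id": "qual_1", "biobank": "bbmri-eric:ID:NO_OUS"},
    {"id": "qual_2", "biobank": "bbmri-eric:ID:NO_OUS"},
]
bb_qual = {}
for row in biobank_rows:
    bb_qual.setdefault(row["biobank"], []).append(row["id"])

print(bb_qual)                           # {'bbmri-eric:ID:NO_OUS': ['qual_1', 'qual_2']}
print("bbmri-eric:ID:SE_KI" in bb_qual)  # False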
After these changes, the test_delete_rows function (test_publisher.py) needed a fix. Changed:
{"id": "bbmri-eric:ID:NO_OUS", "national_node": "NO"},
{"id": "ignore_this_row", "national_node": "XX"},
{"id": "delete_this_row", "national_node": "NO"},
{"id": "undeletable_id", "national_node": "NO"},
into:
{"id": "bbmri-eric:ID:NO_OUS", "national_node": {"id": "NO"}},
{"id": "ignore_this_row", "national_node": {"id": "XX"}},
{"id": "delete_this_row", "national_node": {"id": "NO"}},
{"id": "undeletable_id", "national_node": {"id": "NO"}},
When records are deleted from the staging area, they should also be removed from the published ERIC tables. This does not happen. It seems the _delete_rows function within the Publisher does not work properly.
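For context, a rough, hypothetical outline of the deletion step that both fixes feed into (not the actual _delete_rows implementation): rows that exist in the published table but are no longer in the node's staging area should be deleted, unless they are protected by quality information.

def rows_to_delete(production_ids: set, staging_ids: set, quality_map: dict) -> set:
    # Candidates are rows published for this node that no longer exist in staging.
    candidates = production_ids - staging_ids
    # Rows with quality information are kept (the protection rule described above).
    return {id_ for id_ in candidates if id_ not in quality_map}

production_ids = {"bbmri-eric:ID:NO_OUS", "delete_this_row", "undeletable_id"}
staging_ids = {"bbmri-eric:ID:NO_OUS"}
quality_map = {"undeletable_id": ["qual_1"]}
print(rows_to_delete(production_ids, staging_ids, quality_map))  # {'delete_this_row'}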