Closed by hungEA 1 year ago
@ayyubibrahimi Could you please check the process of generating event data? There seems to be a violation of the unique constraint on event_uid.
@baoea there is a unique constraint built into the Builder class for the events module (lib/events.py, line 419), but I don't believe that you are accessing this module.
This process-data error is strange. In this processing repo, an error is raised when there are duplicate event_uid values at the fuse_agency stage, which prevents the processing repo from completing successfully. Additionally, I dropped any duplicate event_uid values (just in case) before the final event table is output, but this process-data failure persists. Can you say more about which iteration of the event table is being fetched prior to the error?
@ayyubibrahimi We're certainly not using your lib/events.py. The error was raised in our validator: https://github.com/ipno-llead/processing/blob/enhancement/data-validator-update/data-validator/event_importer.py. Our validator did not add any extra data to your processed event.csv, but when we imported the event.csv from the fuse folder into a temporary database, it threw a unique constraint violation error:
File "/runner/_work/processing/processing/data-validator/data_validator.py", line 86, in run_validator
module.run(conn, df, be_cols)
File "/runner/_work/processing/processing/data-validator/event_importer.py", line 167, in run
cursor.copy_expert(
psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "officers_event_event_uid_acf5b8ca_uniq"
DETAIL: Key (event_uid)=(76a723fd1140db6abd7f0db0f53d43f2) already exists.
CONTEXT: COPY officers_event, line 4617
So please double-check the output in fuse/event.csv for event_uid=76a723fd1140db6abd7f0db0f53d43f2, as we have no way to check it before it's pushed to WRGL.
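For reference, a check like the one requested above could look roughly like the sketch below. It is only an illustration: the function name `find_duplicate_uids` is hypothetical and is not part of the validator or the processing repo, and it assumes the CSV has a header row containing an `event_uid` column.

```python
import csv
from collections import Counter


def find_duplicate_uids(lines, column="event_uid"):
    """Return {uid: count} for every value of `column` seen more than once.

    `lines` is any iterable of CSV text lines, for example an open file
    handle. The first line is assumed to be the header row.
    """
    reader = csv.DictReader(lines)
    counts = Counter(row[column] for row in reader)
    return {uid: n for uid, n in counts.items() if n > 1}


# Example usage (the path is an assumption based on the thread):
#
#     with open("fuse/event.csv", newline="") as f:
#         print(find_duplicate_uids(f))
```

An empty result only shows that the file itself has no repeated event_uid. The same UniqueViolation could presumably also occur if the target table already contained that key before the COPY started, so it may be worth checking both.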
Hi @hungEA, both on my local copy and on WRGL there is only one entry for event_uid=76a723fd1140db6abd7f0db0f53d43f2. I updated the fuse stage here so that duplicate event_uid values are dropped before the fuse/event.csv table is generated, but as we can see, the error is still thrown. That is why I asked where this table was being fetched from: the current code does not allow for duplicates.
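To illustrate the kind of guard described here, the sketch below keeps the first row per event_uid and returns the dropped rows so they can be inspected instead of disappearing silently. This is not the code in the fuse stage; `drop_duplicate_uids` is a hypothetical helper written with plain dicts so it runs without dependencies. In a pandas pipeline the rough equivalent would be `DataFrame.drop_duplicates(subset=["event_uid"])`.

```python
def drop_duplicate_uids(rows, key="event_uid"):
    """Keep the first row for each `key` value.

    Returns a (kept, dropped) pair of lists so that the dropped rows
    can be logged or reviewed by the caller.
    """
    seen = set()
    kept, dropped = [], []
    for row in rows:
        uid = row[key]
        if uid in seen:
            dropped.append(row)
        else:
            seen.add(uid)
            kept.append(row)
    return kept, dropped


# Example usage with made-up rows:
rows = [
    {"event_uid": "a", "kind": "hire"},
    {"event_uid": "b", "kind": "hire"},
    {"event_uid": "a", "kind": "left"},
]
kept, dropped = drop_duplicate_uids(rows)
```

Looking at `dropped` after a run would show whether the dedup step ever removes anything, which helps narrow down whether the duplicate comes from this output or from an older copy of the table.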
@ayyubibrahimi We have fixed our brady_uid constraint in the BE schema. Please have a look at the error about duplicated brady_uid values in the brady table.
Review data changes at tx/9e0cf141-49db-4e68-a382-0c034eadb76d
When this PR is merged, this transaction will be applied.
Transaction tx/9e0cf141-49db-4e68-a382-0c034eadb76d applied.
The following changes to data-validator are applied: