adberglund closed this issue 5 years ago
For AWS, the uniqueness constraints are:
So, for example, there should be exactly 1 line item for (13 Mar 2018 12:00-13:00, ec2, us-east-1a, RunInstances-m4.xlarge)
The main question in terms of de-duplication is, when we see a duplicate line with a different charge, do we ignore or replace the line in our DB?
For the month-end finalization, I'd think we'd want to replace/update rows rather than just ignore-and-discard them.
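To make the ignore-versus-replace choice concrete, here is a minimal sketch of both behaviours. It uses Python's built-in sqlite3 module as a stand-in, because SQLite (3.24 or newer) accepts the same ON CONFLICT syntax as PostgreSQL. The table and column names are hypothetical, not koku's actual schema, and the costs are made up.

```python
import sqlite3

# Hypothetical table: column names are illustrative, not koku's actual schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE line_item (
        usage_start TEXT,
        product_code TEXT,
        availability_zone TEXT,
        usage_type TEXT,
        cost REAL,
        UNIQUE (usage_start, product_code, availability_zone, usage_type)
    )
    """
)

key = ("2018-03-13T12:00", "ec2", "us-east-1a", "RunInstances-m4.xlarge")
insert = "INSERT INTO line_item VALUES (?, ?, ?, ?, ?) "

# First sighting of the line item.
conn.execute(insert, key + (0.20,))

# Option 1: ignore the duplicate, so the original charge is kept.
conn.execute(insert + "ON CONFLICT DO NOTHING", key + (0.25,))
cost_after_ignore = conn.execute("SELECT cost FROM line_item").fetchone()[0]

# Option 2: replace the stored charge with the newer value.
conn.execute(
    insert
    + "ON CONFLICT (usage_start, product_code, availability_zone, usage_type) "
    + "DO UPDATE SET cost = excluded.cost",
    key + (0.25,),
)
cost_after_update = conn.execute("SELECT cost FROM line_item").fetchone()[0]
row_count = conn.execute("SELECT COUNT(*) FROM line_item").fetchone()[0]
```

With DO NOTHING the first charge stays in place. With DO UPDATE the later charge wins, which is closer to what month-end finalization would need. Either way there is only one row for the key.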
[2018-12-04 10:51:13,895: ERROR/ForkPoolWorker-1] masu.processor.tasks.process_report_file[f2fff33a-6021-4b15-a015-e036c08ec8c9]: ON CONFLICT DO UPDATE command cannot affect row a second time HINT: Ensure that no rows proposed for insertion within the same command have duplicate constrained values.
So when we have dup rows in a sheet we see that error and nothing gets imported.
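As the HINT says, PostgreSQL raises this error when a single INSERT ... ON CONFLICT DO UPDATE statement contains two or more rows with the same constrained values. One possible workaround, sketched below, is to collapse duplicates within each batch before building the statement. The function name, the field names, and the "last row wins" policy are all assumptions for illustration; this is not necessarily how masu handles it.

```python
def dedupe_batch(rows, key_fields):
    """Keep only the last row seen for each unique key.

    Output order follows the first appearance of each key.
    """
    latest = {}
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        latest[key] = row  # a later duplicate overwrites the earlier row
    return list(latest.values())


# Hypothetical rows from one report file; the field names are illustrative.
batch = [
    {"usage_start": "2018-12-04T10:00", "pod": "pod-a", "cost": 1.0},
    {"usage_start": "2018-12-04T10:00", "pod": "pod-b", "cost": 2.0},
    {"usage_start": "2018-12-04T10:00", "pod": "pod-a", "cost": 1.5},
]
unique_rows = dedupe_batch(batch, ("usage_start", "pod"))
```

After this step every key appears once per statement, so the upsert can no longer touch the same row twice within one command.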
@adberglund Is that the expected outcome?
@lcouzens and I spoke about this, and it is not the expected outcome. The example error occurred while processing OCP, and I believe it represents a bug that needs investigation.
Seems this is working as expected on AWS so this is just a potential bug/issue with OCP.
Verified
Commit: 3011d8e0631a72809f070c36de0e533d89ac0ade
Added multiple duplicate lines to a CSV and verified that each line was only processed once.
User Story
As a koku user, I do not want duplicate data represented in my dashboard so that I have an accurate representation of my usage and billing.
Impacts
Masu Backend
Role
backend engineer
Assumptions
Implementation Details
ON CONFLICT DO NOTHING clause. See https://www.postgresql.org/docs/9.6/static/sql-insert.html for explanation.
Acceptance Criteria