project-koku / masu

This is a READ ONLY repo. See https://github.com/project-koku/koku for current masu implementation
GNU Affero General Public License v3.0

Deduplicate cost entry line items when processing #53

Closed. adberglund closed this issue 5 years ago.

adberglund commented 6 years ago

User Story

As a koku user, I do not want duplicate data represented in my dashboard so that I have an accurate representation of my usage and billing.

Impacts

Masu Backend

Role

backend engineer

Assumptions

Implementation Details

Acceptance Criteria

blentz commented 6 years ago

For AWS, the uniqueness constraints are:

  1. time period (1 hour resolution)
  2. region / AZ
  3. service
  4. resource / line item description

So, for example, there should be exactly 1 line item for (13 Mar 2018 12:00-13:00, ec2, us-east-1a, RunInstances-m4.xlarge)

The main de-duplication question is: when we see a duplicate line with a different charge, do we ignore it or replace the existing line in our DB?

For the month-end finalization, I'd think we'd want to replace/update rows rather than just ignore-and-discard them.
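One way to express that replace/update behavior is a PostgreSQL upsert keyed on the four uniqueness columns listed above. A minimal sketch, assuming hypothetical table and column names (`aws_line_items`, `usage_start`, `availability_zone`, `product_code`, `line_item_description`, `unblended_cost`) rather than the actual masu schema, and assuming a unique index exists on those key columns:

```python
import psycopg2

# Hypothetical table/column names; the real cost entry line item schema differs.
# ON CONFLICT requires a unique constraint or index on the four key columns.
UPSERT_SQL = """
INSERT INTO aws_line_items
    (usage_start, availability_zone, product_code, line_item_description, unblended_cost)
VALUES (%(usage_start)s, %(availability_zone)s, %(product_code)s,
        %(line_item_description)s, %(unblended_cost)s)
ON CONFLICT (usage_start, availability_zone, product_code, line_item_description)
DO UPDATE SET unblended_cost = EXCLUDED.unblended_cost;
"""

def upsert_line_item(conn, row):
    """Insert a line item; if the (hour, AZ, service, resource) key already
    exists, replace the stored cost instead of ignoring the new value."""
    with conn.cursor() as cur:
        cur.execute(UPSERT_SQL, row)
```

With this shape, re-reading the same hour's data during month-end finalization overwrites the charge rather than silently discarding it.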

chargio commented 6 years ago
  1. The file is final (it has an invoice number): this is an error that should not happen --> notify the customer.
  2. The file being processed is newer: there has been some processing since the last time we read it. Update it.
  3. The file being processed is older: drop it (a sketch of this decision logic follows the list).
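A minimal sketch of that decision logic, assuming hypothetical metadata objects with an `invoice_id` (only set once the bill is finalized) and a `last_modified` timestamp, plus a `notify_customer` callback; none of these names come from masu itself:

```python
def handle_incoming_report(stored, incoming, notify_customer):
    """Decide what to do with a report file that we have processed before."""
    if stored.invoice_id is not None:
        # 1. The stored report is final: it should never change again.
        notify_customer("finalized report changed unexpectedly")
        return "error"
    if incoming.last_modified > stored.last_modified:
        # 2. The incoming file is newer: re-process it and update our rows.
        return "update"
    # 3. The incoming file is older (or unchanged): drop it.
    return "drop"
```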
lcouzens commented 5 years ago

[2018-12-04 10:51:13,895: ERROR/ForkPoolWorker-1] masu.processor.tasks.process_report_file[f2fff33a-6021-4b15-a015-e036c08ec8c9]: ON CONFLICT DO UPDATE command cannot affect row a second time HINT: Ensure that no rows proposed for insertion within the same command have duplicate constrained values.

So when we have duplicate rows in a sheet we see that error and nothing gets imported.

@adberglund Is that the expected outcome?
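For context on the PostgreSQL error above: `INSERT ... ON CONFLICT DO UPDATE` aborts if a single statement proposes two rows that map to the same conflict key, so a batch that still contains duplicates fails exactly like this and nothing is imported. A minimal sketch of de-duplicating a batch before the upsert, reusing the hypothetical key columns from the earlier example:

```python
def dedup_batch(rows, key_columns):
    """Keep only the last occurrence of each conflict key in a batch so a
    single INSERT ... ON CONFLICT statement never sees the same key twice."""
    deduped = {}
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        deduped[key] = row  # a later duplicate overwrites an earlier one
    return list(deduped.values())
```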

adberglund commented 5 years ago

@lcouzens and I spoke about this, and it is not the expected outcome. The example error occurred while processing OCP and I believe it represents a bug that needs investigation.

lcouzens commented 5 years ago

It seems this is working as expected on AWS, so this is just a potential bug/issue with OCP.

lcouzens commented 5 years ago

Verified commit 3011d8e0631a72809f070c36de0e533d89ac0ade: added multiple duplicate lines to a CSV and verified that each line was only processed once.
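For reference, a minimal sketch of how duplicate lines can be injected into a report CSV for that kind of check; the helper name and the idea of comparing processed row counts afterwards are assumptions, not the actual masu test harness:

```python
import csv

def duplicate_last_line(csv_path, copies=3):
    """Append extra copies of the last data line to a report CSV,
    simulating duplicate line items in the source file."""
    with open(csv_path, newline="") as f:
        rows = list(csv.reader(f))
    rows.extend([rows[-1]] * copies)
    with open(csv_path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
```

After processing, checking that the line item table contains one row per unique CSV line confirms the duplicates were collapsed.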