microsoft / finops-toolkit

Tools and resources to help you adopt and implement FinOps capabilities that automate and extend the Microsoft Cloud.
https://aka.ms/finops/toolkit
MIT License
282 stars 90 forks

Ingestion issues - duplicate data - strange results in PowerBI reports #966

Closed psilantropy closed 1 week ago

psilantropy commented 1 week ago

🐛 Problem

I've deployed FinOps hubs a few times. I think I went from 0.2 to 0.3 RC, then 0.3, 0.4, and now 0.5.

When I look at my reports I have some strange data that must be the result of bad imports into the storage account. Wondering if this is a hole I can dig myself out of - but really I could just wipe the entire setup and redeploy.

It would be nice to retain my original historical imports, as I think this data would be lost due to age?

👣 Repro steps

I have these cost exports which I created back in February with 1.0-preview(v1).

image

My main daily FinOpsHubMonthtodate export

image

msexports storage account

image

msexports/account/ - Duplicate data or ok?

image image

Data factory triggers image

Data factory linked services (deploy failed to clean up??)

image

Pipelines

image

🤔 Expected

Probably not worth the time digging this detail up for screenshots but I have very inaccurate cost details vs Azure Cost Analysis. Some months look ok, some look way too high.

My ingestion Power BI report had strange details, but now on 0.5 it doesn't work at all. It keeps showing the 'demo data'.

flanakin commented 1 week ago

@psilantropy You shouldn't see any duplication as of 0.4, which changed to use the manifest for identifying export scopes. To confirm, you'd need to look at the parquet files in the ingestion container. If you see any folders that duplicate other folders, then that will cause data duplication because the pipelines copy to a destination folder. I suspect the cases where you might be seeing that are from before 0.4 because of how we identified scopes based on the export path.

The easiest way to fix that is obviously to clear the ingestion container and re-export data for up to the last 7 years via the Start-FinOpsCostExport PowerShell command. That said, you shouldn't need to do that.

I'm not sure if we have a way to identify duplication cleanly, but we can definitely put some thought into it. I could see that going into the Data ingestion report to facilitate troubleshooting like this.

psilantropy commented 1 week ago

Hi @flanakin, thanks for the information and advice.

image

My inactive (single-use) exports look like this: image

My month to date export, untouched since Feb-24

February looks good: image

March ok with 2 files?

image

image

All folders up to August look like this with 2x parquet files.

Then there's a new folder structure, as shown below:

August (again) image

September image

Honestly I don't care if I have to export again to get this clean. So delete these two folders?

image

Delete/stop my cost exports job? (finopshubmonthtodate)

Then run this?

Start-FinOpsCostExport -Name 'CostExport' -Backfill 24

flanakin commented 1 week ago

@psilantropy Those very short paths look suspicious, but it's hard to say without seeing the values. Let's set up some time so we can look at them together. Can you email ftk-support@ms so we can schedule it?

I also have a guess as to where and possibly why the duplication is happening.

@MSBrett, I suspect our process for deleting files isn't accounting for cases where the new export has fewer files or maybe just different file names because of Cost Management changing file names. We probably need to clear the entire month when processing new months. Let's chat about that in our sync next week.

psilantropy commented 1 week ago

Happy to do that, but only if you gain something from reviewing it. More than happy to wipe it out.

flanakin commented 1 week ago

Short answer: Keep "providers" and delete the other, then backfill one export. Don't forget to set the correct export name and specify the billing account scope in the command:

Start-FinOpsCostExport -Name 'finopshubmonthtodate' -Scope '/providers/Microsoft.Billing/billingAccounts/###' -Backfill 24

Long answer...

I'm assuming you only have one billing account. If that's true, then you should only see one path to parquet files in the ingestion container: providers/microsoft.billing/billingaccounts/###/focuscost/yyyyMM/*.parquet

If you have any others, check the months to make sure you don't have duplicate months. If you do, delete the other folder and keep the one at the path above.
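
For anyone who wants to script that check, here is a minimal Python sketch. It assumes you have already listed the blob paths in the ingestion container by some means (the listing step is not shown), and that month folders follow the yyyyMM pattern above. The function name and the sample paths, including the "000" account ID and the "other-scope" folder, are hypothetical and used for illustration only.

```python
from collections import defaultdict


def find_duplicate_months(blob_paths):
    """Return months (yyyyMM) whose parquet files sit under more than one parent folder.

    Assumes each path looks like <parent folders>/<yyyyMM>/<file>.parquet.
    """
    folders_by_month = defaultdict(set)
    for path in blob_paths:
        parts = path.strip("/").split("/")
        # Skip anything that is not a parquet file inside a month folder.
        if len(parts) < 3 or not parts[-1].lower().endswith(".parquet"):
            continue
        month = parts[-2]
        if not (len(month) == 6 and month.isdigit()):
            continue
        # Everything before the month folder identifies the scope/dataset folder.
        parent = "/".join(parts[:-2])
        folders_by_month[month].add(parent)
    return {m: sorted(f) for m, f in folders_by_month.items() if len(f) > 1}


# Hypothetical listing: August exists under two different parent folders.
sample_paths = [
    "providers/microsoft.billing/billingaccounts/000/focuscost/202407/a.parquet",
    "providers/microsoft.billing/billingaccounts/000/focuscost/202408/a.parquet",
    "other-scope/focuscost/202408/b.parquet",
]

for month, parents in find_duplicate_months(sample_paths).items():
    print(month, "appears under:", parents)
```

Any month this reports is a candidate for the cleanup described above: keep the folder at the expected path and delete the other.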

The one other thing I'm noticing in your files that seems odd is that you seem to have one parquet file created on the first and another created later in the month. That seems suspicious. Without confirming, I almost wonder if there's a different file naming pattern for daily vs. monthly exports, which could be causing 2 versions of the file to be created. Hubs overwrites files based on name and would miss this case if the same file had a new name, which would definitely result in duplicated files. But that's just a guess at this point...
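
To illustrate the guess above, here is a toy Python model. It is not the hubs pipeline code. A month folder is modeled as a dict of file name to the total cost in that file, and the file names are hypothetical. It contrasts overwriting by file name with clearing the whole month before writing.

```python
def overwrite_by_name(month_folder, new_files):
    """Write new files into the folder, replacing only files with the same name."""
    result = dict(month_folder)
    result.update(new_files)
    return result


def replace_month(month_folder, new_files):
    """Discard everything already in the month folder, then write the new files."""
    return dict(new_files)


# Same month of cost data, re-exported under a different (hypothetical) file name.
existing = {"export-daily.parquet": 100.0}
incoming = {"export-monthly.parquet": 100.0}

merged = overwrite_by_name(existing, incoming)
print(sorted(merged), sum(merged.values()))      # both files remain; total is 200.0

replaced = replace_month(existing, incoming)
print(sorted(replaced), sum(replaced.values()))  # only the new file; total is 100.0
```

Under these assumptions, overwrite-by-name leaves two files for the month and doubles the reported cost, while clearing the month first keeps a single copy.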

psilantropy commented 1 week ago

I just wiped it all. I had some issues importing after the delete. The Ingestion report looks OK now and the folder structure is clean. The CostSummary report looks good. The RateOptimization report has a null value error, but I'll review and log it after reviewing the cost summary in detail.

Appreciate your assistance.