psilantropy closed this issue 1 week ago.
@psilantropy You shouldn't see any duplication as of 0.4, which switched to using the manifest to identify export scopes. To confirm, look at the parquet files in the ingestion container. Any folder that duplicates another folder will cause data duplication, because the pipelines copy to a destination folder. I suspect any duplicates you're seeing date from before 0.4, when we identified scopes based on the export path.
The easiest way to fix that is obviously to clear the ingestion container and re-export data for up to the last 7 years via the Start-FinOpsCostExport PowerShell command. That said, you shouldn't need to do that.
I'm not sure if we have a way to identify duplication cleanly, but we can definitely put some thought into it. I could see that going into the Data ingestion report to facilitate troubleshooting like this.
Hi @flanakin, thanks for the information and advice.
My inactive (single-use) exports look like this:
My month-to-date export, untouched since Feb-24:
February looks good:
March looks OK, but with 2 files?
All folders up to August look like this, with 2 parquet files each.
Then there's a new folder structure, as below:
August (again)
September
Honestly, I don't mind exporting again to get this clean. So should I delete these two folders?
Delete/stop my cost export job (finopshubmonthtodate)?
Then run:
Start-FinopsCostExport -Name 'CostExport' -Backfill 24
@psilantropy Those very short paths look suspicious, but it's hard to say without seeing the values. Let's set up some time to look at them together. Can you email ftk-support@ms so we can schedule it?
I also have a guess as to where and possibly why the duplication is happening.
@MSBrett, I suspect our file-deletion process doesn't account for cases where a new export has fewer files, or simply different file names because Cost Management changed them. We probably need to clear the entire month when processing new data for that month. Let's chat about that in our sync next week.
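To illustrate the suspected failure mode, here is a minimal Python sketch that models a month folder as a set of file names. It is not the hubs pipeline code: the function names and file names are hypothetical. It contrasts overwriting files by name with clearing the whole month before writing the new export's files.

```python
def ingest_overwrite_by_name(month_folder: set[str], new_files: set[str]) -> set[str]:
    """Hypothetical model of overwrite-by-name: files with matching names are
    replaced, but files the new export no longer produces are left behind."""
    return month_folder | new_files


def ingest_clear_month(month_folder: set[str], new_files: set[str]) -> set[str]:
    """Hypothetical model of the proposed fix: discard everything already in
    the month folder, then keep only the new export's files."""
    return set(new_files)


if __name__ == "__main__":
    # Hypothetical names: the first run wrote two files; a later run for the
    # same month produced one file under a different name.
    existing = {"part_0.parquet", "part_1.parquet"}
    rerun = {"part_0_v2.parquet"}

    # Stale files survive next to the new one -> duplicated data.
    print(sorted(ingest_overwrite_by_name(existing, rerun)))
    # -> ['part_0.parquet', 'part_0_v2.parquet', 'part_1.parquet']

    # Only the latest export's files remain.
    print(sorted(ingest_clear_month(existing, rerun)))
    # -> ['part_0_v2.parquet']
```

Overwriting by name is only safe when every run produces the same set of file names; clearing the month first removes that assumption.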
Happy to do that, but only if you gain something from reviewing it. More than happy to wipe it out.
Short answer: keep the "providers" folder, delete the other one, and then backfill a single export. Don't forget to set the correct export name and specify the billing account scope in the command:
Start-FinOpsCostExport -Name 'finopshubmonthtodate' -Scope '/providers/Microsoft.Billing/billingAccounts/###' -Backfill 24
Long answer...
I'm assuming you only have one billing account. If that's true, then you should only see one path to parquet files in the ingestion container:
providers/microsoft.billing/billingaccounts/###/focuscost/yyyyMM/*.parquet
If you have any others, check the months to make sure you don't have duplicate months. If you do, delete the other folder and keep the one at the path above.
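To check for duplicate months across folders, a minimal Python sketch like the following could group parquet paths by month. It assumes you have already listed the blob paths in the ingestion container (for example, copied from Storage Explorer) and that each file sits directly in a `yyyyMM` folder, as in the path above. The function name and the second sample path are hypothetical. Any month reported under more than one folder is a candidate duplicate.

```python
import re
from collections import defaultdict

# Assumed layout: <folder>/<yyyyMM>/<file>.parquet, e.g.
# providers/microsoft.billing/billingaccounts/###/focuscost/202403/part_0.parquet
MONTH_DIR = re.compile(r"^(?P<prefix>.+)/(?P<month>\d{6})/[^/]+\.parquet$", re.IGNORECASE)


def find_duplicate_months(blob_paths: list[str]) -> dict[str, list[str]]:
    """Return {yyyyMM: [folders]} for months that appear under more than one folder."""
    prefixes_by_month: dict[str, set[str]] = defaultdict(set)
    for path in blob_paths:
        match = MONTH_DIR.match(path.strip("/"))
        if match:  # skip manifests and anything not inside a yyyyMM folder
            # Blob paths are case-sensitive, so differently cased folders
            # are kept apart (they would duplicate data too).
            prefixes_by_month[match["month"]].add(match["prefix"])
    return {
        month: sorted(prefixes)
        for month, prefixes in sorted(prefixes_by_month.items())
        if len(prefixes) > 1
    }


if __name__ == "__main__":
    paths = [
        "providers/microsoft.billing/billingaccounts/###/focuscost/202403/part_0.parquet",
        "providers/microsoft.billing/billingaccounts/###/focuscost/202404/part_0.parquet",
        "hypothetical/short/path/202403/part_0.parquet",  # stand-in for a stray folder
    ]
    print(find_duplicate_months(paths))
    # -> {'202403': ['hypothetical/short/path', 'providers/microsoft.billing/billingaccounts/###/focuscost']}
```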
One other thing that seems odd in your files: you seem to have one parquet file created on the 1st and another created later in the month. That seems suspicious. I haven't confirmed it, but I wonder if daily and monthly exports use different file naming patterns, which could cause 2 versions of the file to be created. Hubs overwrites files based on name, so it would miss this case if the same file arrived under a new name, which would definitely result in duplicated files. But that's just a guess at this point...
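To spot the pattern described above, a hypothetical heuristic could group parquet files by month folder and flag folders whose files were created on more than one day. It assumes you supply each blob's path and creation date yourself (for example, read from Storage Explorer) and the `.../yyyyMM/*.parquet` layout shown earlier; the function name and sample data are made up. Several files in one month is not a problem on its own (a large export can be split across files written in the same run), so treat hits as folders to inspect, not proof of duplication.

```python
import re
from collections import defaultdict
from datetime import date

# Assumed layout: <folder>/<yyyyMM>/<file>.parquet
MONTH_FILE = re.compile(r"^(?P<folder>.+/\d{6})/[^/]+\.parquet$", re.IGNORECASE)


def find_mixed_run_months(blobs: list[tuple[str, date]]) -> dict[str, list[date]]:
    """Return {month folder: [distinct creation dates]} for month folders whose
    parquet files were created on more than one day (hypothetical heuristic)."""
    dates_by_folder: dict[str, set[date]] = defaultdict(set)
    for path, created in blobs:
        match = MONTH_FILE.match(path.strip("/"))
        if match:  # ignore non-parquet files and paths without a yyyyMM folder
            dates_by_folder[match["folder"]].add(created)
    return {
        folder: sorted(dates)
        for folder, dates in sorted(dates_by_folder.items())
        if len(dates) > 1
    }


if __name__ == "__main__":
    blobs = [
        ("focuscost/202403/part_0.parquet", date(2024, 3, 1)),
        ("focuscost/202403/part_1.parquet", date(2024, 3, 17)),  # later, separate run?
        ("focuscost/202404/part_0.parquet", date(2024, 4, 1)),
        ("focuscost/202404/part_1.parquet", date(2024, 4, 1)),  # same run, split file
    ]
    print(find_mixed_run_months(blobs))
    # -> {'focuscost/202403': [datetime.date(2024, 3, 1), datetime.date(2024, 3, 17)]}
```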
I just wiped it all. I had some issues importing after the delete, but the ingestion report looks OK now, the folder structure is clean, and the CostSummary report looks good. The RateOptimization report has a null value error, but I'll look into it and log it after reviewing the cost summary in detail.
Appreciate your assistance.
🐛 Problem
I've deployed FinOps hubs a few times, I think going from 0.2 to 0.3 RC, then 0.3, 0.4, and now 0.5.
When I look at my reports, I see some strange data that must be the result of bad imports into the storage account. I'm wondering if this is a hole I can dig myself out of, though really I could just wipe the entire setup and redeploy.
It would be nice to retain my original historical imports. Wouldn't that data be lost due to its age?
👣 Repro steps
I have these cost exports, which I created back in February with 1.0-preview(v1).
My main daily FinOpsHubMonthtodate export
msexports storage account
msexports/account/: duplicate data or OK?
Data Factory triggers
Data Factory linked services (did the deployment fail to clean up?)
Pipelines
🤔 Expected
It's probably not worth the time to dig up screenshots for this detail, but my cost details are very inaccurate compared with Azure Cost Analysis. Some months look OK; others look way too high.
My Data ingestion Power BI report showed strange details, but now on 0.5 it doesn't work at all: it still shows all the 'demo data'.