grafana / unused

CLI tool, Prometheus exporter, and Go module to list your unused disks in all cloud providers
Apache License 2.0
52 stars 1 forks source link

cmd/unused will block if a disks metadata cannot be read #41

Closed Pokom closed 1 year ago

Pokom commented 1 year ago

When executing unused -gcp.project $(gcloud config get project), if one of the disks does not get read correctly, it will block the application printing out the rest. Here's an example with the project and disk x'ed out:

unused -gcp.project $(gcloud config get project)                                                                                                                                                                               
displaying output: GCP project=XXX: listing unused disks: reading disk metadata: cannot decode JSON description for disk XXXXXX: invalid character 'S' looking for beginning of value

I would expect the program to print out disks it was able to find that are unused and potentially print out disks it could not read.

Pokom commented 1 year ago

I think the root cause is that https://github.com/grafana/unused/blob/1c937765ab6bc529f3ed2f78c5e68db924f967ce/gcp/provider.go#L89 assumes that the description is a string that represents a json blob. There are instances where the string may just be a sentence, such as:

"Saved for https://raintank-corp.slack.com/archives//"

Other disks' Description looks like so:

{"kubernetes.io/created-for/pv/name":"pvc-<redacted>","kubernetes.io/created-for/pvc/name":"xxxx","kubernetes.io/created-for/pvc/namespace":"xxxxx"}

We should add better detection here to ensure that not only is description not equal, but also that it can be marshalled.

Pokom commented 1 year ago

Further analysis: A large majority of the unused disks were created by kubernetes, which will set the description to be json encoded as a string. The one disk that failed was manually created and had it's description set as just a plain old string. So in this scenario I'm not sure how we want to really handle the error. I was able to ignore the error and run the program successfully, but this doesn't seem right.

inkel commented 1 year ago

Yes, that is indeed the case, we assume these disks have a JSON string in its description. I suppose we could probably add an error field to unused.Disk and have that reported. Having said that, perhaps for an MVP release what we could do is to just simply log the error and continue processing disks 🤔

Pokom commented 1 year ago

As a short-term stopgap, I've confirmed that we can log the error and the rest of the application is still functional. That's what I would do to get it out the door.

Long term, I think instead of logging our errors, we could accept that the string is not json and wrap it in something like description := fmt.Sprintf("{'description': %s }") and have the application behave as normal.

What are your thoughts on that @inkel?

inkel commented 1 year ago

Thanks for taking the time to investigate this! I'll see to add logging to the GCP provider (actually maybe all of them) and get things moving so the app continues working by logging the error instead of failing.

As for the long term fix I don't know how much of a benefit that would bring us, because for that particular provider we are indeed expecting a JSON string with certain fields, so if those aren't there, then there's no point in crafting a valid JSON payload to parse, as we would still lack the information needed. Nevertheless, it is something to consider, I'll think about it.

Again, thank you so much! ❤️