sjourdan commented 3 years ago


Created	2021/8/23
Current Version	0.0.2
Target Version	1.0.0
Owner	@sjourdan
Contributors

Summary

As a driftctl CLI user, I want to export my driftctl result so that I can use it somewhere else.

The idea behind a dedicated driftctl export command is that it works independently of the scan command (and its middlewares) and of the related, experimental --deep mode.

Abandoned Ideas

An undocumented proof-of-concept of the feature currently exists using driftctl scan (driftctl <= 0.15):

$ driftctl scan --output plan://tfplan.json
Scanned states (1)
Scan duration: 19s

This PoC works, but we are not satisfied enough with it to support it long-term: it goes through all the processing middlewares targeted at drift detection and requires a working deep mode.

Export Formats

Here are two examples of export formats that could be used with this feature.

Export Format: Terraform Plan (JSON)

As a driftctl CLI user, I want to export my result to the Terraform JSON plan format so that I can send it to another product for analysis.

Proposal

Problem: default driftctl mode doesn't include resource details (experimental --deep mode does)
Hypothesis: there's no need for any processing middleware there
Proposal: simply populate resource content from the available Terraform provider version
Reference: https://www.terraform.io/docs/internals/json-format.html

$ driftctl export --to tfplan://plan.json
[...]
{
  "planned_values": { },
  "resource_changes": [ ]
}

Format Structure

More detailed specs will follow, but the bottom line for the 2 main sections is:

planned_values { } should contain all the expected resources we know about (as a regular plan would), so that existing, expected resources can be analyzed if needed. It should also include (abusively!) the discovered unmanaged resources, even though nothing in that section lets anyone tell them apart from managed ones.
resource_changes [ ] should contain the unmanaged resources, to be analyzed by the destination tool. They carry the "create" action, while all other resources keep the "no-op" action.
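
To make the two sections concrete, here is a minimal Python sketch of how such a document could be assembled. The build_plan function and the input keys (type, name, values) are hypothetical and not part of driftctl. Only the planned_values.root_module.resources and resource_changes[].change (actions, before, after) shapes come from the Terraform JSON format referenced above, and a real plan carries more fields (format_version among them) that are left out here.

```python
import json


def build_plan(managed, unmanaged):
    """Assemble a plan-shaped dict from managed and unmanaged resources.

    Each input resource is a dict with the hypothetical keys
    "type", "name" and "values" (the resource attributes).
    """

    def base(resource):
        # Fields shared by planned_values entries and resource_changes entries.
        return {
            "address": f'{resource["type"]}.{resource["name"]}',
            "mode": "managed",
            "type": resource["type"],
            "name": resource["name"],
        }

    def planned(resource):
        return {**base(resource), "values": resource["values"]}

    def change(resource, action):
        # A "create" has no prior state; a "no-op" leaves the values untouched.
        before = None if action == "create" else resource["values"]
        return {
            **base(resource),
            "change": {
                "actions": [action],
                "before": before,
                "after": resource["values"],
            },
        }

    return {
        # Every known resource, managed or not, is listed here.
        "planned_values": {
            "root_module": {
                "resources": [planned(r) for r in managed + unmanaged]
            }
        },
        # Only the unmanaged resources get the "create" action.
        "resource_changes": [change(r, "no-op") for r in managed]
        + [change(r, "create") for r in unmanaged],
    }


if __name__ == "__main__":
    covered = [{"type": "aws_s3_bucket", "name": "covered", "values": {"bucket": "covered"}}]
    found = [{"type": "aws_s3_bucket", "name": "found", "values": {"bucket": "found"}}]
    print(json.dumps(build_plan(covered, found), indent=2))
```

With this layout, a destination tool only needs to filter resource_changes on the "create" action to find what is not covered by IaC.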

Export Format: Anonymized Output (Console)

Context: To support users, we often ask for debug logs (LOG_LEVEL=debug driftctl scan [...])
Problem: While debug output doesn't explicitly require any personally identifiable information, users may have resource names that can identify them, their employer, or their customers.
Reference: https://github.com/cloudskiff/driftctl/issues/1006
Impacted Users: "impermanence" on #support discord 2021/9/1

Example output:

$ driftctl export --anonymize console://
Found missing resources:
  aws_s3_bucket:
    - 44E7A9345E64EDB2680D12FCC85EB22B5C4A04FD73D4743718AAAF2E71649F5F
Found resources not covered by IaC:
  aws_s3_bucket:
    - 3D12E914A8DCC696A13881C4DB9A13A0296A421AFC80264B83093B6A9B506071
Found changed resources:
  - BE6464EBE29D35168133C06DB3E7CDAA791EE10845EBD660C6C507B6894CF644 (aws_s3_bucket):
    ~ Versioning.0.Enabled: false => true
Found 3 resource(s)
 - 33% coverage
 - 1 covered by IaC
 - 1 not covered by IaC
 - 1 missing on cloud provider
 - 1/1 changed outside of IaC

Option 1: anonymize only resource names

In the following case, we would hash "customername1".

resource "aws_instance" "customername1" {
  ami           = "ami-a1b2c3d4"
  instance_type = "t2.micro"
}

That would be shared as:

resource "aws_instance" "9DFA96B41AC5775A790F267222369BFBACF4271DE245EB34BCC71C6AB5856BD1" {
  ami           = "ami-a1b2c3d4"
  instance_type = "t2.micro"
}
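
As an illustration of Option 1, the following Python sketch rewrites only the name label of each resource block. It assumes SHA-256 with uppercase hex output, which matches the 64-character digests shown in this RFC but is not confirmed as the intended algorithm. The anonymize and anonymize_resource_names helpers are hypothetical names, and the regular expression is a simplification that only handles one-line resource headers.

```python
import hashlib
import re

# Matches: resource "<type>" "<name>" at the start of a line.
# Group 1 keeps the keyword and type, group 2 is the name to hash.
RESOURCE_HEADER = re.compile(r'^(resource\s+"[^"]+"\s+)"([^"]+)"', re.MULTILINE)


def anonymize(value: str) -> str:
    """Return an uppercase hex digest of value (assumption: SHA-256)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest().upper()


def anonymize_resource_names(hcl: str) -> str:
    """Replace every resource name label with its digest; leave attributes as-is."""
    return RESOURCE_HEADER.sub(
        lambda m: f'{m.group(1)}"{anonymize(m.group(2))}"', hcl
    )


if __name__ == "__main__":
    source = (
        'resource "aws_instance" "customername1" {\n'
        '  ami           = "ami-a1b2c3d4"\n'
        '  instance_type = "t2.micro"\n'
        "}\n"
    )
    print(anonymize_resource_names(source))
```

Because the digest is deterministic, the same name always maps to the same value, so a support engineer can still correlate a resource across several log lines without seeing its real name.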

Option 2: anonymize everything

Goal: leave no doubt about what can leak.

With such a system, a resource containing identifiable strings, like the following one, would be fully anonymized:

resource "aws_route53_record" "customername2" {
  zone_id = aws_route53_zone.primary.zone_id
  name    = "www.example.com"
  type    = "A"
  ttl     = "300"
  records = [aws_eip.lb.public_ip]
}

An output containing that resource would then be shared like this:

resource "aws_route53_record" "414A900AFF49FB55587BEF31E16646413AB7B9951DE9124100A08303ACD973A1" {
  zone_id = 09B0FC1DC6948813379EBEDECC16C99AD4D4C7FDF8A55E5268E0A607A9499369
  name    = 6EF1E4DB8AFCF82F85F694E5D7BE53961969092E751E52A081719EAD18438E0B
  type    = 798640599597DF7A8DAA32B1132F07850A68B5E71BD295650399A38074F52804
  ttl     = B9A9F15714A2A7AF00637C2030926CE17919DF2D9DC3B3531D582E7BBAD7315C
  records = [6C8A8BD8FC176A6D988204418DDE0289C74D5BBC812DFDC0E0900ABBF038B303]
}
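
Option 2 can be sketched as a recursive walk that hashes every attribute value while keeping attribute names, as in the example above. This is a minimal Python illustration, not the proposed implementation: the digest and anonymize_all helpers are hypothetical, SHA-256 is an assumption based on the digest length used in this RFC, and the sample IP address is a documentation placeholder.

```python
import hashlib


def digest(value) -> str:
    """Return an uppercase hex digest of the value's string form (assumption: SHA-256)."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest().upper()


def anonymize_all(node):
    """Hash every leaf value of a nested structure, keeping keys and shape."""
    if isinstance(node, dict):
        return {key: anonymize_all(value) for key, value in node.items()}
    if isinstance(node, list):
        return [anonymize_all(item) for item in node]
    # Leaf: string, number, boolean or None are all hashed via their string form.
    return digest(node)


if __name__ == "__main__":
    record = {
        "name": "www.example.com",
        "type": "A",
        "ttl": "300",
        "records": ["203.0.113.10"],
    }
    for key, value in anonymize_all(record).items():
        print(key, "=", value)
```

Keeping the attribute names readable preserves the structure needed for debugging, while no attribute value is shared in clear text.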

sundowndev commented 3 years ago

It's unclear to me what the export command would do under the hood. Does it need to perform an actual scan?

default driftctl mode doesn't include resource details (experimental --deep mode does)

We could make deep mode a requirement for using the export feature.

there's no need for any processing middleware there

It's still possible to create the export before running any middleware.

sjourdan commented 3 years ago

It's unclear to me what the export command would do under the hood. Does it need to perform an actual scan?

Yes, driftctl export should scan the resources like the real scan, but also add resources content from the tf provider without any processing (we don't care about it there). Like a "raw" deep mode by default if you prefer.

We could make deep mode a requirement for using the export feature.

True, driftctl export can't work without either the real deep mode or an equivalent!

It's still possible to create the export before running any middleware.

Interesting. Do you mean that a prototype of the driftctl export command could reuse the current tfplan PoC and make it less complicated to maintain in the long run?

sundowndev commented 3 years ago

Interesting. Do you mean that a prototype of the driftctl export command could reuse the current tfplan PoC and make it less complicated to maintain in the long run?

I think creating a new command just to run another kind of scan will be hard to maintain. Maybe we could embed the export feature in a simple --export flag on the existing scan command? That way, we don't rewrite the scan command and are still able to avoid useless processing. Unlike what the abandoned PoC suggested, the export format shouldn't be considered a scan output.

wbeuil commented 3 years ago

The scan in its root feature (read state + scan remote + first part of the analysis) is indeed what we need to do here.

BUT what we meant by "processing" is all the attributes that we decided voluntarily to remove for the sake of driftctl (e.g. look at initAwsCloudfrontDistributionMetaData) and all the middlewares that could potentially remove other essential attributes.

I don't think we could do that in just one command, hence the idea of creating another command for it.

sundowndev commented 3 years ago

all the attributes that we decided voluntarily to remove for the sake of driftctl

@wbeuil You're right, I didn't think about the metadata we remove. I now better understand the choice to abandon the previous ideas.

eliecharra commented 3 years ago

Wow, that's quite a big RFC for sure.

First thing, resource anonymization is another topic, and for me, it should remain in a dedicated RFC. We're going to end up with a big mess in our heads if we try to solve multiple problems in a single RFC.

While reading, I don't get what we really want here. We talk everywhere about a "result":

I want to export my result to the Terraform JSON plan format

Maybe the first step should be to explain exactly what we want to export?

there's no need for any processing middleware there

Why? Once again, what do we expect as a result? If this is the actual scan result, it does not make sense not to run the middlewares on it. Middlewares are here to ensure consistency between IaC and real-life resources; if you simply don't run them, you'll end up with an inconsistent output 🤔

Yes, driftctl export should scan the resources like the real scan, but also add resources content from the tf provider without any processing (we don't care about it there). Like a "raw" deep mode by default if you prefer.

"but also add resources content from the tf provider": I don't get it. What do you mean by a tf provider?

TL;DR I'm stopping the review here since I don't get what we really want. I think we should start from a clear explanation of what we want, with real-life use cases.

So that I can use it somewhere else

"somewhere else" is very unclear. This should be clarified.

Implementation details should come after careful thought about what we want. Only then should we talk about UX/UI and implementation.

moadibfr commented 3 years ago

I have to agree with Elie on the export part. I'd add that I'm not sure the anonymization part is the same thing as the export part. For debugging purposes we need a full scan run, and most of the things we want to anonymize are in the logs. I'm pretty sure these are modifications of scan and not a new command.

snyk / driftctl

RFC: new export feature in driftctl #933
