im2nguyen / rover

Interactive Terraform visualization. State and configuration explorer.
MIT License

runtime error: invalid memory address or nil pointer dereference (Mislabeled Resource Type) #90

Closed jlestrada closed 2 years ago

jlestrada commented 2 years ago

Posting this for now as I continue to debug, but maybe others are seeing similar issues.

Overview: Rover panics with an invalid memory address (nil pointer dereference) when attempting to generate the graph from a provided plan. Upon further debugging, it appears to be related to a Data source type being labeled as a Resource source type. At the moment I am unsure how to reproduce it.

Release Version: 0.3.0

ERROR Message:

> rover -planPath plan.out
2022/01/31 09:48:08 Starting Rover...
2022/01/31 09:48:08 Using provided plan...
2022/01/31 09:48:12 Generating resource overview...
2022/01/31 09:48:12 Generating resource map...
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x48 pc=0x174c28e]

goroutine 1 [running]:
main.(*rover).GenerateModuleMap(0xc000187680, 0xc00009fdd0, 0xc00036ac40, 0x38)
    /Users/joseestrada/rover/map.go:225 +0x1dae
main.(*rover).GenerateModuleMap(0xc000187680, 0xc00009e5a0, 0xc0003a8cf0, 0x25)
    /Users/joseestrada/rover/map.go:253 +0x2357
main.(*rover).GenerateModuleMap(0xc000187680, 0xc00050e750, 0xc000399170, 0x12)
    /Users/joseestrada/rover/map.go:253 +0x2357
main.(*rover).GenerateModuleMap(0xc000187680, 0xc000535b88, 0x0, 0x0)
    /Users/joseestrada/rover/map.go:253 +0x2357
main.(*rover).GenerateMap(0xc000187680, 0x0, 0x0)
    /Users/joseestrada/rover/map.go:335 +0x1ee
main.(*rover).generateAssets(0xc000187680, 0xc000187680, 0x2)
    /Users/joseestrada/rover/main.go:193 +0x130
main.main()
    /Users/joseestrada/rover/main.go:138 +0xea5

Additional Context: From what I gather and understand, it appears that my generated Terraform Plan is causing a runtime error due to a resource being mislabeled. I believe that a Data source is being labeled as a Resource type and thus causing the error.

The error happens while generating the resource map, at the point where it appears to set the resource's file name and line number. To dig into the issue a bit more, I modified the code locally, as shown in the following code block.

note: forgive my naming of things in advance. I am somewhat confused still about the difference between states and configs.

            if configured {
                var fname string
                ind := fmt.Sprintf("%s.%s", re.ResourceType, re.Name)

                log.Printf("Resource State Type: %s",rs.Type)

                if rs.Type == ResourceTypeData {

                    log.Println("Resource State Type is Data")

                    ind = fmt.Sprintf("data.%s", ind)
                    fname = filepath.Base(configs[parentConfig].Module.DataResources[ind].Pos.Filename)
                    re.Line = &configs[parentConfig].Module.DataResources[ind].Pos.Line
                } else if rs.Type == ResourceTypeResource {

                    log.Printf("Resource Name: %s",re.Name)
                    log.Printf("Resource Type: %s",re.ResourceType)
                    for key := range configs[parentConfig].Module.ManagedResources {
                        log.Printf("Parent Module Configs Managed Resource Key: %s",key)
                    }

                    fname = filepath.Base(configs[parentConfig].Module.ManagedResources[ind].Pos.Filename)
                    re.Line = &configs[parentConfig].Module.ManagedResources[ind].Pos.Line
                }

                r.AddFileIfNotExists(parent, parentModule, fname)

                parent.Children[fname].Children[id] = re

            }

The following output shows a bit more of what's going on.

2022/01/31 09:48:12 Parent Module State Child Resource: module.atlantis_v2.module.ecs_service.module.alb_routing.data.aws_route53_zone.private
2022/01/31 09:48:12 Resource State Type: data
2022/01/31 09:48:12 Resource State Type is Data
2022/01/31 09:48:12 Parent Module State Child Resource: module.atlantis_v2.module.ecs_service.module.alb_routing.data.aws_route53_zone.public
2022/01/31 09:48:12 Resource State Type: resource
2022/01/31 09:48:12 Resource Name: public
2022/01/31 09:48:12 Resource Type: aws_route53_zone
2022/01/31 09:48:12 Parent Module Configs Managed Resource Key: aws_lb_listener_rule.this
2022/01/31 09:48:12 Parent Module Configs Managed Resource Key: aws_route53_record.public_record_prevent_destroy
2022/01/31 09:48:12 Parent Module Configs Managed Resource Key: aws_route53_record.public_record
2022/01/31 09:48:12 Parent Module Configs Managed Resource Key: aws_route53_record.private_record_prevent_destroy
2022/01/31 09:48:12 Parent Module Configs Managed Resource Key: aws_route53_record.private_record
2022/01/31 09:48:12 Parent Module Configs Managed Resource Key: aws_lb_target_group.this_prevent_destroy
2022/01/31 09:48:12 Parent Module Configs Managed Resource Key: aws_lb_target_group.this
2022/01/31 09:48:12 Parent Module Configs Managed Resource Key: aws_lb_listener_rule.this_prevent_destroy

The first thing worth calling out is that a three-level-deep module call is in play. I don't think that matters, but maybe it does. The resource at address module.atlantis_v2.module.ecs_service.module.alb_routing.data.aws_route53_zone.public is being labeled as a Resource as opposed to Data. The log further down confirms that the parent module's managed resources do not contain an entry for the hosted zone. The odd thing is that the preceding resource, which has an almost identical address, is correctly treated as a Data type.

At the moment I cannot tell how this type is being set. I assume that is where the problem is located, or that it will at least tell us more about it. FWIW, the same runtime error happens when allowing rover to execute the Terraform plan itself.

I will keep digging into this, but if others see anything similar or have any tips, help is appreciated.
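As a rough illustration of why the mislabeling ends in a panic: if the managed-resources map holds pointer values, looking up a key that only exists as a data source yields a nil pointer, and reading a field through it dereferences nil. The sketch below uses hypothetical stand-in types (`Pos`, `Resource`) and a hypothetical helper (`lookupPos`), not rover's actual code, and shows the "comma ok" lookup that would turn the crash into a handled miss.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// Pos and Resource are hypothetical stand-ins for the config types
// used in the snippet above; they are not rover's real definitions.
type Pos struct {
	Filename string
	Line     int
}

type Resource struct {
	Pos Pos
}

// lookupPos returns the base file name and line for key.
// ok is false when the key is absent, so the caller never
// dereferences a nil *Resource.
func lookupPos(resources map[string]*Resource, key string) (fname string, line int, ok bool) {
	res, found := resources[key]
	if !found || res == nil {
		return "", 0, false
	}
	return filepath.Base(res.Pos.Filename), res.Pos.Line, true
}

func main() {
	managed := map[string]*Resource{
		"aws_route53_record.public_record": {Pos: Pos{Filename: "modules/alb_routing/main.tf", Line: 12}},
	}

	// This key only exists as a data source, so the managed lookup misses.
	if _, _, ok := lookupPos(managed, "aws_route53_zone.public"); !ok {
		fmt.Println("aws_route53_zone.public: no managed resource with this key")
	}

	fname, line, _ := lookupPos(managed, "aws_route53_record.public_record")
	fmt.Printf("%s:%d\n", fname, line)
}
```

A guard like this would only hide the symptom; the mislabeled type would still need to be fixed for the resource to land in the right place on the map.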

jlestrada commented 2 years ago

Hmmm, I think I am getting closer, but I am still unable to determine the issue. It appears to be related to the fact that the resource is an Array Data source. I don't believe the element within the Array is being processed in the above logs. I do see some references that replace the brackets in the parent address, and I am guessing one of these is causing the issue with the child resource. I don't have good enough evidence to show at this point, but I will try to circle back later and present it more formally.

jlestrada commented 2 years ago

Oh wait, I got it... but I am not exactly sure of the details or whether I am breaking anything else. To clarify, I added the following line here:

if configured && childIndex.MatchString(id)

If I understand correctly, this is a conditional statement that matches Resources and Data types that are Index types. If that is true, then I don't know why all ids that are not Index types still seem to process okay. At this point the graph did generate successfully. Maybe someone with more knowledge can look into what the proper solution would be, as I am sure I am breaking many use cases.
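For readers unfamiliar with the check, here is a minimal sketch of what an "is this id indexed?" test could look like. The pattern `indexSuffix` and the helper `isIndexed` are hypothetical; rover's real `childIndex` expression may differ.

```go
package main

import (
	"fmt"
	"regexp"
)

// indexSuffix is a hypothetical pattern that matches a trailing
// instance index such as [0] or ["key"]. It is an assumption for
// illustration, not rover's actual childIndex regular expression.
var indexSuffix = regexp.MustCompile(`\[[^\[\]]+\]$`)

// isIndexed reports whether id ends in an instance index.
func isIndexed(id string) bool {
	return indexSuffix.MatchString(id)
}

func main() {
	ids := []string{
		"null_resource.my_resource[0]",
		"data.null_data_source.my_data_resource[2]",
		"data.aws_route53_zone.public",
	}
	for _, id := range ids {
		fmt.Printf("%s indexed=%t\n", id, isIndexed(id))
	}
}
```

Under this reading, adding the match to the condition would skip every id without an index, which is consistent with the concern that it breaks other use cases.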

jlestrada commented 2 years ago

For anyone following, I have yet to find a proper fix. I don't think the one listed above makes the correct change. I think my lack of understanding of the application architecture is preventing me from providing a proper PR to correct the issue.

I have however been able to reproduce with the following code snippet.

resource "null_resource" "my_resource" {
  count = 3
}

data "null_data_source" "my_data_resource" {
  count = length(null_resource.my_resource)
}

This will create an array of Data resource types that will trigger the runtime error. The data resource can be commented out to see that the code base does work with a Resource type array.

I have tried a few different iterations of fixing this but nothing seems to be perfect. The graph generates successfully but the mapping is all off. Hopefully with the ability to reproduce now someone can help out.
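One way to picture the mapping problem is as a question of how to get from a full state address to the key used in the configuration maps. The sketch below is only an illustration under stated assumptions: `configKey` is a hypothetical helper, it assumes instance keys contain no dots or brackets, and it follows the `data.<type>.<name>` key format used in the debugging snippet earlier in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// configKey is a hypothetical helper that turns a full state address
// into the key used for a config lookup and reports whether the
// address refers to a data source. Assumes simple instance keys
// (no dots or brackets inside them).
func configKey(address string) (key string, isData bool) {
	// Drop a trailing instance index such as [0].
	if i := strings.LastIndex(address, "["); i != -1 && strings.HasSuffix(address, "]") {
		address = address[:i]
	}

	parts := strings.Split(address, ".")

	// Drop leading module.<name> pairs.
	for len(parts) >= 2 && parts[0] == "module" {
		parts = parts[2:]
	}

	// What remains is either data.<type>.<name> or <type>.<name>.
	isData = len(parts) == 3 && parts[0] == "data"
	return strings.Join(parts, "."), isData
}

func main() {
	addresses := []string{
		"null_resource.my_resource[1]",
		"data.null_data_source.my_data_resource[1]",
		"module.atlantis_v2.module.ecs_service.module.alb_routing.data.aws_route53_zone.public",
	}
	for _, a := range addresses {
		key, isData := configKey(a)
		fmt.Printf("%s -> key=%s data=%t\n", a, key, isData)
	}
}
```

Deriving the type from the address itself, rather than from a separately stored field, is one possible design choice; whether it fits rover's architecture is not something this thread establishes.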

JackFlukinger commented 2 years ago

Hey @jlestrada , I'll check this out a bit later. Thanks for the thorough documentation :) .

nonbeing commented 2 years ago

I'm also getting a stack trace but it looks quite different from the one posted here.

rover -planPath tfplan.out
2022/02/08 18:04:36 Starting Rover...
2022/02/08 18:04:36 Using provided plan...
2022/02/08 18:04:51 Generating resource overview...
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x165e1fb]

goroutine 1 [running]:
main.(*rover).PopulateModuleState(0x16e2740, 0xc000333398, 0xc00020e640, 0x0)
    rover/rso.go:219 +0x19b
main.(*rover).PopulateModuleState(0x16e2680, 0xc000333398, 0xc00020e400, 0x0)
    rover/rso.go:267 +0x927
main.(*rover).GenerateResourceOverview(0xc000113320)
    rover/rso.go:319 +0x413
main.(*rover).generateAssets(0xc0000e9e38)
    rover/main.go:188 +0xc5
main.main()
    rover/main.go:138 +0xcbb

I'm on Rover version v0.3.0

The above crash is also happening when I run just rover without any commandline args.

jlestrada commented 2 years ago

@nonbeing the stack trace appears to be failing in a different place, the PopulateModuleState function, so I agree that this is a different issue. You can pull the latest source and build locally to test whether the changes that fixed the above-mentioned issue help. The 0.3.0 release does not contain my proposed fix for the Data source array runtime error.

If you can find a way to reproduce it, I think that would greatly help in figuring out the issue.

JackFlukinger commented 2 years ago

I'll check this out a bit later today. Thanks for reporting!

JackFlukinger commented 2 years ago

Hey @nonbeing , make sure you're on the most recent 3.0 version. I'm pretty sure this is something on your end -- the thrown line in your stacktrace (rso.go:219) is blank. Also @jlestrada , I believe @im2nguyen merged your PR and re-released rover with the modified block. I think this issue should probably be closed.

Yashfork commented 2 years ago

I encountered the same issue as mentioned by @nonbeing. I haven't installed rover; instead, I'm using the "docker run --rm -it -p 9000:9000 -v $(pwd):/src im2nguyen/rover" command.

Status: Downloaded newer image for im2nguyen/rover:latest
2022/02/09 04:40:58 Starting Rover...
2022/02/09 04:40:58 Initializing Terraform...
2022/02/09 04:41:09 Generating plan...
2022/02/09 04:41:20 Generating resource overview...
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0xa5cbfb]

goroutine 1 [running]:
main.(*rover).PopulateModuleState(0xae0a20, 0xc0004007e0, 0xc0001bee40, 0x0)
    /src/rso.go:219 +0x19b
main.(*rover).PopulateModuleState(0xae0a20, 0xc0004007e0, 0xc0001bed80, 0x0)
    /src/rso.go:267 +0x927
main.(*rover).PopulateModuleState(0xae0a20, 0xc0004007e0, 0xc0001bed40, 0x0)
    /src/rso.go:267 +0x927
main.(*rover).PopulateModuleState(0xae0960, 0xc0004007e0, 0xc0001bed00, 0x0)
    /src/rso.go:267 +0x927
main.(*rover).GenerateResourceOverview(0xc0000fb200)
    /src/rso.go:319 +0x413
main.(*rover).generateAssets(0x0)
    /src/main.go:188 +0xc5
main.main()
    /src/main.go:138 +0xcbb

I'm on terraform v1.1.2

JackFlukinger commented 2 years ago

@Yashfork What's the general structure of the terraform configuration you're applying?

Yashfork commented 2 years ago

@JackFlukinger To set up a landing zone in GCP, I'm trying to visualize 0-bootstrap through rover: https://github.com/terraform-google-modules/terraform-example-foundation/tree/master/0-bootstrap

Yashfork commented 2 years ago

FYI: I'm able to create the plan and visualize it through Terraform Visual, but I'm facing issues with Rover.

JackFlukinger commented 2 years ago

@Yashfork I'll have a look in more detail tomorrow, thanks for reporting!

nonbeing commented 2 years ago

@JackFlukinger

I am indeed on the latest version of Rover:

$ rover --version
Rover v0.3.0

I am using Terraform v1.1.5 on macOS Monterey 12.1

The crash is 100% reproducible on my end. I tried running rover for all three examples in the repo (simple-test, random-test and nested-test), and rover has no issues running in those directories: I get the web page on http://localhost:9000 and everything.

Rover used to work just fine a while ago (perhaps 4 months ago) for my admittedly-complex AWS TF state. I just updated to v0.3.0 and it's crashing constantly now.

Just another data point: I tried running rover v0.2.1 on my TF state. It no longer works the way it used to, but at least it doesn't crash/panic either:

2022/02/09 16:34:19 Starting Rover...
2022/02/09 16:34:19 Initializing Terraform...
2022/02/09 16:34:43 Generating plan...
2022/02/09 16:37:00 Unable to parse Plan: Unable to read Plan: unsupported state format version: expected ["0.1" "0.2"], got "1.0"

JackFlukinger commented 2 years ago

@nonbeing What is line rso.go:219 from your local installation? Also, rover 0.2.1 isn't compatible with Terraform 1.1.5 -- if you downgrade to Terraform 1.1.0 it should work.
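The "unsupported state format version" message above reads like a guard that accepts only a fixed list of format versions. The sketch below is a hypothetical reconstruction of such a guard, not the code of rover or of any library it uses; `supportedFormats` and `checkFormatVersion` are illustrative names.

```go
package main

import "fmt"

// supportedFormats is a hypothetical allow-list, mirroring the
// versions named in the error message above.
var supportedFormats = []string{"0.1", "0.2"}

// checkFormatVersion returns an error when got is not one of the
// supported format versions.
func checkFormatVersion(got string) error {
	for _, v := range supportedFormats {
		if v == got {
			return nil
		}
	}
	return fmt.Errorf("unsupported state format version: expected %q, got %q", supportedFormats, got)
}

func main() {
	for _, v := range []string{"0.2", "1.0"} {
		if err := checkFormatVersion(v); err != nil {
			fmt.Println("Unable to read Plan:", err)
			continue
		}
		fmt.Println("format version", v, "accepted")
	}
}
```

A check of this kind fails cleanly instead of panicking, which matches the behavior reported for v0.2.1.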

nonbeing commented 2 years ago

I'm using the rover v0.3.0 binary for my platform. I downloaded the rover_0.3.0_darwin_amd64.zip package that contains the rover_v0.3.0 executable from the releases page: https://github.com/im2nguyen/rover/releases/download/v0.3.0/rover_0.3.0_darwin_amd64.zip ... So I'm not sure how to check rso.go:219 in my local installation since it's a binary?

I'm not a Golang programmer, just an end-user of Rover. However, I looked at the source code zip for the same v0.3.0 release and this is what I found in there:

 215   │         } else {
 216   │             if prior {
 217   │                 rs[id].Change.Before = rst.AttributeValues
 218   │             } else {
 219   │                 rs[id].Change.After = rst.AttributeValues
 220   │
 221   │             }

So I guess rso.go:219 is rs[id].Change.After = rst.AttributeValues ?
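If that is the failing line, one plausible reading is that the map has no entry for `id` yet, so `rs[id]` is a nil pointer and the field assignment dereferences it. The sketch below illustrates that reading only; `Change`, `ResourceOverview`, and `setAfter` are hypothetical names, and the actual fix in the PR may look different.

```go
package main

import "fmt"

// Change and ResourceOverview are hypothetical stand-ins for the
// types behind rs[id].Change.After; they are not rover's definitions.
type Change struct {
	Before interface{}
	After  interface{}
}

type ResourceOverview struct {
	Change Change
}

// setAfter stores values as the "after" state for id, creating the
// map entry first so the assignment never goes through a nil pointer.
func setAfter(rs map[string]*ResourceOverview, id string, values interface{}) {
	if rs[id] == nil {
		rs[id] = &ResourceOverview{}
	}
	rs[id].Change.After = values
}

func main() {
	rs := map[string]*ResourceOverview{}

	// Without the nil check inside setAfter, this first write to a
	// missing key would panic with a nil pointer dereference.
	setAfter(rs, "aws_instance.web", map[string]interface{}{"ami": "example"})

	fmt.Println(rs["aws_instance.web"].Change.After)
}
```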

JackFlukinger commented 2 years ago

@nonbeing Oh this is very helpful, I'll have a fix later today. 🙂

casey-robertson-paypal commented 2 years ago

Experiencing the same issue with a fairly large AWS-based TF state. Hopefully similar issue:

2022/02/09 16:10:46 Starting Rover...
2022/02/09 16:10:46 Initializing Terraform...
2022/02/09 16:11:01 Generating plan...
2022/02/09 16:29:07 Generating resource overview...
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x165e1fb]

goroutine 1 [running]:
main.(*rover).PopulateModuleState(0x16e2680, 0xc029172ed0, 0xc0002348c0, 0x0)
    rover/rso.go:219 +0x19b
main.(*rover).GenerateResourceOverview(0xc0000f7320)
    rover/rso.go:319 +0x413
main.(*rover).generateAssets(0x0)
    rover/main.go:188 +0xc5
main.main()
    rover/main.go:138 +0xcbb

JackFlukinger commented 2 years ago

Hey @casey-robertson-paypal , @nonbeing , @Yashfork , could one of you test the PR I just submitted? I haven't been able to replicate the issue locally but I'm fairly certain that should resolve it.

Sorry about the confusion earlier, my local repo was weirdly out-of-sync. If you could follow the build from source section with my PR and then run rover on your plan, I would really appreciate it :)

nonbeing commented 2 years ago

@JackFlukinger Thanks for looking into this and the speedy PR! I don't have a Golang dev-setup so I can't build from source. If you could share a binary for "darwin_amd64" somewhere, I would be happy to test it against my TF state.

JackFlukinger commented 2 years ago

Hey @nonbeing , here you go https://file.io/LvPgsOBGGqi9

nonbeing commented 2 years ago

@JackFlukinger It's working now! 🎉🎊 I'm even able to see the visualizations in the web app on localhost:9000. Thank you and well done! 👍

nonbeing commented 2 years ago

[image: screenshot of the Rover visualization]

P.S. this is the visualization for our rather complex AWS setup (if I may say so myself) ... I guess this is a mini stress test of sorts for Rover perhaps, and it's passing this with flying colors! 😄

JackFlukinger commented 2 years ago

@nonbeing Out of curiosity, what are the resources outside of the purple background? That may not be intended behavior. Could you send a higher-resolution image, or an SVG generated with -genImage true?

nonbeing commented 2 years ago

Sure, here's the SVG: https://file.io/YJrw7hJXmSEp

JackFlukinger commented 2 years ago

Okay @nonbeing , definitely not intended. Where is the data.aws_iam_policy_document.webhook_sns_topic_policy supposed to be housed? Should be a simple fix.

Yashfork commented 2 years ago

@JackFlukinger It is working now!

I tried creating a docker image manually and it is working fine. Thanks for the quick fix :)

nonbeing commented 2 years ago

@JackFlukinger I've created a new bug to discuss the issue with the visualization since (I guess) it's unrelated to the panic/crash that you fixed here.

Can we take the conversation around data.aws_iam_policy_document.webhook_sns_topic_policy over to #93?