terraform-google-modules / terraform-google-data-fusion

Manages Cloud Data Fusion
https://registry.terraform.io/modules/terraform-google-modules/data-fusion/google
Apache License 2.0
14 stars 30 forks source link

Add root CDF instance module #7

Closed umairidris closed 4 years ago

umairidris commented 4 years ago

Fix #6 Fix #4

cc @jaketf

morgante commented 4 years ago

@jaketf Could you review? I don't have experience with CDF.

morgante commented 4 years ago

I think this module would be more useful to customers if it set up:

Agreed that we should try to do more with root modules. I'd suggest we do this by providing submodules for the individual components while the root module composes the submodules together.

This way, users with unique/different needs can directly use a submodule.

umairidris commented 4 years ago

I think that this is a good start (see note on service endpoint). However, this still asks a lot of the user to get a private instance up and running. I think this module would be more useful to customers if it set up:

  • took a VPC self link as input
  • created ip allocation for peering
  • created the peering
  • Created SSH ingress FW rule source: P4 SA target dataproc subnet.
  • A place to add SA permissions to add to the data fusion P4 SA

Agreed. I filed an issue in this repo to add these features as submodules as Morgante suggested which I will do in a follow up (but not likely for another few weeks as I don't have much bandwidth).

Can you clarify what you mean by "took a VPC self link as input"? Are you suggesting that in place of the network_config?

Another concern is how much changes when CDF supports calling dataproc jobs API instead of SSH? Is the networking stack the same?

@morgante: What I am thinking at the moment is to have the root module create the data fusion instance and then implement the other features jake suggested as submodules that can be optionally called from the root module. Does that make sense?

morgante commented 4 years ago

@morgante: What I am thinking at the moment is to have the root module create the data fusion instance and then implement the other features jake suggested as submodules that can be optionally called from the root module. Does that make sense?

I think it might make sense to also put the data fusion instance in a submodule and then the root module could (optionally) simply provide an end-to-end composition of the different submodules. Does that make sense?

umairidris commented 4 years ago

@morgante: What I am thinking at the moment is to have the root module create the data fusion instance and then implement the other features jake suggested as submodules that can be optionally called from the root module. Does that make sense?

I think it might make sense to also put the data fusion instance in a submodule and then the root module could (optionally) simply provide an end-to-end composition of the different submodules. Does that make sense?

Makes sense, that way a user can just use the CDF instance submodule as well if they don't want any extras. I'll make that change here (the root module will just be a dummy forwarding one till the other submodules are in place).

jaketf commented 4 years ago

Agreed. I filed an issue in this repo to add these features as submodules as Morgante suggested which I will do in a follow up (but not likely for another few weeks as I don't have much bandwidth).

I can take on some of this work to lighten your load.

Can you clarify what you mean by "took a VPC self link as input"? Are you suggesting that in place of the network_config?

I was thinking we'd need it to create the ip allocation but we can just read the network config var.

Another concern is how much changes when CDF supports calling dataproc jobs API instead of SSH? Is the networking stack the same?

I asked this question in CDF training last week. They said the peering / ip allocation will still be necessary even after they Dataproc Jobs API, not sure exactly why, will follow up.

jaketf commented 4 years ago

Can this be merged? Seems there are no pending review items. I'd like to use this as a base start work on #8

morgante commented 4 years ago

Yup, I was just waiting on your sign-off.

jaketf commented 4 years ago

@morgante so speedy! thank you!