Closed umairidris closed 4 years ago
@jaketf Could you review? I don't have experience with CDF.
I think this module would be more useful to customers if it set up:
Agreed that we should try to do more with root modules. I'd suggest we do this by providing submodules for the individual components while the root module composes the submodules together.
This way, users with unique/different needs can directly use a submodule.
I think that this is a good start (see note on service endpoint). However, this still asks a lot of the user to get a private instance up and running. I think this module would be more useful to customers if it set up:
- took a VPC self link as input
- created ip allocation for peering
- created the peering
- Created SSH ingress FW rule source: P4 SA target dataproc subnet.
- A place to add SA permissions to add to the data fusion P4 SA
Agreed. I filed an issue in this repo to add these features as submodules as Morgante suggested which I will do in a follow up (but not likely for another few weeks as I don't have much bandwidth).
Can you clarify what you mean by "took a VPC self link as input"? Are you suggesting that in place of the network_config?
Another concern is how much changes when CDF supports calling dataproc jobs API instead of SSH? Is the networking stack the same?
@morgante: What I am thinking at the moment is to have the root module create the data fusion instance and then implement the other features jake suggested as submodules that can be optionally called from the root module. Does that make sense?
@morgante: What I am thinking at the moment is to have the root module create the data fusion instance and then implement the other features jake suggested as submodules that can be optionally called from the root module. Does that make sense?
I think it might make sense to also put the data fusion instance in a submodule and then the root module could (optionally) simply provide an end-to-end composition of the different submodules. Does that make sense?
@morgante: What I am thinking at the moment is to have the root module create the data fusion instance and then implement the other features jake suggested as submodules that can be optionally called from the root module. Does that make sense?
I think it might make sense to also put the data fusion instance in a submodule and then the root module could (optionally) simply provide an end-to-end composition of the different submodules. Does that make sense?
Makes sense, that way a user can just use the CDF instance submodule as well if they don't want any extras. I'll make that change here (the root module will just be a dummy forwarding one till the other submodules are in place).
Agreed. I filed an issue in this repo to add these features as submodules as Morgante suggested which I will do in a follow up (but not likely for another few weeks as I don't have much bandwidth).
I can take on some of this work to lighten your load.
Can you clarify what you mean by "took a VPC self link as input"? Are you suggesting that in place of the network_config?
I was thinking we'd need it to create the ip allocation but we can just read the network config var.
Another concern is how much changes when CDF supports calling dataproc jobs API instead of SSH? Is the networking stack the same?
I asked this question in CDF training last week. They said the peering / ip allocation will still be necessary even after they Dataproc Jobs API, not sure exactly why, will follow up.
Can this be merged? Seems there are no pending review items. I'd like to use this as a base start work on #8
Yup, I was just waiting on your sign-off.
@morgante so speedy! thank you!
Fix #6 Fix #4
cc @jaketf