Open maastha opened 1 year ago
I think we might have the same issue here. Did you try to assign the block value to the state in the `Read` function?
We found that the framework doesn't support optional+computed blocks, so the block can't be placed into state in the resource's `Create` function when it isn't configured by the user.
If you put the block value into state in the `Read` function, then during refresh Terraform will treat it as a change to managed infrastructure made outside of Terraform operations (e.g. a direct API call via cURL), rather than as a block being computed, and will give you a new plan to change the infrastructure back to what's stored in Terraform.
We couldn't find a workaround or a solution here either, and I hope we can get the Terraform team to take a look at this.
Hi @maastha 👋 Thank you for submitting this and sorry you are running into trouble here. Terraform in general is very particular about data consistency and proper value planning once a resource is migrated to the framework, so it may involve teaching the provider code much more about the API behaviors associated with each attribute. Let's see if we cannot troubleshoot this a little further.
For starters, can you run Terraform with warning logs enabled using the SDK version of the resource first? e.g. `TF_LOG=WARN terraform apply`. What we would be looking for here is whether Terraform thought the SDK-based resource wasn't already following Terraform's data consistency rules. While the migration documentation around checking this type of issue talks about a simpler root-level string attribute, we can hopefully use that information to guide us further in this situation.
Next, we could use some additional information from you about the API responses. Are the left-hand values noted in the "no configuration blocks" plans always returned from the API with these values?
```
- advanced_configuration {
    - javascript_enabled           = true -> null
    - minimum_enabled_tls_protocol = "TLS1_2" -> null
    - no_table_scan                = false -> null
    - oplog_min_retention_hours    = 0 -> null
  }
- bi_connector_config {
    - enabled         = false -> null
    - read_preference = "secondary" -> null
  }
```
If so, this likely means we need to take a look at resource default attribute value implementations for each. That might help with the "all blocks configured" case. There is the additional caveat here that defaults may need to be added at the list or object level as well, to cover the "no configuration block" case, as defaults of nested object attributes are only consulted if their wrapping object exists already.
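If it helps, here is a hedged sketch of what a list-level default could look like if `bi_connector_config` were modeled as a list attribute. The attribute names and the default values shown are assumptions about the API, and note that blocks themselves cannot carry a `Default`, as discussed later in the thread; this fragment only applies to the attribute form:

```go
// Sketch (untested): a list-level default for an optional+computed list of
// objects, using listdefault.StaticValue from
// github.com/hashicorp/terraform-plugin-framework/resource/schema/listdefault,
// so an unconfigured list plans to the API's presumed default instead of null.
"bi_connector_config": schema.ListAttribute{
	Optional: true,
	Computed: true,
	ElementType: types.ObjectType{
		AttrTypes: map[string]attr.Type{
			"enabled":         types.BoolType,
			"read_preference": types.StringType,
		},
	},
	Default: listdefault.StaticValue(
		types.ListValueMust(
			types.ObjectType{AttrTypes: map[string]attr.Type{
				"enabled":         types.BoolType,
				"read_preference": types.StringType,
			}},
			[]attr.Value{
				types.ObjectValueMust(
					map[string]attr.Type{
						"enabled":         types.BoolType,
						"read_preference": types.StringType,
					},
					map[string]attr.Value{
						// Assumed API defaults, for illustration only.
						"enabled":         types.BoolValue(false),
						"read_preference": types.StringValue("secondary"),
					},
				),
			},
		),
	),
},
```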
Otherwise, as @zliang-akamai alludes to above, the answer may actually be to omit saving certain information to the state when it is not present in the configuration. It is likely that Terraform would have given practitioners errors if they did happen to directly reference those block attributes elsewhere in their configuration since the value would have changed between plan and apply. (In fact, I'm a little surprised that Terraform isn't already raising an error, as I would have expected a block without configuration to error if it was saved with data.)
Finally, it is worth mentioning that when migrating from the SDK to the framework, Terraform may hide very particular plan differences (e.g. empty string to null) because not doing so would cause very noisy plans for practitioners with the older SDK. If you ever see a planned resource change with all the attributes "hidden", then https://github.com/hashicorp/terraform/issues/31887 is an upstream bug report relating to this. Saving the plan file and inspecting it in JSON format, or implementing a special plan check in terraform-plugin-testing that automatically compares all resource change before/after values, is handy for debugging in those cases. I thought we had a testing feature request to consider implementing this debugging plan check somewhere but I'm having trouble finding it at the moment -- but can recreate it if needed.
To add a little more context about this confusing migration space, I'll also mention the blocks with computed fields migration documentation. What I care to talk about there is that terraform-plugin-sdk did some gymnastics in certain cases to make the single `schema.Schema` type "work" for both attributes and blocks; in particular, computed-only blocks were automatically converted to object attributes. In Terraform and HCL, blocks are inherently much different from attributes, although the provider development SDKs try to paper over that detail. There actually is/was no such thing as a "computed block" in Terraform. The framework directly exposes this reality, but understandably it can be difficult to unravel the implications of that implementation detail when migrating from the SDK. Hopefully we can get to the bottom of this for you and find a workable path forward.
Hi @bflad, thanks for the detailed answer! I am still hoping to migrate a `Computed` and `Optional` `TypeList` in SDKv2 to the framework. What do you think about migrating it to a `ListAttribute` with `Optional` and `Computed` set to `true` in the framework?
Something like:
```go
map[string]schema.Attribute{
	"example_nested_block": schema.ListAttribute{
		ElementType: types.ObjectType{
			AttrTypes: map[string]attr.Type{
				"example_block_attribute": types.StringType,
			},
		},
		Computed: true,
		Optional: true,
	},
}
```
But the problem is we can't set a full schema on the items of `AttrTypes`; e.g., we can't specify `example_block_attribute` as a required attribute or assign it a default value when the object (practically a block in the config file) is configured by the user.
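A hedged sketch of the protocol v6 `ListNestedAttribute` alternative discussed later in the thread, which does allow a full schema (including defaults) on each nested attribute, unlike `ElementType`. The attribute names and the `stringdefault` value are illustrative assumptions, not this provider's actual schema:

```go
// Sketch (untested): nested attributes carry full per-attribute schemas.
"example_nested_block": schema.ListNestedAttribute{
	Optional: true,
	Computed: true,
	NestedObject: schema.NestedAttributeObject{
		Attributes: map[string]schema.Attribute{
			"example_block_attribute": schema.StringAttribute{
				Optional: true,
				Computed: true,
				// Hypothetical API default; an ObjectType element cannot
				// express this, but a nested attribute can.
				Default: stringdefault.StaticString("example-default"),
			},
		},
	},
},
```

Note the trade-off: in configuration this is attribute syntax (`example_nested_block = [{ ... }]`), not block syntax.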
Since SDKv2 already did some gymnastics, I think many providers, including ours, will encounter issues similar to use case 1 in this issue when migrating to the framework. It would be very nice if there were some way Terraform could assist providers in making such a breaking change, if direct support for computed and optional blocks is not possible. Maybe something like a state upgrade, but for user config? I don't know..
Reference:
Thanks so much, @zliang-akamai. Can you confirm whether or not Terraform is raising warning logs in your case for the SDK-based resource, and whether it was also using the quirky `ConfigMode: schema.SchemaConfigModeAttr` like the original issue description? Presumably those logs should be raised in the case of the block not being configured, but the resource setting its state. If that is not happening, then maybe we can figure something out.
I would think the list of objects solution would only work for `Computed: true` without `Optional: true` blocks, since that case doesn't need to carry any additional schema information (optional, required, validation, etc.). In the case a practitioner does want to configure it, a list of objects attribute would typically require switching the Terraform configuration from block syntax `example_nested_block { ... }` to attribute syntax `example_nested_block = { ... }`. There is some special handling in Terraform core to cover attributes-as-blocks behavior, which does muddy that typical configuration requirement though.
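For concreteness, the two syntaxes look roughly like this (reusing the hypothetical `example_nested_block` name from the earlier comment):

```hcl
# Block syntax (what list nested blocks accept):
example_nested_block {
  example_block_attribute = "value"
}

# Attribute syntax (what a list of objects attribute normally requires):
example_nested_block = [{
  example_block_attribute = "value"
}]
```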
Thanks a lot for the response @bflad.

> defaults may need to be added at the list or object level as well

I believe we are supposed to migrate to blocks (to avoid a change in the "= []" syntax). Is it possible to set defaults at the object level for `ListNestedBlock`? If so, can you share an example? Also, assuming this works with drift detection, could this still throw a provider inconsistency error on apply?

As requested, here are the WARN logs when no `advanced_configuration` or `bi_connector_config` block is configured. I see the inconsistency rules called out here; is this what you were referring to?
```
[WARN] Provider "registry.terraform.io/mongodb/mongodbatlas" produced an invalid plan for mongodbatlas_advanced_cluster.replicaset-cluster-no-adv-config, but we are tolerating it because it is using the legacy plugin SDK.
The following problems may be the cause of any confusing errors from downstream operations:
- .bi_connector_config: attribute representing nested block must not be unknown itself; set nested attribute values to unknown instead
```
WARN logs if the `advanced_configuration` block is configured partially or even as an empty block (`{}`):
```
[WARN] Provider "registry.terraform.io/mongodb/mongodbatlas" produced an invalid plan for mongodbatlas_advanced_cluster.sharded-cluster-empty-adv-config, but we are tolerating it because it is using the legacy plugin SDK. The following problems may be the cause of any confusing errors from downstream operations:
- .advanced_configuration: planned value cty.ListVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"default_read_concern":cty.UnknownVal(cty.String), "default_write_concern":cty.UnknownVal(cty.String), "fail_index_key_too_long":cty.UnknownVal(cty.Bool), "javascript_enabled":cty.UnknownVal(cty.Bool), "minimum_enabled_tls_protocol":cty.UnknownVal(cty.String), "no_table_scan":cty.UnknownVal(cty.Bool), "oplog_min_retention_hours":cty.NullVal(cty.Number), "oplog_size_mb":cty.UnknownVal(cty.Number), "sample_refresh_interval_bi_connector":cty.UnknownVal(cty.Number), "sample_size_bi_connector":cty.UnknownVal(cty.Number), "transaction_lifetime_limit_seconds":cty.UnknownVal(cty.Number)})}) does not match config value cty.ListVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"default_read_concern":cty.NullVal(cty.String), "default_write_concern":cty.NullVal(cty.String), "fail_index_key_too_long":cty.NullVal(cty.Bool), "javascript_enabled":cty.NullVal(cty.Bool), "minimum_enabled_tls_protocol":cty.NullVal(cty.String), "no_table_scan":cty.NullVal(cty.Bool), "oplog_min_retention_hours":cty.NullVal(cty.Number), "oplog_size_mb":cty.NullVal(cty.Number), "sample_refresh_interval_bi_connector":cty.NullVal(cty.Number), "sample_size_bi_connector":cty.NullVal(cty.Number), "transaction_lifetime_limit_seconds":cty.NullVal(cty.Number)})})
- .bi_connector_config: attribute representing nested block must not be unknown itself; set nested attribute values to unknown instead
```
@bflad I also tried filtering out attributes not in the `request.State`. In this case, I saw the output below on `terraform plan`; there's no attribute change detected, but I'm not sure why this diff is still shown. Could you help me understand this?
```
~ replication_specs {
      id = "657889449387ff62a76e2cb5"
      # (3 unchanged attributes hidden)

    ~ region_configs {
          # (3 unchanged attributes hidden)
          # (2 unchanged blocks hidden)
      }
    ~ region_configs {
          # (3 unchanged attributes hidden)

        ~ analytics_specs {
            + disk_iops = (known after apply)
              # (2 unchanged attributes hidden)
          }
        ~ electable_specs {
            + disk_iops = (known after apply)
              # (2 unchanged attributes hidden)
          }
      }
  }
~ replication_specs {
      id = "657889449387ff62a76e2cb6"
      # (3 unchanged attributes hidden)

    ~ region_configs {
          # (3 unchanged attributes hidden)
          # (2 unchanged blocks hidden)
      }
  }
  # (2 unchanged blocks hidden)
}
```
Also, note that `disk_iops` in the above does have a plan modifier added here, but it still shows up with `(known after apply)`. How should this be handled? `disk_iops` is Optional, but the existing state has `"disk_iops": 0` while `disk_iops` is not configured. I am filtering out everything not in the `request.State` in the `Read()` handler.
Reference:
main.tf snippet for second block:
```hcl
replication_specs { # zone n2
  zone_name  = "zone n2"
  num_shards = 2 # 2-shard Multi-Cloud Cluster
  region_configs { # shard n1
    electable_specs {
      instance_size = "M10"
      node_count    = 3
    }
    analytics_specs {
      instance_size = "M10"
      node_count    = 1
    }
    provider_name = "AWS"
    priority      = 7
    region_name   = "EU_WEST_1"
  }
}
```
```json
{
  "container_id": {
    "AWS:EU_WEST_1": "657889449387ff62a76e2cc7"
  },
  "id": "657889449387ff62a76e2cb6",
  "num_shards": 2,
  "region_configs": [
    {
      "analytics_auto_scaling": [],
      "analytics_specs": [
        {
          "disk_iops": 3000,
          "ebs_volume_type": "",
          "instance_size": "M10",
          "node_count": 1
        }
      ],
      "auto_scaling": [],
      "backing_provider_name": "",
      "electable_specs": [
        {
          "disk_iops": 3000,
          "ebs_volume_type": "",
          "instance_size": "M10",
          "node_count": 3
        }
      ],
      "priority": 7,
      "provider_name": "AWS",
      "read_only_specs": [],
      "region_name": "EU_WEST_1"
    }
  ],
  "zone_name": "zone n2"
}
```
Thanks so much, @maastha. I really appreciate you working through this with me. I apologize that I also neglected to mention a newer SDK feature which should promote those warning logs to errors in Terraform, to make them a little easier to see during acceptance testing:
```go
schema.Resource{
	// ... other fields as necessary ...
	EnableApplyLegacyTypeSystemErrors: true,
	EnablePlanLegacyTypeSystemErrors:  true,
}
```
Just please be careful not to release the SDK-based resource with those enabled, since there are known errors. We had added that to the top of the overall framework migration steps, but I'm not sure it's linked deeper in the migration documentation, so I'll take a look at where else it might make sense if it's not already there.
> I believe we are supposed to migrate to Blocks (to avoid change in "= []" syntax), is it possible to set defaults at object level for ListNestedBlock? If so, can you share an example? Also, assuming this works with the drift detection, could this still throw provider inconsistency error on apply?
What I meant to allude to was that defaults may be applicable for the planned values to match the API behavior. This should work as expected for underlying attributes, but the framework intentionally prevents developers from implementing the "native" default value handling for blocks (e.g. the literal `Default` field present in attributes). From Terraform's perspective, there should be no such thing as a default for a block; the block either exists in configuration or does not. The prior SDK incorrectly tried to work around that. Technically, though, the framework does expose block/object plan modifiers (mainly there for triggering resource replacement) and resource-level plan modification. I've personally never tried it, and I haven't seen a working example of trying to change plans for blocks themselves. If I had to venture an upfront guess, trying to modify the plan to include missing blocks (the list or object) would likely raise errors.
Let's take a step back though with the below new knowledge --
> I see the inconsistency rules called out here, is this what you were referring to?
Indeed, which is a good and bad thing. It is good that we are learning that the prior resource implementation was not meeting Terraform's data consistency rules, but of course bad because these need to be resolved in some way as part of this migration as this is an unavoidable problem.
Given that the existing resource is violating those rules, I anticipate the deeper we look into this, the more likely that one of the following will need to happen:
There are quite a few things you are asking about, so maybe let's try to start at a high level with the intent of this configuration/data in the resource. Particularly since you mention each of them, let's talk about `advanced_configuration`, `bi_connector_config`, and `replication_specs`. I'm asking these questions because there is a disconnect between what was possible to implement in the prior SDK and Terraform's rules/capabilities. It's very easy to get lost in the sea of implementation details across the SDK/framework, the API, and Terraform itself.

Are `advanced_configuration` and `bi_connector_config` each a single API object, and `replication_specs` an unordered and unique set of API objects? I'm going to try to summarize my understanding so far, but please correct me where necessary:

- `advanced_configuration` is an optional single API object for which practitioners can optionally set certain details. Each underlying configuration value may have static (not dependent on other configuration details) defaults provided by the API if not given. Practitioners likely would never reference these values elsewhere in their configuration, and I presume that if they were trying to do that today, they would see the "confusing errors from downstream operations" mentioned in the warning logs.
- `bi_connector_config` is an optional single API object for which practitioners can optionally set certain details. Each underlying configuration value may have static (not dependent on other configuration details) defaults provided by the API if not given. Practitioners likely would never reference these values elsewhere in their configuration, and I presume that if they were trying to do that today, they would see the "confusing errors from downstream operations" mentioned in the warning logs.
- `replication_specs` (using some details from #884 and above) is an optional set of API objects for which practitioners must set certain details and can optionally set others. The API will set a computed identifier for each object, and only some underlying configuration values may have defaults provided by the API if not given. I'm not sure if practitioners would ever reference these values elsewhere in their configuration.

So in the prior SDK implementation, you had to use what was available:
- `Optional: true` was used.
- `Computed: true` was also set. Since the data was always refreshed, this enabled the resource to perform a level of drift detection and suggest remediation for practitioner-configured values.
- `advanced_configuration` and `bi_connector_config`: The SDK didn't expose single nested block or object attribute support, so the necessary and ecosystem-conventional choice was to use list nested blocks that validated there was only one block. The SDK allowed this even though `Computed: true` was set, since the SDK tried to paper over Terraform not supporting that implementation detail of blocks by making it an attribute rather than a block. This was further codified by forcing `ConfigMode: schema.SchemaConfigModeAttr`. Terraform itself also papered over configurations using block syntax over attribute syntax with the "special logic" mentioned in my prior comment.

Now if this was a brand new resource implementation, I would recommend that all these be implemented by leveraging the protocol version 6 (Terraform 1.0+) nested attribute feature, which essentially gives the best solution everywhere, since all the data can be optionally configured or computed by the API and therefore always safely referenced in configurations. All the data would be refreshable as well for drift detection. In particular:
- `advanced_configuration` and `bi_connector_config` being an optional+computed single nested attribute with a default object matching the API. The underlying attributes could also be optional+computed with individual default values matching the API, to support partial object configuration.
- `replication_specs` being an optional set nested attribute with a computed `id` attribute and other (nested) attributes where applicable.

However, that is quite different from what exists today from a practitioner standpoint. Practitioners would be burdened to upgrade to Terraform 1.0+ if they haven't already, and to put an equals sign anywhere a block was being configured, since it's now a (nested) attribute. As a goal of this migration appears to be to minimize practitioner burden, the (more common 😄) path of trying to re-implement the existing schema was taken. Since I do not know all the intricacies of this API and its behaviors, nor every permutation of how the prior SDK might save a zero-value (e.g. `""` for string attributes / `0` for integers) into state instead of a `null` value (a common source of pain for framework migrations that causes unexpected plan differences, and likely related to what is happening with `replication_specs`), nor every permutation of how the prior SDK might inject unknown values during planning, we will likely need to continue discussing this at a high level with hand-wavy troubleshooting and potential ideas/solutions.
I'm having a hard time determining whether you got further with `advanced_configuration` and `bi_connector_config` with your "filtering attributes" comment. If the goal was to prevent Terraform from showing differences for the unconfigured blocks, then not refreshing them based on whether or not they are present in the prior state seems like it would be a decent solution.
So to offer a tangible potential `advanced_configuration`/`bi_connector_config` solution, a setup that follows Terraform's rules would, I anticipate, be:
- Keeping `schema.ListNestedBlock` (technically you could try `schema.SingleNestedBlock`, but that has "breaking" configuration considerations in certain situations, so trying to minimize differences).
- For each individual underlying attribute, setting `Optional`, `Computed`, and/or `Default`s to match API behaviors, to catch cases where the blocks are partially configured. If the API does not have a default, just use `Optional`. If the API has an indeterminable default, `Optional` and `Computed`.
- In `Create`/`Update`, use the plan values to send requests to the API and only store the plan value in state (e.g. do not try to set the state with API values).
- In `Read`, if drift detection of the configured values is necessary, check each block's prior state for null, and if not null, refresh its underlying values from the API.

This does mean that Terraform can no longer perform drift detection if the block is unconfigured, but that is a requirement for using blocks. If you don't want that restriction, the implementation will need to change in some way. We can discuss options down that path if you would like.
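The `Read` guidance above (only refresh a block from the API if its prior state is non-null) can be sketched with plain maps standing in for the framework's typed state; this is an illustrative analogue, not the framework API:

```go
package main

import "fmt"

// refreshConfigured returns a refreshed state where only blocks that were
// non-null in prior state adopt API values; unconfigured (null) blocks stay
// null so Terraform does not report drift for them.
func refreshConfigured(prior, api map[string]any) map[string]any {
	next := make(map[string]any, len(prior))
	for block, priorValue := range prior {
		if priorValue == nil {
			next[block] = nil // never configured: skip drift detection entirely
			continue
		}
		next[block] = api[block] // configured: refresh from the API response
	}
	return next
}

func main() {
	prior := map[string]any{
		"advanced_configuration": map[string]any{"javascript_enabled": true},
		"bi_connector_config":    nil, // not configured by the practitioner
	}
	api := map[string]any{
		"advanced_configuration": map[string]any{"javascript_enabled": false},
		"bi_connector_config":    map[string]any{"enabled": false},
	}
	next := refreshConfigured(prior, api)
	fmt.Println(next["bi_connector_config"] == nil) // prints "true"
}
```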
Figuring out what to do with existing state values if they were caused by the prior SDK saving the zero-value we can discuss once the general attribute/block handling is stable for new resources.
I think the `replication_specs` issue(s) might be better suited to a separate thread (e.g. #884), since their causes are likely unrelated to the list block issue(s). Briefly, it might be related to a thing I mentioned in my first comment:

> Finally, it is worth mentioning that when migrating from the SDK to the framework, Terraform may hide very particular plan differences (e.g. empty string to null) because not doing so would cause very noisy plans for practitioners with the older SDK. If you ever see a planned resource change with all the attributes "hidden", then https://github.com/hashicorp/terraform/issues/31887 is an upstream bug report relating to this. Saving the plan file and inspecting it in JSON format or implementing a special plan check in terraform-plugin-testing that automatically compares all resource change before/after values for debugging is handy in those cases. I thought we had a testing feature request to consider implementing this debugging plan check somewhere but I'm having trouble finding it at the moment -- but can recreate it if needed.
Thank you so much @bflad for such a detailed response; we really appreciate you working with us on this, and we plan to evaluate all your suggestions.
We have some follow-up questions:
> In Create/Update, use the plan values to send requests to API and only store the plan value in state (e.g. do not try to set the state with API values)
In the above point, to make sure we understand correctly: the end result would effectively remove computed state that users may be referencing, right? Therefore, as part of the solution, we would have to provide users an alternative way to access these computed values. And since users might have to adjust their configuration for these referenced values, it would be considered a breaking change. Is that a correct understanding?
Users would see `(known after apply)` for that attribute during plan and, if anything changes, they wouldn't know what exactly changed in that computed attribute upon apply. Is that correct? If yes, is there a workaround for this, so that users can be made aware that some nested attribute in that computed-only attribute has been updated or will be updated upon apply?

Update: @bflad We tried the approach of basically filtering out un-configured values and not persisting those in the state, using the suggestions from your last response:
> - For each individual underlying attribute, setting Optional, Computed, and/or Defaults to match API behaviors, to catch cases where the blocks are partially configured. If the API does not have a default, just use Optional. If the API has an indeterminable default, Optional and Computed.
> - In Create/Update, use the plan values to send requests to API and only store the plan value in state (e.g. do not try to set the state with API values)
> - In Read, if drift detection of the configured values is necessary, check each block's prior state for null, and if not null refreshing its underlying values from the API.
This creates a one-time drift detection when the user upgrades to the new provider version with the framework-migrated resource, correct? Or could you suggest alternatives to prevent that? The reason is that, unfortunately, the Update API for this resource isn't exactly idempotent and can sometimes result in errors when attempting to update some of these lists, hence this does not seem to be working for us.
Another approach I considered is introducing additional computed `ListNestedAttribute`s that store the complete API response while we keep the existing block attributes to only store the planned/config data, but my guess is this would also require a one-time update, as it essentially gets us to the same `Update()` API issue as in the last point.
A third alternative I can think of is to have users re-import the resource upon upgrading the provider. But even in this scenario, users CANNOT simply do a `terraform state rm` and then re-import, because again during import the provider cannot figure out what the user might have in their config, so I believe this will not work out either.
Please let us know if there are any other options we could look into.
We do think a StateUpgrader-like functionality, or simply exposing the Config object in `resource.UpgradeStateRequest`, would be very useful for similar migration scenarios; we possibly would have been able to avoid the one-time drift detected in # 1 and # 2 above, but unfortunately we can't access the user's `Config` in the StateUpgrader `resource.UpgradeStateRequest` to filter out any un-configured attributes from the state.
> This creates a one-time drift detection when the user would upgrade to the new provider version with framework-migrated resource. Correct? Or could you suggest alternatives to prevent that? Reason for that is that unfortunately the Update API for this resource isn't exactly idempotent and can sometimes result in errors when attempting to update some of these lists, hence this does not seem to be working for us.
It's unfortunately difficult for me to assess this without seeing what you're doing/seeing (e.g. prior state, config, and plan) and understanding the exact behaviors of the API. What are "some of these lists"? Do you mean the `advanced_configuration` and `bi_connector_config` list blocks? I'll admit I'm a little perplexed how the update API matters in this case, as it seems your goal is to avoid a plan difference, which would avoid the resource update being called, and would therefore avoid any update API calls.
Beyond using the "human-readable" plan output, in the plan JSON output you should be able to compare the before and after data to determine what Terraform thinks is an actual difference. If those updates are differences such as empty strings (caused by the prior SDK) to null, then potentially adding an empty string default value on the attribute can preserve the prior SDK behavior and hide the difference for practitioners that did not configure a value. You'll just need to account for empty string handling in your resource logic in addition to null handling.
> We do think a StateUpgrader-like functionality or simply exposing the Config object in the resource.UpgradeStateRequest would be very useful for similar migration scenarios so we possibly would have been able to avoid the one-time drift detected in # 1 and # 2 above, but unfortunately we can't access the user's Config in the StateUpgrader resource.UpgradeStateRequest to filter out any un-configured attributes from the state.
You can try submitting a feature request upstream in Terraform to consider exposing this across the protocol, since that is not something that could be implemented in the framework without it existing there. Please note though that I'm not sure how well that data fits in with the intended design for the RPC (mainly for upgrading existing state to match breaking schema changes) and even if it was implemented today, there would now be some expectation on your practitioners to be on the Terraform version (1.8+ as of this writing, since 1.7 is considered feature complete as its entering release candidate very shortly) that passes the additional configuration information across the protocol.
Module version
We are currently migrating existing Plugin-SDK-based resources to the Plugin Framework (plugin protocol v6). In the resource below, we have some optional+computed list block attributes where certain nested attributes, even if not configured or only partially configured by the user, will still be returned by the API/provider. That response is currently persisted in the state.
In order to migrate these blocks to the Plugin Framework, we have tried using `schema.ListNestedBlock`, but the returned plan after upgrade to the framework-migrated resource is always non-empty. This will be a breaking change for our users.
What is the recommended non-breaking way to migrate list attributes such as these?
Relevant provider source code
Terraform Plugin SDK based schema for "advanced_configuration" block:
Terraform Plugin Framework migrated schema:
Terraform Configuration Files
Use-case 1 (no blocks configured):
Use-case 2 (all blocks configured):
Debug Output
Expected Behavior
After the user upgrades to the new provider version with the framework-migrated resource, the user should not see any planned changes for optional+computed list/set blocks when running `terraform plan`, and should not receive errors when running `terraform apply`.

Actual Behavior

On running `terraform plan`, the below plan was produced by Terraform:

Use-case 1 (no blocks configured):
Use-case 2 (all blocks configured):

Steps to Reproduce

1. `terraform init` for provider v.X (contains the Plugin-SDK-based resource implementation) for the above resource
2. `terraform plan`
3. `terraform apply`
4. `terraform plan` again -> plan returns `No changes.`
5. `terraform init --upgrade`
6. `terraform plan` -> plan is not empty

References
Included advanced_configuration schema in this issue. Other block schemas mentioned in the example are: