minamijoyo / tfmigrate

A Terraform / OpenTofu state migration tool for GitOps
MIT License

Possible to optimise multi_state_mv execution time? #161

Open bharling opened 1 year ago

bharling commented 1 year ago

Hi, firstly, thanks for creating tfmigrate. We're finding it an extremely valuable tool!

I'm posting this to try and understand the reasoning behind the recreation of the tempfiles used during multi-state migrations and to see if there is some way to reduce that churn. In our specific case, we're attempting to split up a 250MB terraform state into multiple destination projects in order to speed up plan + apply against the resources managed in that state.

We noticed that tfmigrate plan was taking a long time in our tests. Digging into the code, I saw that the multi-state move actions appear to copy the source and destination states to tempfiles at the start of each action, then perform the resource moves: first from the source to an empty diff state, and then from the diff state to a copy of the final state. After that, it disposes of the temp files and repeats the whole process for the next action. I'm referring to this code in particular:

https://github.com/minamijoyo/tfmigrate/blame/087694f67f7a589e4b628a7c1418e97a33dfd878/tfmigrate/multi_state_mv_action.go#L32

Can I clarify if that understanding is correct? If so, is there a reason that the source state needs to be copied for each action, and is there any way we could remove that need? For us it would represent a dramatic speed-up, and I feel like our use case in this instance does align well with the spirit of tfmigrate.

minamijoyo commented 1 year ago

Hi @bharling, Yes. As you pointed out, the current implementation has some redundant copies of tfstate.

The terraform state command with the local backend only accepts tfstate as a file. However, we restrict the tfexec package's interfaces for state operations to an in-memory representation. This is because a state can contain sensitive values, and we wanted to minimize the lifetime of the temporary file and use a deferred function to ensure that it is deleted even if an error occurs.

At the time of the initial implementation, I had never imagined that tfstate would be 250 MB! However, as long as the life cycle of the temporary files can be managed correctly, I'm open to optimization by removing redundant copies.

Let me share a high-level design I came up with. tfexec.State is currently a type alias of []byte, but making it a first-class object that contains an *os.File handle for the temporary file would remove the responsibility for temporary file management from tfexec.TerraformCLI. That raises the question of who then owns the lifetime of the temporary files, and the current Migrator interface is already huge. How about introducing a collection class such as StateBulkAction / MultiStateBulkAction? It would also implement the StateAction / MultiStateAction interface and apply multiple state actions in bulk. We could delegate temporary file management to the collection class and reuse the temporary files across actions.
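A rough sketch of how this design might look follows. Only the names StateBulkAction and StateAction come from the proposal; `FileState`, the simplified `Apply` signature, the `Run` method, and `appendAction` (a stand-in for a real `terraform state mv` call) are all hypothetical, and the real interfaces in tfmigrate take more parameters.

```go
package main

import (
	"fmt"
	"os"
)

// FileState is a hypothetical file-backed state that owns its temp file.
type FileState struct {
	f *os.File
}

// NewFileState writes b to a new temporary file and keeps the handle.
func NewFileState(b []byte) (*FileState, error) {
	f, err := os.CreateTemp("", "tfstate-*.tfstate")
	if err != nil {
		return nil, err
	}
	if _, err := f.Write(b); err != nil {
		f.Close()
		os.Remove(f.Name())
		return nil, err
	}
	return &FileState{f: f}, nil
}

// Path returns the location of the backing temporary file.
func (s *FileState) Path() string { return s.f.Name() }

// Close closes the handle and deletes the temporary file.
func (s *FileState) Close() error {
	closeErr := s.f.Close()
	if err := os.Remove(s.f.Name()); err != nil {
		return err
	}
	return closeErr
}

// StateAction is a simplified stand-in for the real interface.
type StateAction interface {
	Apply(s *FileState) error
}

// StateBulkAction applies several actions to one shared temp file.
type StateBulkAction struct {
	actions []StateAction
}

// Apply makes the collection itself a StateAction.
func (b StateBulkAction) Apply(s *FileState) error {
	for i, a := range b.actions {
		if err := a.Apply(s); err != nil {
			return fmt.Errorf("action %d: %w", i, err)
		}
	}
	return nil
}

// Run owns the temp file: it is created once and removed once.
func (b StateBulkAction) Run(initial []byte) ([]byte, error) {
	s, err := NewFileState(initial)
	if err != nil {
		return nil, err
	}
	defer s.Close() // removed even if an action fails
	if err := b.Apply(s); err != nil {
		return nil, err
	}
	return os.ReadFile(s.Path())
}

// appendAction is a toy action that appends text to the state file.
type appendAction string

func (a appendAction) Apply(s *FileState) error {
	f, err := os.OpenFile(s.Path(), os.O_APPEND|os.O_WRONLY, 0o600)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = f.WriteString(string(a))
	return err
}

func main() {
	bulk := StateBulkAction{actions: []StateAction{appendAction("a"), appendAction("b")}}
	out, err := bulk.Run([]byte("state:"))
	fmt.Println(string(out), err)
}
```

In this sketch the state is written to disk once per bulk run rather than once per action, while the deferred Close in Run still guarantees cleanup on error.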