Open LahiLuk opened 1 year ago
@LahiLuk I'll have to look into it. Can you share a minimal example that results in the error? If it is as simple as adding an option when we initialize the gcp client, feel free to open a PR yourself; currently, I can't give you a timeline for when I can resolve this.
Hi @dacbd,
anything that will produce a log larger than 4 MB should result in the error. Here's an example:
terraform {
  required_providers { iterative = { source = "iterative/iterative" } }
}

provider "iterative" {}

resource "iterative_task" "grpc-error-example" {
  cloud   = "aws"
  machine = "t2.micro"
  spot    = -1
  image   = "ubuntu"
  region  = "eu-west-1"

  storage {
    workdir = ""
    output  = ""
  }

  script = <<-END
    #!/bin/bash
    while true; do
      echo "Hello, World!"
      sleep 0.01 # Slow down log creation a bit
    done
  END
}
I'm not sure if I'll be able to open a PR, since I'm new to Terraform and have never used Go, but I'll try to look into it. In any case, it seems that other providers were able to increase the maximum message size; see for example terraform-plugin-go (https://github.com/hashicorp/terraform-plugin-go/pull/139).
In the meantime, do you have a suggestion for a workaround? A crude one I came up with is to redirect all shell output to a file, but that complicates log monitoring. Since TPI is geared towards running ML experiments, I find it a bit weird that no one has run into this issue yet, since those logs tend to be quite detailed and the datasets large.
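A slightly less crude variant of that workaround could look like the sketch below. It assumes the error is triggered by the amount of text the task script writes to stdout/stderr, which I have not verified. The helper name run_capped, the 3 MB cap, and the train.sh and full.log names are all made up for illustration; none of this is part of TPI.

```shell
#!/bin/bash
# Hypothetical helper (not part of TPI): run a command, keep its complete
# output in a log file, and forward at most a fixed number of bytes to stdout.
#   $1 = byte limit for stdout, $2 = path of the full log, rest = command
run_capped() {
  cap_limit="$1"
  cap_logfile="$2"
  shift 2
  "$@" 2>&1 | tee "$cap_logfile" | {
    head -c "$cap_limit"   # pass through only the first cap_limit bytes
    cat > /dev/null        # keep draining so tee never receives SIGPIPE
  }
}

# Example (train.sh is a placeholder): cap stdout at 3 MB, full log on disk.
# run_capped 3000000 full.log ./train.sh
```

The full log still has to be inspected separately, for example by writing it into the task's output directory, so this only softens the monitoring problem. Note also that the function returns the status of the last pipeline stage, not of the wrapped command.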
I'll take a look at the example you linked.
@dacbd,
I also forgot to mention: it seems that the task itself actually keeps running after the error, and the logs keep being written to S3. It's just that any terraform commands run locally fail with the error.
@0x2b3bfa0 can you try to take a look at this? I'm hoping it might be as simple as updating our terraform-provider-sdk; otherwise we may need to add something more to the plugin.Serve call:
https://github.com/iterative/terraform-provider-iterative/blob/763e7a1026bca3d31790727c52dacb5e02e98abf/main.go#L13C2-L17
Hello,
I encountered the following error while running a task with TPI:
Due to the error, the provisioned EC2 instance stopped producing expected outputs, but still kept running. I could not run terraform destroy and had to terminate the instance and all other resources manually. When running the same task on a smaller subset of data, the task completes successfully.
If I understand correctly, gRPC uses a default maximum message size of 4 MB unless configured to allow a larger size. Is there a way for TPI plugin users to configure this setting?
Environment Details: