
Executors feedback tracking #41077

Open eseliger opened 2 years ago

eseliger commented 2 years ago

- metrics_environment_label: It's unclear that this and EXECUTOR_METRIC_ENVIRONMENT_LABEL have to be the same value. A customer set them to two distinct values and it didn't work, and then got confused about how to set EXECUTOR_METRIC_ENVIRONMENT_LABEL on the worker to two values (see the sketch after this list).
- sourcegraph-executor AWS CloudWatch log group: This causes problems when two executors are configured, and it fails when two separate pools need to be created.
- metrics_environment_label: Customer didn't realize this is required when running two environments, because the values conflicted for them.
- Allow specifying an EBS encryption key for AWS. Customer has requirements there.
- Cost for AWS is steep; document that and suggest not using Firecracker for cheaper executors.
- Feedback on the registry mirror: "It also seems like the docker image proxy is a bit much, it might also be simpler to just use something like ECR, and for POC. I imagine most customers, day 2, need a way to wire up a private registry."
- Customer asked if we considered something simpler for POCs: running these jobs on ECS or Google Cloud Run? "ECS + Fargate seems like a :chefkiss: for POCs."
- AWS resources don't have names; these are set via tags. Customer noted they have standard naming schemes, so being able to override those would be great.
- Not 100% clear that "type of workload" for the executor queue name variable means a fixed enum of values: "It says the type of work, e.g. codeintel/batches, however those are the only options you can use. I thought it was a user-configurable variable, so I put development. This was returning a 404; however, when I went back and changed it, the dev box had shut down (without me realising) and I got a different error - but I put it down to DNS resolution :slightly_smiling_face:" A second customer encountered this today.
- Customer was not clear that they only need one set of credentials; they generated two and got confused about how to apply both. I said: "Oh, you just need 1 set of credentials, since the service writing the scaling metric is the same for both queues anyways. Use the credentials thing once (or create the cred in a way that suits you really) and make sure it's available to the worker service of the Sourcegraph deployment."
- Binary install: Feedback on the Sourcegraph Executor install instructions. Step 3 (Generate Ignite base image) includes a command to download the Dockerfile used to build the image, but building from the downloaded Dockerfile fails because it depends on another file present in the same repo directory: `stat autologin.conf: file does not exist`. https://github.com/sourcegraph/sourcegraph/issues/41017
- Feedback on variable machine sizes: "I have a quick question: it seems like today the functionality is around us maintaining a pool of agents with the same hardware capabilities. We saw a few local issues with gradle builds that require certain memory or CPU settings. Wondering if there are plans to request specific kinds of agents, or is it planned to stay global, or maybe I'm missing something."
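
To illustrate the two label-related points above, here is a hedged sketch of where the environment label shows up. The variable names come from the feedback itself; the concrete values (`prod`, `batches`) are assumptions, not prescriptions. The point is that the Terraform `metrics_environment_label` and the worker's `EXECUTOR_METRIC_ENVIRONMENT_LABEL` must be the same single string, and that the queue name is a fixed enum.

```bash
# Illustrative only; values are assumptions.

# Executor Terraform modules (terraform.tfvars): same label for every pool/queue, e.g.
#   metrics_environment_label = "prod"

# Sourcegraph worker service: one matching value, shared by all executor queues.
# There is no way (and no need) to set it to two values.
export EXECUTOR_METRIC_ENVIRONMENT_LABEL=prod

# Executor hosts: the queue name is an enum, not free-form.
# Accepted values are "batches" or "codeintel"; anything else (e.g. "development") fails with a 404.
export EXECUTOR_QUEUE_NAME=batches
```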

From https://github.com/sourcegraph/accounts/issues/6312 https://github.com/sourcegraph/accounts/issues/565 https://github.com/sourcegraph/accounts/issues/2417

eseliger commented 1 year ago

Additional feedback from Sam:

- In the documentation we say to do certain things if using Firecracker, but we have no instructions for what to do if not using Firecracker. Even if the instruction is "skip this step", that would be much clearer than no direction at all.
- When I set EXECUTOR_USE_FIRECRACKER=false, `executor run` still checks for ignite, which shouldn't be needed as far as I understand, so it's a misleading log entry (and it says error, but the machine can still pick up jobs).
- I assume batches and codeintel are the only options for EXECUTOR_QUEUE_NAME, but it could be a hair clearer if, instead of "Possible values", we said "Accepted values" or something more explicit that those are the only options.
- Based on the above, I also assume that a single executor VM can only be used for either batches or code intel, and I don't find it clear that I have to stand up multiple instances to handle both types of jobs.
- We say to move the binary to /usr/local/bin, but at least on my Ubuntu distro that directory is owned by root, so I had to sudo that; I'm not sure whether everyone would need to and whether the docs should reflect that. I also think it might be cleaner to give full paths for where the binary is downloaded and where we move it (e.g. $HOME/executors); I ran into an issue where I was in a directory that my default user didn't have write access to. We also use curl with -sf, which is cleaner when everything works, but leaves something to be desired if you hit an error for whatever reason.
- This may be a product of my environment, but running `executor install/validate/run` left my terminal stuck in the program; is there a way to run it in some sort of detached mode? Not sure if using Firecracker would fix this, but not everyone will have the ability to use it, I assume. Either documenting a detach mode OR being highly opinionated about running it as a systemd service (and how to make sure the service starts back up if the VM is rebooted) would be helpful (see the sketch after this list).
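
As one possible answer to the detached-mode question, a hedged sketch of running the binary under systemd follows. The unit name, the /usr/local/bin path, and the env values are illustrative assumptions and would need to match the official install docs (which also require the frontend URL/access token environment variables, omitted here).

```bash
# Illustrative sketch only: run the executor binary as a systemd unit so it
# survives terminal detach and VM reboots. Unit name, paths, and env values
# are assumptions; align them with the install docs.
sudo tee /etc/systemd/system/sourcegraph-executor.service >/dev/null <<'EOF'
[Unit]
Description=Sourcegraph executor
After=network-online.target

[Service]
# One queue per executor instance; accepted values are "batches" or "codeintel".
Environment=EXECUTOR_QUEUE_NAME=batches
Environment=EXECUTOR_USE_FIRECRACKER=false
# The frontend URL / access token variables from the install docs go here as well.
ExecStart=/usr/local/bin/executor run
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now sourcegraph-executor.service
```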

eseliger commented 1 year ago

This ticket captures most of the unaddressed feedback we've received so far. We should compile a list and work through it; most of these feel like docs changes. Anything else should become its own ticket.