Open Kamilcuk opened 1 year ago
Thanks for the suggestion @Kamilcuk! It would be helpful to make the failure more clear. I've placed this into our board for further roadmapping.
> Side note: the message says that job was reverted to version 14, but then it started running version 16. This is confusing.
Job versions are immutable, so "reverting to version 14" means creating a new version based on the spec of version 14. I agree that it can be confusing, but I just wanted to explain that this is the intended result.
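To illustrate why the version number jumps from 14 to 16, here is a minimal Python sketch of an append-only version history. It is a toy model, not Nomad's implementation: `JobVersion`, `JobHistory`, `submit`, and `revert` are hypothetical names, and the spec strings are placeholders.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)  # frozen: a version cannot be modified once created
class JobVersion:
    number: int
    spec: str  # placeholder standing in for the full job specification


@dataclass
class JobHistory:
    versions: list = field(default_factory=list)

    def submit(self, spec: str) -> JobVersion:
        """Register a spec as a brand-new version; older versions stay untouched."""
        version = JobVersion(number=len(self.versions), spec=spec)
        self.versions.append(version)
        return version

    def revert(self, target: int) -> JobVersion:
        """Reverting copies the target's spec into a new version at the end."""
        return self.submit(self.versions[target].spec)


history = JobHistory()
for n in range(15):               # versions 0..14
    history.submit(f"spec-{n}")
history.submit("spec-broken")     # version 15, the deployment that fails
reverted = history.revert(14)     # rollback "to version 14"
print(reverted.number, reverted.spec)  # 16 spec-14
```

In this model the rollback target (14) and the version that ends up running (16) differ by design: version 16 carries the same spec as version 14, while version 15 remains in the history.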
Proposal
`nomad run` monitors the evaluation. It repeats a lot of information and prints a lot of unrelated output that I have to go through when analyzing logs. I propose to clean up the logs. Logs currently look like the following:
```
+ nomad job run streamlit.nomad.hcl
==> 2023-08-23T10:47:47Z: Monitoring evaluation "d01ca58f"
    2023-08-23T10:47:47Z: Evaluation triggered by job "streamlit-htsopportunities"
    2023-08-23T10:47:48Z: Evaluation within deployment: "037572cd"
    2023-08-23T10:47:48Z: Allocation "638e8a80" created: node "0a4e222d", group "streamlit-htsopportunities"
    2023-08-23T10:47:48Z: Evaluation status changed: "pending" -> "complete"
==> 2023-08-23T10:47:48Z: Evaluation "d01ca58f" finished with status "complete"
==> 2023-08-23T10:47:48Z: Monitoring deployment "037572cd"

2023-08-23T10:47:48Z
ID          = 037572cd
Job ID      = streamlit-htsopportunities
Job Version = 15
Status      = running
Description = Deployment is running pending automatic promotion

Deployed
Task Group                  Auto Revert  Promoted  Desired  Canaries  Placed  Healthy  Unhealthy  Progress Deadline
streamlit-htsopportunities  true         false     1        1         1       0        0          2023-08-23T06:57:47-04:00

2023-08-23T10:52:48Z
ID          = 037572cd
Job ID      = streamlit-htsopportunities
Job Version = 15
Status      = running
Description = Deployment is running pending automatic promotion

Deployed
Task Group                  Auto Revert  Promoted  Desired  Canaries  Placed  Healthy  Unhealthy  Progress Deadline
streamlit-htsopportunities  true         false     1        1         1       0        1          2023-08-23T06:57:47-04:00

2023-08-23T10:57:47Z
ID          = 037572cd
Job ID      = streamlit-htsopportunities
Job Version = 15
Status      = failed
Description = Failed due to progress deadline - rolling back to job version 14

Deployed
Task Group                  Auto Revert  Promoted  Desired  Canaries  Placed  Healthy  Unhealthy  Progress Deadline
streamlit-htsopportunities  true         false     1        1         1       0        1          2023-08-23T06:57:47-04:00

2023-08-23T10:57:48Z
ID          = eb78f148
Job ID      = streamlit-htsopportunities
Job Version = 16
Status      = running
Description = Deployment is running

Deployed
Task Group                  Auto Revert  Desired  Placed  Healthy  Unhealthy  Progress Deadline
streamlit-htsopportunities  true         1        1       0        0          2023-08-23T07:07:47-04:00

2023-08-23T10:57:59Z
ID          = eb78f148
Job ID      = streamlit-htsopportunities
Job Version = 16
Status      = running
Description = Deployment is running

Deployed
Task Group                  Auto Revert  Desired  Placed  Healthy  Unhealthy  Progress Deadline
streamlit-htsopportunities  true         1        1       1        0          2023-08-23T07:07:58-04:00

2023-08-23T10:58:01Z
ID          = eb78f148
Job ID      = streamlit-htsopportunities
Job Version = 16
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group                  Auto Revert  Desired  Placed  Healthy  Unhealthy  Progress Deadline
streamlit-htsopportunities  true         1        1       1        0          2023-08-23T07:07:58-04:00
```

It would be great to print less, to not repeat information so much and, most importantly, to print exactly _why_ the deployment has failed. Also, do not print "Deployed" if the job was not deployed but reverted.
For example:

```
+ nomad job run streamlit.nomad.hcl
==> 2023-08-23T10:47:47Z: Monitoring evaluation "d01ca58f"
    2023-08-23T10:47:47Z: Evaluation triggered by job "streamlit-htsopportunities"
    2023-08-23T10:47:48Z: Evaluation within deployment: "037572cd"
    2023-08-23T10:47:48Z: Allocation "638e8a80" created: node "0a4e222d", group "streamlit-htsopportunities"
    2023-08-23T10:47:48Z: Evaluation status changed: "pending" -> "complete"
==> 2023-08-23T10:47:48Z: Evaluation "d01ca58f" finished with status "complete"
==> 2023-08-23T10:47:48Z: Monitoring deployment "037572cd"

2023-08-23T10:47:48Z
ID          = 037572cd
Job ID      = streamlit-htsopportunities
Job Version = 15
Status      = running
Description = Deployment is running pending automatic promotion

TO BE DEPLOYED
Task Group                  Auto Revert  Promoted  Desired  Canaries  Placed  Healthy  Unhealthy  Progress Deadline
streamlit-htsopportunities  true         false     1        1         1       0        0          2023-08-23T06:57:47-04:00

2023-08-23T10:52:48Z
Task Group                  Auto Revert  Promoted  Desired  Canaries  Placed  Healthy  Unhealthy  Progress Deadline
streamlit-htsopportunities  true         false     1        1         1       0        1          2023-08-23T06:57:47-04:00

2023-08-23T10:57:47Z
Description = Failed due to progress deadline - rolling back to job version 14

REVERTING
Task Group                  Auto Revert  Promoted  Desired  Canaries  Placed  Healthy  Unhealthy  Progress Deadline
streamlit-htsopportunities  true         false     1        1         1       0        1          2023-08-23T06:57:47-04:00

2023-08-23T10:57:48Z
Job Version = 16
Status      = running
Description = Deployment is running

REVERTED
Task Group                  Auto Revert  Desired  Placed  Healthy  Unhealthy  Progress Deadline
streamlit-htsopportunities  true         1        1       0        0          2023-08-23T07:07:47-04:00

2023-08-23T10:57:59Z
Task Group                  Auto Revert  Desired  Placed  Healthy  Unhealthy  Progress Deadline
streamlit-htsopportunities  true         1        1       1        0          2023-08-23T07:07:58-04:00

Status      = successful
Description = Deployment completed successfully

REVERTED
Task Group                  Auto Revert  Desired  Placed  Healthy  Unhealthy  Progress Deadline
streamlit-htsopportunities  true         1        1       1        0          2023-08-23T07:07:58-04:00

VISIBLE ERROR: job was reverted!
```

Side note: the message says that job was reverted to version 14, but then it started running version 16. This is confusing.
Use-cases
The use case is to simplify the analysis of logs in long log streams. The proposal is
Attempted Solutions
The solution is to read the logs very carefully.
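Until the CLI output itself changes, one possible workaround is to post-process the output. The following is a minimal Python sketch, assuming the `Key = Value` layout seen in the log above. `condense` and `failure_reasons` are hypothetical helpers, not part of Nomad, and treating a `Description` that starts with "Failed" as a failure is an assumption based on this single log.

```python
import re

# Fields that the deployment monitor repeats in every status block (taken from the log above).
FIELD = re.compile(r"^(ID|Job ID|Job Version|Status|Description)\s*=\s*(.*)$")


def condense(lines):
    """Yield the input lines, skipping `Key = Value` lines whose value has not changed."""
    last_seen = {}
    for line in lines:
        match = FIELD.match(line.strip())
        if match:
            key, value = match.groups()
            if last_seen.get(key) == value:
                continue  # same value as in the previous block: drop the repeat
            last_seen[key] = value
        yield line


def failure_reasons(lines):
    """Collect Description values that report a failure (assumption: they start with 'Failed')."""
    reasons = []
    for line in lines:
        match = FIELD.match(line.strip())
        if match and match.group(1) == "Description" and match.group(2).startswith("Failed"):
            reasons.append(match.group(2))
    return reasons


sample = """\
ID          = 037572cd
Job Version = 15
Status      = running
Description = Deployment is running pending automatic promotion

ID          = 037572cd
Job Version = 15
Status      = failed
Description = Failed due to progress deadline - rolling back to job version 14
""".splitlines()

condensed = list(condense(sample))
print("\n".join(condensed))
print(failure_reasons(sample))
```

On the sample, the second block keeps only the `Status` and `Description` lines because `ID` and `Job Version` did not change. This only shortens the output; it cannot add the missing explanation of _why_ the progress deadline was hit, which still has to come from Nomad itself.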