zolech closed this 6 years ago
I see two separate requests here:
n latest runs)
In 1., are you talking only about failed runs, or about all runs?
What would be the best value for n?
I am talking about all runs, failed and succeeded, just for the sake of being able to read their logs. If I could specify the status of the runs being retrieved, it would be even more awesome.
About the best value: I don't know. Sometimes I need to check runs that are more than a week old. Maybe it could be a flexible value?
Any file in the sandbox can be accessed via the Mesos web UI: http://mesos.apache.org/documentation/latest/sandbox/#via-the-mesos-web-ui
Adding new fields to LastFail is needed.
Then, with the FrameworkID and the list of endpoints already available, the URL that displays an executor's run (with the respective links to sandboxes) is:
MESOS_ADDR/#/agents/AGENT_ID/frameworks/FRAMEWORK_ID/executors/EXECUTOR_ID
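To illustrate, here is a minimal Go sketch of how such a link could be assembled from the pattern above. The function name, parameter names, and example values are hypothetical and are not taken from rhythm's code; it only assumes the IDs are plain strings that are safe to place in a URL.

```go
package main

import (
	"fmt"
	"strings"
)

// executorURL fills in the MESOS_ADDR/#/agents/.../executors/... pattern.
// All names here are illustrative; mesosAddr is assumed to be the base
// address of the Mesos web UI, with or without a trailing slash.
func executorURL(mesosAddr, agentID, frameworkID, executorID string) string {
	base := strings.TrimRight(mesosAddr, "/")
	return fmt.Sprintf("%s/#/agents/%s/frameworks/%s/executors/%s",
		base, agentID, frameworkID, executorID)
}

func main() {
	// Example values are made up for demonstration only.
	fmt.Println(executorURL("http://mesos.example.com:5050", "agent-1", "framework-1", "executor-1"))
}
```

Returning the finished link from the API keeps clients from having to know the web UI's URL layout.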
Maybe the API can simply return such a link in LastFail? Storing n previous runs is a different story, so let's first establish the best way to give easy access to files from the sandbox, like stdout.
Yes, link would suffice.
@zolech I renamed LastFail to LastFailedTask, which also contains more details plus ExecutorURL, which points to the web UI with the executor's details, like completed tasks (https://github.com/mlowicki/rhythm/commit/69718439c7a039f059b13c5ef4ce437bcc5b4b5b). It's now on the master branch. Please let me know what you think. Adding a history of launched tasks is the next step.
@zolech the first draft of the history is at https://github.com/mlowicki/rhythm/tree/tasks_history. For now I've added the endpoint api/v1/jobs/{group}/{project}/{id}/tasks, which returns the job's tasks (runs). In the background there is a component that cleans up old tasks - by default only tasks from the last 24h are kept, but this is configurable with storage.zookeeper.taskttl (in milliseconds).
@zolech tasks history landed on master. Feel free to test it out.
My second proposal would be to add the ability to pull job history through the API. Right now, after connecting to the endpoint "/api/v1/jobs/group/project/id", I get only LastFail:
"LastFail": {
  "Message": "Reconciliation: Task is unknown to the agent",
  "Reason": "REASON_RECONCILIATION",
  "Source": "SOURCE_MASTER",
  "When": "2018-10-12T09:22:13.466889392+02:00"
},
A nice solution would be to accept a parameter, for example "?history", that would give me more information about previous runs and, most important, LOGS (stdout and stderr from the executor's sandbox) in the form of a path to the executor's sandbox. There is similar functionality in Marathon: when you click on a task, you can download any file from its sandbox.
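For reference, the LastFail object quoted above could be decoded on the client side roughly like this. This is a sketch only: the field names come from the JSON excerpt, while the Go type names, the parseLastFail helper, and the assumption that LastFail is a top-level key of the job response are illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// lastFail mirrors the fields visible in the quoted JSON excerpt.
type lastFail struct {
	Message string
	Reason  string
	Source  string
	When    time.Time // parsed from the RFC 3339 timestamp
}

// jobResponse is a hypothetical wrapper; only LastFail is modeled here.
type jobResponse struct {
	LastFail lastFail
}

// parseLastFail extracts the LastFail object from a job response body.
func parseLastFail(body []byte) (lastFail, error) {
	var resp jobResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return lastFail{}, err
	}
	return resp.LastFail, nil
}

func main() {
	body := []byte(`{"LastFail": {"Message": "Reconciliation: Task is unknown to the agent", "Reason": "REASON_RECONCILIATION", "Source": "SOURCE_MASTER", "When": "2018-10-12T09:22:13.466889392+02:00"}}`)
	lf, err := parseLastFail(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(lf.Reason, lf.When.UTC().Format(time.RFC3339))
}
```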