mlowicki / rhythm

Time-based job scheduler for Apache Mesos
MIT License

Job history #17

Closed: zolech closed this issue 6 years ago

zolech commented 6 years ago

My second proposal would be to add the ability to pull job history through the API. Right now, when I query the endpoint "/api/v1/jobs/group/project/id", I get only LastFail:

"LastFail": { "Message": "Reconciliation: Task is unknown to the agent", "Reason": "REASON_RECONCILIATION", "Source": "SOURCE_MASTER", "When": "2018-10-12T09:22:13.466889392+02:00" },
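A minimal client-side sketch of reading that response, assuming the job document is plain JSON containing the `LastFail` object exactly as quoted above (the field is renamed later in this thread). The helper names `parse_go_time` and `summarize_last_fail` are hypothetical. The `When` value has nine fractional digits (nanosecond precision, presumably from Go's time formatting), while Python's `datetime` keeps only microseconds, so the extra digits are truncated before parsing.

```python
import json
import re
from datetime import datetime

# The "LastFail" fragment quoted above, wrapped in braces so it is valid JSON.
RAW = """{"LastFail": {
  "Message": "Reconciliation: Task is unknown to the agent",
  "Reason": "REASON_RECONCILIATION",
  "Source": "SOURCE_MASTER",
  "When": "2018-10-12T09:22:13.466889392+02:00"}}"""


def parse_go_time(value):
    """Parse an RFC 3339 timestamp that may carry nanoseconds.

    datetime stores microseconds, so fractional digits beyond six are
    dropped before calling fromisoformat().
    """
    return datetime.fromisoformat(re.sub(r"(\.\d{6})\d+", r"\1", value))


def summarize_last_fail(job):
    """Return (reason, message, when) for a job's last failure, or None."""
    last_fail = job.get("LastFail")
    if not last_fail:
        return None
    return (last_fail["Reason"], last_fail["Message"],
            parse_go_time(last_fail["When"]))


reason, message, when = summarize_last_fail(json.loads(RAW))
print(reason, when.isoformat())
# REASON_RECONCILIATION 2018-10-12T09:22:13.466889+02:00
```

This only covers the single most recent failure, which is exactly the limitation the request is about.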

A nice solution would be to specify a parameter, for example "?history", that would return more information about previous runs and, most importantly, LOGS (stdout and stderr from the executor's sandbox) in the form of a path to the executor's sandbox. There is similar functionality in Marathon: when you click on a task, you can download any file from its sandbox.

mlowicki commented 6 years ago

I see two separate requests here:

  1. store history (n latest runs)
  2. access to logs (like in Marathon)

In 1., are you talking only about failed runs, or about all of them? What would be the best value for n?

zolech commented 6 years ago

I am talking about all runs, failed and succeeded, just for the sake of being able to read their logs. If I could specify the status of the runs being retrieved, it would be even more awesome.

As for the best value, I don't know; sometimes I need to check runs that are more than a week old. Maybe it could be configurable?

mlowicki commented 6 years ago

Any file in the sandbox can be accessed via the Mesos web UI (http://mesos.apache.org/documentation/latest/sandbox/#via-the-mesos-web-ui):


New fields need to be added to LastFail.

Then, with the FrameworkID and the list of endpoints already available, the URL that displays the executor's run (with links to the respective sandboxes) is:

MESOS_ADDR/#/agents/AGENT_ID/frameworks/FRAMEWORK_ID/executors/EXECUTOR_ID
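Filling in that template is plain string formatting. The sketch below assumes the agent and executor IDs are available to the scheduler (for example from the task's status updates) and inserts them verbatim. The function name `executor_url` and the sample IDs are hypothetical, for illustration only.

```python
def executor_url(mesos_addr, agent_id, framework_id, executor_id):
    """Fill in the Mesos web UI route for an executor's page.

    Template from the comment above:
    MESOS_ADDR/#/agents/AGENT_ID/frameworks/FRAMEWORK_ID/executors/EXECUTOR_ID
    The route lives in the URL fragment (after "#"), so IDs are inserted as-is.
    """
    base = mesos_addr.rstrip("/")  # avoid a double slash before "#"
    return (f"{base}/#/agents/{agent_id}"
            f"/frameworks/{framework_id}/executors/{executor_id}")


# Hypothetical IDs, for illustration only.
print(executor_url("http://mesos-master:5050/", "agent-S0",
                   "framework-0001", "executor-42"))
# http://mesos-master:5050/#/agents/agent-S0/frameworks/framework-0001/executors/executor-42
```

Returning a link rather than file contents keeps the scheduler out of the log-serving path; the Mesos UI already handles browsing and downloading sandbox files.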

Maybe the API could simply return such a link in LastFail? Storing the n previous runs is a different story, so let's first settle on the best way to provide easy access to sandbox files like stdout.

zolech commented 6 years ago

Yes, link would suffice.

mlowicki commented 6 years ago

@zolech I renamed LastFail to LastFailedTask, which also contains more details plus ExecutorURL, pointing to the Web UI page with the executor's details such as completed tasks (https://github.com/mlowicki/rhythm/commit/69718439c7a039f059b13c5ef4ce437bcc5b4b5b). It's now on the master branch. Please let me know what you think. Adding a history of launched tasks is the next step.

mlowicki commented 6 years ago

@zolech the first draft of the history is on https://github.com/mlowicki/rhythm/tree/tasks_history. For now I've added the endpoint api/v1/jobs/{group}/{project}/{id}/tasks, which returns the job's tasks (runs). In the background, a component cleans up old tasks: by default only tasks from the last 24h are kept, but this is configurable with storage.zookeeper.taskttl (in milliseconds).
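The retention rule described above (keep tasks from the last `taskttl` milliseconds, 24h by default) can be sketched in memory as follows. This is not rhythm's implementation, which stores tasks in ZooKeeper; the `TaskRecord` shape, the choice of timestamp the TTL is measured from, and the helper names are all assumptions. The `filter_by_state` helper is a hypothetical client-side take on the earlier wish to retrieve runs by status, not a feature of the endpoint.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Default retention described above: 24 hours, in milliseconds
# like storage.zookeeper.taskttl.
DEFAULT_TASK_TTL_MS = 24 * 60 * 60 * 1000


@dataclass
class TaskRecord:
    """Hypothetical shape of one stored run (not rhythm's actual schema)."""
    task_id: str
    state: str           # Mesos task state, e.g. "TASK_FINISHED" or "TASK_FAILED"
    timestamp: datetime  # assumed: the time the TTL is measured from


def prune_expired(tasks, ttl_ms=DEFAULT_TASK_TTL_MS, now=None):
    """Keep only tasks whose timestamp is within the last ttl_ms milliseconds."""
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - timedelta(milliseconds=ttl_ms)
    return [t for t in tasks if t.timestamp >= cutoff]


def filter_by_state(tasks, *states):
    """Client-side filter for runs in the given states (all runs if none given)."""
    return [t for t in tasks if not states or t.state in states]


now = datetime(2018, 10, 20, 12, 0, tzinfo=timezone.utc)
history = [
    TaskRecord("run-1", "TASK_FINISHED", now - timedelta(hours=1)),
    TaskRecord("run-2", "TASK_FAILED", now - timedelta(hours=5)),
    TaskRecord("run-3", "TASK_FAILED", now - timedelta(hours=30)),
]
kept = prune_expired(history, now=now)
print([t.task_id for t in kept])                                   # ['run-1', 'run-2']
print([t.task_id for t in filter_by_state(kept, "TASK_FAILED")])   # ['run-2']
```

A time-based TTL rather than a fixed n matches the request for a flexible value: keeping a week of runs is a matter of setting the TTL to 7 × 86,400,000 ms.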

mlowicki commented 6 years ago

@zolech task history has landed on master. Feel free to test it out.