Open achordia20 opened 1 year ago
cc: @rkooo567 @rickyyx to investigate and see whether this is an easy fix for @iycheng
This is expected now. We should probably update the FT doc to mention the limitation cc @iycheng.
Since we don't persist the task information now (it also may add a lot of overhead to GCS if we do so), it is not possible to retrieve the task data when the head node is restarted.
The task should still run as expected IIRC. This feature is just broken because we don't persist task events to persistent storage.
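As a rough illustration of the limitation described above, here is a toy model, not Ray's actual implementation. `PersistentStore`, `HeadNode`, and all of their methods are hypothetical names made up for this sketch. It assumes only what the comment states: job-level data survives a restart through external storage, while task events are kept in memory and are lost.

```python
from dataclasses import dataclass, field


@dataclass
class PersistentStore:
    """Hypothetical stand-in for external storage that survives a restart."""
    jobs: dict = field(default_factory=dict)


class HeadNode:
    """Toy head node: job metadata is persisted, task events are in memory only."""

    def __init__(self, store):
        self.store = store
        self.task_events = {}  # assumption: never written to the store

    def submit_job(self, job_id):
        # Job metadata goes to the persistent store.
        self.store.jobs[job_id] = "RUNNING"

    def record_task(self, job_id, task_name):
        # Task events are only appended to the in-memory dict.
        self.task_events.setdefault(job_id, []).append(task_name)

    def list_tasks(self, job_id):
        return list(self.task_events.get(job_id, []))


store = PersistentStore()
head = HeadNode(store)
head.submit_job("job-1")
head.record_task("job-1", "task-a")

# Simulate a head node restart: same store, fresh in-memory state.
head = HeadNode(store)

# A job submitted after the restart records its tasks as usual.
head.submit_job("job-2")
head.record_task("job-2", "task-b")
```

In this model, `job-1` is still known after the restart but its task list is empty, while `job-2`, submitted after the restart, lists its task normally. That second case is the behavior one would expect from a real cluster once the head node is back up.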
I was seeing this on new jobs being submitted. I can understand losing state when the head node is restarted, so that older jobs' data isn't visible, but not for jobs submitted after the node is back up.
Did you restart the GCS in the middle of your script, or did you run a script -> restart -> run another script (and the data was lost)?
I don’t touch GCS at all. To do any sort of upgrade to the cluster I have to delete the entire Ray cluster and recreate it. After doing so I run the script and don’t see any task information in 2/3 places.
Hmm yeah this seems to be a bug from the task backend then. cc @rickyyx to reproduce it.
What happened + What you expected to happen
After adding GCS FT, I lost the ability to view the task list/count/progress from the Jobs view. I was still able to view the tasks in the Tasks Table, though. The head node was restarted before the issue appeared; I'm not sure whether this is relevant.
Versions / Dependencies
Ray 2.4.0
Reproduction script
Issue Severity
None