click2cloud-sagarB closed this issue 6 months ago.
Hi, @click2cloud-sagarB. Thank you for raising the issue. A few questions and requests:
When you say "stalls", what is the status of the tasks? Are both cloud and shadow tasks marked as queued? Could you provide a screenshot of the client.monitor() table?
Is this happening if you run the workflow on a new region/time range that is not cached?
Are you passing the Planetary Computer Key as a parameter to the workflow? We recently changed that and the PC key is now a required parameter.
Could you provide the logs for the orchestrator, cache, and workers so we can investigate what might be happening?
They are located in the logs folder of your storage (e.g., ~/.cache/farmvibes-ai/logs).
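For reference, here is a minimal sketch of how we would expect the run to be submitted and monitored with the vibe_core client. The geometry, time range, and run name are placeholders, and the pc_key parameter name below is only illustrative, so please check the workflow documentation for the exact name:

from datetime import datetime
from shapely.geometry import Polygon
from vibe_core.client import get_default_vibe_client

client = get_default_vibe_client()

# Placeholder region of interest and time range; replace with your own.
geometry = Polygon([(-88.0, 41.0), (-88.0, 41.1), (-87.9, 41.1), (-87.9, 41.0)])
time_range = (datetime(2023, 1, 1), datetime(2023, 2, 28))

run = client.run(
    "data_ingestion/spaceeye/spaceeye_interpolation",
    name="spaceeye-rerun-check",
    geometry=geometry,
    time_range=time_range,
    parameters={"pc_key": "<YOUR-PLANETARY-COMPUTER-KEY>"},  # illustrative parameter name
)

run.monitor()  # blocks and displays the per-task status table for this run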
Hi @rafaspadilha, below are the answers to your queries.
The cloud and shadow tasks are in the running state, while all subsequent jobs are in pending. I have shared the log file and a screenshot of client.monitor().
It does not occur when we perform the workflow on a new region/time range that is not cached. When we execute the workflow for the first time in a new region or time period, it completes successfully.
No, we do not pass the Planetary Computer key as a parameter to the data_ingestion/spaceeye/spaceeye_interpolation workflow. I am not sure it is mandatory, though; if it were, I doubt the workflow would have completed smoothly the first time around.
I have attached the log files for your reference: logs.zip
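For reference, the attached archive was created roughly like this, from the logs folder mentioned above (a sketch; the path assumes a default local install):

import shutil
from pathlib import Path

# Zip the FarmVibes.AI logs folder into logs.zip in the current directory.
logs_dir = Path.home() / ".cache" / "farmvibes-ai" / "logs"
shutil.make_archive("logs", "zip", logs_dir)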
Hey, @click2cloud-sagarB. Looking through the logs, it seems like there was an error during communication between the orchestrator and cache pods.
Please, could you try deleting the cache pod and re-running the workflows?
You can do that with:
$ ~/.config/farmvibes-ai/kubectl delete pods -l app=terravibes-cache
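If you also want to confirm that the pod was recreated and peek at its logs before re-running (plain kubectl calls with the same label selector; adjust if your labels differ):

$ ~/.config/farmvibes-ai/kubectl get pods -l app=terravibes-cache
$ ~/.config/farmvibes-ai/kubectl logs -l app=terravibes-cache --tail=100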
Let me know if this solves the issue.
Hi @rafaspadilha, deleting the cache pod temporarily solves the issue for one rerun, but it recurs on the next rerun.
As we discussed offline, the logs that you shared have several runs. Please, @click2cloud-sagarB, could you recreate your cluster, delete the logs, reproduce this error and share the new set of logs with us again?
Hi @rafaspadilha, as requested I recreated the updated cluster and reran the workflow, but the workflow remains queued for hours. I am sharing the logs and a screenshot for the same. Time range given: 2 months (datetime(2023, 11, 1), datetime(2023, 12, 31)). Polygon provided in Andrew.txt. logs.zip Andrew.txt
@click2cloud-sagarB we have a new release of FarmVibes. We have a bugfix for the issue you were seeing. Please, when you have some time, could you update your cluster and see if the problem is fixed?
Feel free to reopen this issue if the problem persists.
In which step did you encounter the bug?
Workflow execution
Are you using a local or a remote (AKS) FarmVibes.AI cluster?
Local cluster
Bug description
Dear FarmVibes team, in the most recent release of FarmVibes, we've noticed that the SpaceEye pipeline stalls on rerun at the tasks "spaceeye.preprocess.cloud.cloud" and "spaceeye.preprocess.cloud.shadow", and all subsequent jobs in the SpaceEye pipeline are stuck as well. Please let me know if you need anything more from my end.
Steps to reproduce the problem
No response