Cache Retrieval Error in Workflow: Maximum Recursion Depth Exceeded

microsoft / farmvibes-ai

FarmVibes.AI: Multi-Modal GeoSpatial ML Models for Agriculture and Sustainability

https://microsoft.github.io/farmvibes-ai/

MIT License


Cache Retrieval Error in Workflow: Maximum Recursion Depth Exceeded #121

Closed click2cloud-sagarB closed 12 months ago

click2cloud-sagarB commented 1 year ago

Dear Farmvibes Team,

Rerunning the workflow fails with the following error: Failed to obtain cache information for op list_sentinel2_products_pc with exception <class 'RecursionError'>: maximum recursion depth exceeded in __instancecheck__.

I tried restarting the cluster (farmvibes-ai local restart), and also destroying the old cluster and creating a new one (farmvibes-ai local setup), but the problem persists.

Attached to this issue, you will find the following files for your reference:

Log file for the workflow failure
Log files for each of the pods involved in the process: terravibes-worker, terravibes-orchestrator, and terravibes-rest-api

Workflow_Error_Logs_Files.zip

rafaspadilha commented 1 year ago

Hi, @click2cloud-sagarB. Thank you for raising this issue.

We have seen it internally and are currently investigating possible causes and solutions. It seems to happen more frequently when running a lot of workflows in parallel.

A temporary workaround is to delete the cache pod with:

~/.config/terravibes/kubectl delete pod -l app=terravibes-cache

and re-run the desired workflows.
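For anyone who wants to script the workaround above, here is a minimal Python sketch that wraps the same kubectl command. The kubectl path and the label selector come from the comment above; the function names and the dry_run flag are hypothetical helpers introduced only for this illustration, not part of the FarmVibes.AI client.

```python
import os
import subprocess

# Path and label selector taken from the workaround command in this thread.
KUBECTL = "~/.config/terravibes/kubectl"
CACHE_SELECTOR = "app=terravibes-cache"


def delete_cache_pod_command(kubectl=KUBECTL, selector=CACHE_SELECTOR):
    """Build the argument list for deleting the cache pod (hypothetical helper)."""
    # subprocess does not expand "~" when no shell is involved, so do it here.
    return [os.path.expanduser(kubectl), "delete", "pod", "-l", selector]


def delete_cache_pod(dry_run=True):
    """Return the command; only execute it when dry_run is False."""
    cmd = delete_cache_pod_command()
    if not dry_run:
        # check=True raises CalledProcessError if kubectl exits with an error.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it; pass dry_run=False to execute.
    print(" ".join(delete_cache_pod(dry_run=True)))
```

With dry_run=True nothing is deleted, so the command can be inspected first. After actually deleting the pod, re-run the workflows as described above.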

I'll keep this issue updated as we progress on our investigations.

shenoy10 commented 1 year ago

Hi, is there an update on a fix for this issue? I am currently experiencing it.

rafaspadilha commented 1 year ago

We are planning a release for this week with a fix for this issue, @shenoy10.

rafaspadilha commented 12 months ago

The issue was caused by pydantic raising a RecursionError when trying to (de)serialize one of our dataclasses. The new release should have fixed this issue.
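The thread does not say what made the (de)serialization recurse, so the following is only a generic illustration of the failure mode, not FarmVibes.AI or pydantic code. It assumes one common trigger, a dataclass instance that references itself, and uses a hypothetical naive serializer (to_dict) with no cycle detection to show how a RecursionError surfaces and how a caller can catch it.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Optional


@dataclass
class Node:
    """Hypothetical dataclass used only for this illustration."""
    name: str
    parent: Optional["Node"] = None


def to_dict(obj: Any) -> Any:
    """Naive recursive serializer with no cycle detection."""
    if is_dataclass(obj) and not isinstance(obj, type):
        return {f.name: to_dict(getattr(obj, f.name)) for f in fields(obj)}
    return obj


def safe_to_dict(obj: Any) -> Any:
    """Serialize obj, reporting a RecursionError instead of propagating it."""
    try:
        return to_dict(obj)
    except RecursionError as exc:
        return {"error": type(exc).__name__}


if __name__ == "__main__":
    # A finite chain serializes without trouble.
    print(safe_to_dict(Node("child", Node("root"))))

    # A self-referential instance never reaches a base case.
    loop = Node("loop")
    loop.parent = loop
    print(safe_to_dict(loop))
```

Catching the exception only reports the problem. The fix mentioned above was shipped in a new release, so upgrading is the actual remedy.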

We are closing this issue for now, but @click2cloud-sagarB and @shenoy10, let us know if you still face any issue.