rfcx / arbimon

Ecoacoustic analysis platform empowering conservationists to analyze acoustic data and to derive insights about the ecosystem at scale
https://arbimon.org
Apache License 2.0

Model classification job stuck #2066

Closed koonchaya closed 2 months ago

koonchaya commented 3 months ago

Original report: https://rfcx.slack.com/archives/C03FD1WD02J/p1719874553286379

Project: https://arbimon.org/project/trabajo-de-grado-vvm/jobs

Since Tuesday, June 25th, I have been attempting to create my model. However, despite checking the progress in the "Active Jobs" tab over several days, I have noticed that the job status has been stuck at 20.8% and has not advanced at all for approximately one week. I understand that jobs can sometimes take a while to start if there is a long queue on the server, but it seems unusual for the progress to freeze once the job has already started. I would like to request your assistance in resolving this issue as soon as possible. The name of my project is "Trabajo de grado VVM".

Image

RatreeOchn commented 3 months ago

The job's state is completed, but its progress (8,344) does not equal progress_steps (40,079).
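The mismatch above lines up with the percentage in the original report: 8,344 of 40,079 steps is about 20.8%. A minimal sketch of that check follows. The field names `state`, `progress`, and `progress_steps` come from the comment above; the `Job` class, the helper functions, and the literal state string `"completed"` are hypothetical and are not taken from the Arbimon codebase.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical record; field names mirror those mentioned in the comment.
    state: str
    progress: int
    progress_steps: int


def progress_percent(job: Job) -> float:
    """Return completed steps as a percentage of total steps."""
    if job.progress_steps <= 0:
        return 0.0
    return 100.0 * job.progress / job.progress_steps


def is_inconsistent(job: Job) -> bool:
    """Flag a job marked completed that has not finished all its steps."""
    # "completed" is an assumed literal for the state value.
    return job.state == "completed" and job.progress != job.progress_steps


job = Job(state="completed", progress=8344, progress_steps=40079)
print(f"{progress_percent(job):.1f}%")  # prints "20.8%"
print(is_inconsistent(job))             # prints "True"
```

A check like this could be run over all jobs to find others that were marked completed before finishing their steps.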

Image

rassokhina-e commented 3 months ago

We tried to find logs related to this job to understand why it was stuck. I found a huge list of logs on the remote server, where it is difficult to find this information. Our suggestion was to reduce disk space usage to speed up queue processing, so we deleted old logs and jobs folders.

For this particular job, we updated the status to waiting to test whether the job queue can pick it up again and process it.

@koonchaya

koonchaya commented 3 months ago

Restarted the job

Image

carlybatist commented 3 months ago

@koonchaya it seems these jobs had to restart? They're at a lower percentage now.

Screenshot 2024-07-04 at 9 45 53 AM