Closed claudinoac closed 3 years ago
I need a bit more information, as this appears to be data corruption, but you've not given me much to work with beyond the fact that you use multiprocess mode. For example, does your code fork?
Well, I've checked that we're using only one process and one thread for the server, but the crons run in separate processes. And no, there are no forks within the code. We're delivering the metrics through an endpoint within the Django application (/metrics).
Another thing I've observed is that the cron runs sometimes overlap (because of the amount of data being processed). Since the crons are using the same files (..._cron.db), can this be the origin of this corruption? Should I use different prefixes for each cron (like the PID)?
Since the crons are using the same files (..._cron.db), can this be the origin of this corruption? Should I use different prefixes for each cron (like the PID)?
Can you explain more about what this is? This sounds like you've broken things by poking around internals, as that is not a filename this code can produce out of the box.
I followed this: https://github.com/korfuri/django-prometheus/blob/2.0.0/documentation/exports.md#exporting-metrics-in-a-wsgi-application-with-multiple-processes-globally to avoid having lots of file descriptors, keeping only one per uWSGI process and one for the crons.
I'm not deeply familiar with WSGI and cron, what's the worker id in that case?
Each worker in uWSGI is a process, spawned by the master process.
The worker id is the id assigned by uWSGI to its child processes (0, 1, 2 if it has three processes/workers).
The crons I'm referring to are the Django custom commands or batch jobs (python manage.py <command>) that are scheduled through crontab to run periodically (every 4 hours, I guess).
These batch jobs have their own metrics, and I configured the server to use file descriptors that have the prefix "cron" instead of the PID of the running cron (gauge_all_cron.db, histogram_all_cron.db, ...).
Since each cron has a new PID, with the default configs I would have lots of file descriptors in a short time (gauge_all_<PID>, ...).
Does this make sense?
Two processes can't safely share an ID, so there's your problem.
I started using the PID of the crons as the prefix for the files, so they (the processes) won't share any db files. The application has been stable since then, so I guess we can close the issue. Thanks!
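For readers landing on this thread, here is a minimal sketch of why the fixed "cron" prefix collides while a PID-based one does not. It only models the file naming described above. The helper `metric_db_filename` and the two example PIDs are hypothetical and are not part of prometheus_client or django-prometheus.

```python
import os


def metric_db_filename(metric_type: str, process_id) -> str:
    """Hypothetical helper: build a per-process metrics file name."""
    return f"{metric_type}_{process_id}.db"


# Two overlapping cron runs, modelled by two made-up PIDs.
run_a_pid, run_b_pid = 4001, 4002

# Fixed "cron" identifier: both runs resolve to the same file,
# so two live processes end up writing to it at once.
shared_a = metric_db_filename("gauge_all", "cron")
shared_b = metric_db_filename("gauge_all", "cron")

# PID-based identifier: each run gets its own file.
own_a = metric_db_filename("gauge_all", run_a_pid)
own_b = metric_db_filename("gauge_all", run_b_pid)

# In a real job the identifier would come from the running process.
current = metric_db_filename("gauge_all", os.getpid())

print(shared_a == shared_b)  # the shared prefix collides
print(own_a == own_b)        # PID-based names do not
```

The trade-off is the one raised earlier in the thread: PID-based names are safe to write concurrently, but they accumulate one set of files per cron run.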
The issue
JSON unicode decode error after a few days of running a production server (using multiprocess mode).
The configuration
What happens
The server throws a JSON unicode decode error after a few days of running. The server has a minimal load (below 300 req/min). The server doesn't run without cleaning the multiprocess directory; it returns the same error when restarting uWSGI. Also, this server exports some metrics within crons, with the same file prefix for all crons (can this be the problem?).
At first, these errors were happening after the VM was forcibly restarted.
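The "cleaning the multiprocess directory" step mentioned above could look roughly like the sketch below. It assumes the metric files are the `*.db` files in a single directory, and that it runs before the server starts while no process is writing metrics. The function name `wipe_multiproc_dir` is hypothetical, and the demonstration uses a throwaway temporary directory rather than a real deployment path.

```python
import glob
import os
import tempfile


def wipe_multiproc_dir(directory: str) -> int:
    """Delete leftover *.db metric files and return how many were removed."""
    removed = 0
    for path in glob.glob(os.path.join(directory, "*.db")):
        os.remove(path)
        removed += 1
    return removed


# Demonstration against a temporary directory with two stale files.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("gauge_all_cron.db", "histogram_all_cron.db"):
        open(os.path.join(demo_dir, name), "wb").close()
    removed_count = wipe_multiproc_dir(demo_dir)
    remaining = os.listdir(demo_dir)

print(removed_count, remaining)
```

Running this while the server or a cron is still writing would discard live metrics, so it belongs in the startup path only.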
I found a similar issue (https://github.com/prometheus/client_python/issues/357), but I don't know if it is the same problem, since that one was fixed in previous versions.