usgin / modelmanager

USGIN Content Model Management App
BSD 3-Clause "New" or "Revised" License
CM Validator Performance Issues #18

Open asonnenschein opened 10 years ago

asonnenschein commented 10 years ago

Validator page is showing server errors. Some thoughts --

Python code runs synchronously by default: each call blocks until it finishes. Requests to the Django site are handled by separate worker processes on the server, which get spun up as new users access the website. This is so that users don't have to wait in line behind each other to load HTML templates.

Everything within the process that gets spun up for you is synchronous, including the processing of the CSV file. So when the user uploads a large file, they're stuck until their Python process is done with its processing and can go back to spitting out HTML templates. This inefficiency in file processing could be what's bogging down the server.

Tools like Celery (a distributed task queue) and Redis (commonly used as Celery's message broker) exist to deal with these sorts of problems in Python. If a distributed processing system were implemented here, we'd have a lot more control over how memory is allocated, and the user wouldn't have to sit around and wait for their file to process.

The workflow would go from this:

--> django (pid 1) --> validator (pid 1) --> django (pid 1) -->

--> django (pid 2) --> validator (pid 2) --> django (pid 2) -->

To something like this:

                        --------------> django (pid 1) ---------------->
--> django (pid 1) -->                           validator (pid 4) -->  django (pid 1) -->
                       distributor (pid 3) -->
--> django (pid 2) -->                           validator (pid 4/5) -->  django (pid 2) -->
                        --------------> django (pid 2) ---------------->

This would be a major change, but it would free the request-handling processes from validation work and allow the website to operate independently of the validation routine.
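To make the proposed hand-off concrete, here is a minimal sketch using only the standard library. It is not Celery and not the app's actual validator: `validate_csv`, `enqueue_validation`, `job_status`, the `JOBS` registry and the column names are all hypothetical, and a thread pool stands in for the "distributor". The request handler submits the file, gets a job id back immediately, and checks on the result later instead of blocking.

```python
import csv
import io
import uuid
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory job registry; a real deployment would keep job
# state in a broker or result backend (e.g. Redis) rather than a dict.
JOBS = {}


def validate_csv(text, required_columns):
    """Hypothetical stand-in for the validator: return a list of error strings."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    errors = [f"missing column: {c}" for c in required_columns if c not in header]
    if errors:
        return errors
    for row in reader:
        for col in required_columns:
            # Short rows yield None for missing cells, so normalise to "".
            if not (row[col] or "").strip():
                errors.append(f"line {reader.line_num}: empty {col}")
    return errors


def enqueue_validation(executor, text, required_columns):
    """Hand the file to the pool and return a job id without waiting."""
    job_id = uuid.uuid4().hex
    JOBS[job_id] = executor.submit(validate_csv, text, required_columns)
    return job_id


def job_status(job_id):
    """What a status view could return while the user waits."""
    future = JOBS[job_id]
    if not future.done():
        return {"state": "pending"}
    return {"state": "done", "errors": future.result()}


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=2) as pool:
        job = enqueue_validation(pool, "name,lat\nA,1\nB,\n", ["name", "lat"])
        # The web process is free to serve other requests at this point.
    # Leaving the with-block waits for outstanding jobs to finish.
    print(job_status(job))
```

A thread pool keeps the sketch short, but it still shares one process with the web code. The diagram above implies separate worker processes, which is what a Celery worker would provide; the submit-then-poll shape stays the same.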

asonnenschein commented 10 years ago

The work being done here can be used to fix this: implementation of a Celery daemon to distribute Python processes (async Python, who knew?).