tillmo / two_tiered_quiz

A multiple choice quiz with justifications for the answers, written in Django
GNU Affero General Public License v3.0

configure deployment of server for concurrent requests #56

Closed tillmo closed 4 years ago

tillmo commented 4 years ago

for up to 300 users

jelmd commented 4 years ago

Not sure about the server, but whoever chose Django and React should work this out. I have no clue about this stuff, or about any related monitoring tools ...

tillmo commented 4 years ago

@MGlauer could you please make a suggestion here?

MGlauer commented 4 years ago

I do not know much about React either, but the Django backend could be deployed using the usual WSGI configuration - just not on port 80, I guess.

tillmo commented 4 years ago

For React, this is not so much a problem, because the whole frontend is downloaded to the browser once.

It seems that WSGI is a standard that is supported by many different servers. So it would be nice to have some sample deployment plan that works easily and does not leave us to make all those choices. So what is "the usual WSGI-configuration"?

MGlauer commented 4 years ago

Django comes with built-in support for WSGI, and the corresponding script is already there. This can be used to deploy the service with Apache and mod_wsgi.
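
For reference, the WSGI entry point that Django generates typically looks like the sketch below (the settings module name is a placeholder, not necessarily this project's actual module name):

```python
# <project>/wsgi.py as generated by Django's startproject template
import os

from django.core.wsgi import get_wsgi_application

# "myproject.settings" is a placeholder for the real settings module
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'myproject.settings')

# the WSGI callable that Apache/mod_wsgi (or any WSGI server) will invoke
application = get_wsgi_application()
```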

tillmo commented 4 years ago

Thanks! Do you have a working mod_wsgi configuration, especially concerning the number of processes and threads? See also here. I guess the Python GIL is not relevant here, because we do not use global variables? Then we could use many threads?

jelmd commented 4 years ago

Apache with a prepared default configuration is already in place (the standard template - part of the zone configuration). See /data/httpd/conf/wsgi.conf. However, whether this is a good config depends on the application. Getting an appropriate one is usually an incremental process of adjusting it based on demand and observed results.

Using Apache httpd at least gives us a way to check its status (like http://iks.cs.ovgu.de/server-status - via proxy or from the internal network). My gut says that running the app as a mod_wsgi daemon process is better, because then it cannot pull down or render httpd unusable, even if the app or interpreter has memory leaks, deadlocks or other problems. This may imply some communication overhead and thus "slowness", but the casual user will probably not notice it, given the other network noise between client, client network provider, server network provider and server ...
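
For illustration, a minimal daemon-mode snippet could look like this (all names, paths and numbers are placeholders that would need tuning for the actual zone setup):

```apache
# Hypothetical mod_wsgi daemon-mode config; adjust processes/threads after observing load
WSGIDaemonProcess quiz processes=2 threads=5 python-home=/path/to/venv python-path=/path/to/project
WSGIProcessGroup quiz
WSGIScriptAlias / /path/to/project/myproject/wsgi.py

<Directory /path/to/project/myproject>
    <Files wsgi.py>
        Require all granted
    </Files>
</Directory>
```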

Wrt. the GIL: multi-threading is what I prefer as a rule of thumb: it usually saves a lot of RAM and is pretty efficient (almost no communication overhead, rare locking). However, if I understood the GIL correctly, the issue is not really whether global variables are used (a general multi-threading concern one needs to take care of anyway), but that it works like "stop the world": while one thread executes, the interpreter stops all other threads. So IMHO it is still more or less a monolithic application, not really able to run things in parallel. But perhaps it is like in the early days: in Python, multi-threading actually means forking, i.e. spawning several fully fledged processes, with the parent distributing requests to them ...

Anyway, the app should be prepared for several instances running in parallel, so that requests from a certain user XY may end up being processed by different app instances (this usually matters for sessions, i.e. state-related requests) ...
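
(For Django, this amounts to keeping sessions in a backend shared by all instances; a settings.py sketch, assuming the built-in session engines:)

```python
# settings.py sketch: any instance can serve any request because session
# state lives in a shared store, not in process memory.
SESSION_ENGINE = 'django.contrib.sessions.backends.db'  # Django's default: sessions stored in the database

# Alternatively, a shared cache backend could hold the sessions:
# SESSION_ENGINE = 'django.contrib.sessions.backends.cache'
```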

tillmo commented 4 years ago

Generally, the GIL causes threads to wait only in certain situations, see here. Here is a rule of thumb: "Use at most 5 threads per process, unless excessively I/O bound." Here you can find that Python multi-threading makes sense with a database connection (which is our main use case). They also use 5 threads. However, it seems that MariaDB has faster multi-threading than the MySQL community edition.
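
To illustrate (just a toy sketch, not project code): threads that block on I/O release the GIL, so I/O-bound work such as waiting on the database does overlap:

```python
import threading
import time

def io_bound_task(n):
    # stand-in for a blocking DB query or network call
    time.sleep(1)
    print(f"task {n} done")

start = time.time()
threads = [threading.Thread(target=io_bound_task, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# With 5 threads this finishes in roughly 1 second instead of 5,
# because each thread drops the GIL while it waits.
print(f"elapsed: {time.time() - start:.1f}s")
```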

jelmd commented 4 years ago

Switching to an unsupported DB without any reason, or just because someone spreads unfounded claims, makes no sense. We have been running MySQL for several years, very stably and without any problems.

A DB, no matter how fast it is, cannot cure a flawed application that, e.g., does not use pooled connections/DBs or is otherwise impaired by the limited capabilities of the executing interpreter ...
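
(In Django's case there is no built-in connection pool as such, but connections can at least be reused across requests via CONN_MAX_AGE; a settings.py sketch with placeholder values:)

```python
# settings.py sketch: persistent DB connections instead of one connection per request.
# All names and the password are placeholders.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'quizdb',
        'USER': 'quiz',
        'PASSWORD': '...',
        'HOST': 'localhost',
        'CONN_MAX_AGE': 60,  # keep each connection open for up to 60 seconds
    }
}
```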

tillmo commented 4 years ago

MariaDB is well supported. It is a fork of MySQL created by the original author of MySQL, who was unhappy about Oracle's handling of MySQL. MariaDB is configured in the same way as MySQL and is packaged by Ubuntu. Moreover, the performance improvements in MariaDB are independent of Python and the GIL:

> Segmented Key Cache: MariaDB introduces another performance improvement in the form of the Segmented Key Cache. In a typical key cache, various threads compete to take a lock on a cached entry; these locks are called mutexes. When multiple threads compete for a mutex, only one of them gets it, while the others have to wait for the lock to be freed before performing their operation. This leads to execution delays in these threads and slows down database performance. With a Segmented Key Cache, a thread need not lock the entire page; it can lock only the particular segment to which the page belongs. This allows multiple threads to work in parallel, increasing parallelism and thus database performance. (https://hackr.io/blog/mariadb-vs-mysql)

For a similar performance improvement in MySQL, you have to buy a commercial version.
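
For reference, a minimal my.cnf sketch, assuming the segmented key cache is controlled by MariaDB's key_cache_segments variable (the value is a placeholder; this setting applies to the MyISAM key cache):

```ini
# my.cnf sketch: key_cache_segments > 0 enables MariaDB's segmented key cache
[mysqld]
key_buffer_size     = 128M
key_cache_segments  = 16
```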