pythononwheels / pow_devel

development repo for the PythonOnWheels framework
www.pythononwheels.org
MIT License

PythonOnWheels API Performance #46

Closed jeffny2015 closed 3 years ago

jeffny2015 commented 4 years ago

Hi

Do you know a way to improve the server performance (concurrency) when handling requests to the RESTful API?

jeffny2015 commented 4 years ago

I noticed this application is not multithreaded. While searching for a way to address this, I found the PyZMQ library and saw that Tornado has some methods for it, but I don't know how to implement this in the server config or the base handler.

pythononwheels commented 4 years ago

One of the advantages of Tornado is that it is async from the ground up. So you can use any of the examples and approaches for making your handler methods async as described here.

But you have to make sure that all the libraries and modules you use in the method are async as well. So when you use ZMQ or a DB like MongoDB, you need to use a driver that is async, too; for MongoDB that would be motor, for example. I am not that into ZMQ, so I don't have experience there, but I'm pretty sure there are async drivers for it as well. Maybe even PyZMQ itself is async? The PyZMQ docs say:

AsyncIO support for zmq

Requires asyncio and Python 3.

New in version 15.0.

As of 15.0, pyzmq supports asyncio via zmq.asyncio. When imported from this module, blocking methods such as zmq.asyncio.Socket.recv_multipart(), zmq.asyncio.Socket.poll(), and zmq.asyncio.Poller.poll() return Futures.

So in short: any documentation about making Tornado handlers async can be applied 1:1 in PoW as well.
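
For illustration, here is a minimal sketch of what an async handler method could look like, assuming a MongoDB backend accessed through motor; the route, collection, and field names are illustrative, not part of PoW itself:

import motor.motor_tornado
import tornado.ioloop
import tornado.web

client = motor.motor_tornado.MotorClient("mongodb://localhost:27017")
db = client["mydb"]

class ListUsersHandler(tornado.web.RequestHandler):
    async def get(self, limit):
        # await yields the event loop, so other clients can be served
        # while the query is in flight; _id is excluded to keep the
        # documents plain JSON-serializable data
        cursor = db.users.find({}, {"_id": 0}).limit(int(limit))
        users = await cursor.to_list(length=int(limit))
        self.write({"users": users})

app = tornado.web.Application([(r"/listusers/(\d+)", ListUsersHandler)])
app.listen(8888)
tornado.ioloop.IOLoop.current().start()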

jeffny2015 commented 3 years ago

Thanks. I'm a little confused; let me explain myself better.

What I am looking for is how I can make the application support multiple clients in parallel.

jeffny2015 commented 3 years ago

Something like this:

https://www.tornadoweb.org/en/stable/httpserver.html

import tornado.netutil
import tornado.process
from tornado.httpserver import HTTPServer
from tornado.ioloop import IOLoop

sockets = tornado.netutil.bind_sockets(8888)
tornado.process.fork_processes(0)  # 0 forks one child process per CPU core
server = HTTPServer(app)           # app is your tornado.web.Application
server.add_sockets(sockets)
IOLoop.current().start()

I have worked with sockets, and I know you usually set the number of clients that can connect to the server at the same time. The issue I'm having now is:

I have the PythonOnWheels framework running a RESTful API. To test the performance of the server, I simulate in JMeter more than 50 clients requesting from the server at the same time.

The client request lists the top 10000 users from the DB in a JSON response:

http://pythononwheels_server/path/endpoint/listusers/10000

The server does not crash, but it executes the requests in sequence, and with 50+ JMeter clients the requests fail without any answer.

pythononwheels commented 3 years ago

I have worked with sockets, and I know you usually set the number of clients that can connect to the server at the same time.

No, that's not an issue here. That limit is usually set by the OS, so no worry for you in this case. You are using Python and Tornado, which sit many abstraction levels above raw sockets.

The server does not crash, but it executes the requests in sequence

Yes, correct. This is exactly how synchronous applications handle multiple requests to the same API endpoint: handle the 1st, respond to the 1st, handle the 2nd ...
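
To picture this with a self-contained example: in the blocking sketch below (the sleep stands in for a slow DB query), concurrent clients are answered strictly one after another, because the blocking call occupies Tornado's single event loop:

import time

import tornado.ioloop
import tornado.web

class BlockingHandler(tornado.web.RequestHandler):
    def get(self):
        time.sleep(1)       # stands in for a slow, blocking DB query
        self.write("done")  # the 2nd client waits until this is sent

app = tornado.web.Application([(r"/slow", BlockingHandler)])
app.listen(8888)
tornado.ioloop.IOLoop.current().start()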

You can optimize speed in many ways. It makes sense to think about the actual scenario you will really be facing in production, to make the right decisions.

Questions would be: how many concurrent users do you expect, how often does the underlying data change, and where are the slow parts?

In most cases, using one of the common optimisation techniques below will be more than enough to make your app really fast.

1st) Caching helps a lot, especially for typical requests that many users make where the results won't change often. Blogs and web stores are typical examples: there is no new blog entry every second, nor will the products or prices change super rapidly. And if they do, you update the cache once and then serve from the cache again. Your test is a typical example: 50 clients accessing the same API call. If you can serve the result completely from a cache (say Redis, for example), you reduce the response time to simply the time it takes to return the already rendered page from the cache.
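
As a minimal sketch of this, assuming a local Redis instance; the key scheme and the fetch_users_json helper are hypothetical stand-ins for your query and rendering code:

import redis

cache = redis.Redis(host="localhost", port=6379)

def list_users_cached(limit):
    key = f"listusers:{limit}"
    cached = cache.get(key)
    if cached is not None:
        return cached                   # served straight from RAM
    body = fetch_users_json(limit)      # hypothetical slow query + render
    cache.set(key, body, ex=60)         # expire after 60s, or update the
                                        # key explicitly when data changes
    return body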

2nd) Identify the bottlenecks. If you have some long-running DB queries, optimize them: use indexes, split them into parts and load only the first few results, then update the page while already showing results.
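
A sketch of the index-plus-limit idea using SQLAlchemy; the model and column names are illustrative, not taken from your app:

from sqlalchemy import Column, DateTime, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    created_at = Column(DateTime, index=True)  # index the column you sort by

engine = create_engine("sqlite:///app.db")
Base.metadata.create_all(engine)

with Session(engine) as session:
    # load only the first page instead of all 10000 rows at once
    first_page = (
        session.query(User)
        .order_by(User.created_at.desc())
        .limit(100)
        .all()
    )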

3rd) Use multiple instances of your app. Python is single-threaded by default. You can make a handler async, meaning it can already start serving other requests during blocking I/O, but it never becomes really multithreaded. However, you can easily start multiple instances of your app and use, for example, NGINX to load balance between them. See the config below, which almost works out of the box on Linux systems (simply apt install nginx and copy the config). You start your PoW app, say, 4 times on different local ports, register them with NGINX, and NGINX will redirect every call to another instance of your app, gaining a factor of 4.


http {
    # Enumerate all the Tornado servers here
    upstream frontends {
        server 127.0.0.1:8000;
        server 127.0.0.1:8001;
        server 127.0.0.1:8002;
        server 127.0.0.1:8003;
    }
    # pass every request on to one of the app instances
    server {
        listen 80;
        location / {
            proxy_pass http://frontends;
        }
    }
}
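
How you start the 4 instances is up to you. A hypothetical Python launcher could look like this; the --port flag is an assumption, so adapt it to how your app actually reads its port:

import subprocess

# start four app instances on the ports NGINX balances across
processes = [
    subprocess.Popen(["python", "server.py", f"--port={port}"])  # hypothetical flag
    for port in range(8000, 8004)
]
for p in processes:
    p.wait()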

So, summing it up, I would recommend making some of the optimisations 1 to 3 first. Identifying slow parts (queries, I/O, other API calls) and optimizing them is key. Caching the results and serving the most frequently hit responses from the cache makes your app blazingly fast, since you reduce the response time to a RAM access plus sending a pre-rendered page over the network; it can hardly get faster than that. Adding more processes is no problem in production, but it is limited to roughly num_cpu_cores minus some for the system, so on a 10-core machine, maybe run 4-6 app instances.

Adding async is possible in every way, since Tornado is built from the ground up to support it. But you need to put some effort into it as a developer: every submodule and library you use in an async call must be async as well. Most modern DBs and other systems support async, but you'll have to do your own housekeeping, like keeping track of the callers you paused so you can send their results later on, when their queries finish after others interrupted them. This would be (and clearly is) an option for bottlenecks where none of the above solutions work, but mostly this should not be the case.

jeffny2015 commented 3 years ago

When you say:

You start your PoW app, say, 4 times on different local ports, register them with NGINX, and NGINX will redirect every call to another instance of your app. (Simply apt install and copy the config script.) So gaining a factor of 4.

In what part of the code can I tell PoW to run 4 times?

pythononwheels commented 3 years ago

In what part of the code can I tell PoW to run 4 times?

It is simply in the NGINX config. It is independent of PoW.

But it is also truly worth it (almost a must) to implement caching in production apps, since usually the results of queries don't change from second to second, or even if they do, the changes do not really matter to the user.

Think of the first response page for a query term on Amazon: you don't even see what is on page 2 or 3. So you can easily cache the top 50 results per search term and update the cache only when an article changes, for example by adding an update_cache call to the article_update method.
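
A sketch of that update-on-write idea, again assuming a Redis client; Article, search_terms, and render_top_results are hypothetical names for your own model and rendering code:

import redis

cache = redis.Redis(host="localhost", port=6379)

def article_update(article):
    article.save()                        # hypothetical persistence call
    # refresh only the cached result pages this article appears on
    for term in article.search_terms():   # hypothetical helper
        cache.set(
            f"search:{term}:top50",
            render_top_results(term, limit=50),  # hypothetical renderer
        )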

And also optimize slow (long-running) queries. Most of the time you can optimize the way the data is stored, or the query itself, to gain factors of speed.

Adding more processes helps to serve more users; it doesn't make the slow parts of the app quicker.

pythononwheels commented 3 years ago

Performance can be addressed app-specifically via async handler methods and/or load balancing, using NGINX for example. This is more app-specific than PoW-specific.