Concurrent fetches are slower than sequential

ibnesayeed commented 6 years ago

In IPWB (an web archiving replay system) we store archived HTTP headers and payloads in two separate objects for better deduplication of payloads. At the time of replay, we fetch the two objects, combine them, make necessary modifications, and return the response to the client. In our initial implementation we used to fetch the two objects sequentially, but there is an opportunity to perform these fetches concurrently and minimize the response time.

Recently, we experimented with Python threads (https://github.com/oduwsdl/ipwb/pull/425), but the response time became worse than before. A simple ApacheBench test took over 17 seconds with threaded fetches while the same took less than 8 seconds with sequential code.

To ensure that we can reproduce the issue in an isolated environment, I have created a separate repository, IPFS API Concurrency Test, to test this behavior. While the #59 suggests that the API is not async-friendly, I am not sure how related it is to this issue. I would assume, with concurrent threads it should not take any longer to finish if not quicker. The worst case time should still be bound by the sequential access time. However, my profiling results are showing the opposite. The more concurrent requests are made, the slower it becomes.

======= SUMMARY =======
Data Items:      1
Fetch Attempts:  10
Mean Fetch Time (Sequential):  0.0025439024 seconds/item
Mean Fetch Time (Threaded):    0.0021895885 seconds/item

======= SUMMARY =======
Data Items:      2
Fetch Attempts:  10
Mean Fetch Time (Sequential):  0.0037424684 seconds/item
Mean Fetch Time (Threaded):    0.0042521000 seconds/item

======= SUMMARY =======
Data Items:      3
Fetch Attempts:  10
Mean Fetch Time (Sequential):  0.0027309577 seconds/item
Mean Fetch Time (Threaded):    0.0067453702 seconds/item

======= SUMMARY =======
Data Items:      4
Fetch Attempts:  10
Mean Fetch Time (Sequential):  0.0036935270 seconds/item
Mean Fetch Time (Threaded):    0.0087552428 seconds/item

======= SUMMARY =======
Data Items:      5
Fetch Attempts:  10
Mean Fetch Time (Sequential):  0.0041139285 seconds/item
Mean Fetch Time (Threaded):    0.0102155050 seconds/item

ibnesayeed commented 6 years ago

There was an issue in my benchmarking script that was calculating the elapsed time wrongly. That's why threaded fetch times are reported longer than they should have been. However, results are still not how I would like them to be. Following are some timing summaries with single iteration for different number of concurrent item fetches.

======= SUMMARY =======
Data Items:      1
Fetch Attempts:  1
Mean Fetch Time (Sequential):  0.0068359375 seconds/item
Mean Fetch Time (Threaded):    0.0084149837 seconds/item

======= SUMMARY =======
Data Items:      2
Fetch Attempts:  1
Mean Fetch Time (Sequential):  0.0041242838 seconds/item
Mean Fetch Time (Threaded):    0.0043387413 seconds/item

======= SUMMARY =======
Data Items:      3
Fetch Attempts:  1
Mean Fetch Time (Sequential):  0.0077099800 seconds/item
Mean Fetch Time (Threaded):    0.0101859570 seconds/item

======= SUMMARY =======
Data Items:      4
Fetch Attempts:  1
Mean Fetch Time (Sequential):  0.0027137399 seconds/item
Mean Fetch Time (Threaded):    0.0028825402 seconds/item

======= SUMMARY =======
Data Items:      5
Fetch Attempts:  1
Mean Fetch Time (Sequential):  0.0063509941 seconds/item
Mean Fetch Time (Threaded):    0.0098129749 seconds/item

ntninja commented 6 years ago

Hmm… Looking at your benchmarking script it immediately strikes me that you create/destroy a thread for each request. That definitely decreses performance, try using concurrent.futures.ThreadPoolExecutor for making these calls instead (keep the thread pool alive during the entire lifetime of the application).

If you're ready for real async please try my async branch; I'd really like to have some feedback on this.

Minimal example code would be something like this (Python 3.5+, untested):

import asyncio
import sys

import ipfsapi

async def do_test(client):
    hashes = await asyncio.gather(
        client.add_bytes(b"bla1"),
        client.add_bytes(b"bla2"),
        client.add_bytes(b"bla3")
    )
    contents = await asyncio.gather(
        client.cat(hashes[0].result()),
        client.cat(hashes[1].result()),
        client.cat(hashes[2].result())
    )

async def amain(argv=sys.argv[1:], loop=asyncio.get_event_loop()):
    async with ipfsapi.connect_async(session=True, loop=loop) as client:
        do_test(client)
loop = asyncio.get_event_loop()
loop.run_until_complete(amain(loop=loop))

ibnesayeed commented 6 years ago

Hmm… Looking at your benchmarking script it immediately strikes me that you create/destroy a thread for each request. That definitely decreses performance, try using concurrent.futures.ThreadPoolExecutor for making these calls instead (keep the thread pool alive during the entire lifetime of the application).

While in our actual application we can certainly use persistent thread pools, in the benchmarking script however we are reporting timing in two different ways. If you look at the sample output in the benchmark repo's README file, attempts on individual items are reporting elapsed time only for the cat() call, while the summary of each attempt as well as the overall summary reports the overall execution time which includes any thread overhead. It is surprising to see that cat calls executed in individual threads are taking roughly as much time (sometimes even more) as all the sequential calls combined.

If you're ready for real async please try my async branch; I'd really like to have some feedback on this.

This could be a really good approach, however it will enforce us to use Py3 in IPWB (currently it is only Py2 supported). I am personally in favor of just supporting Py3 and not Py2 (https://github.com/oduwsdl/ipwb/issues/51#issuecomment-350320934), but @machawk1 insists on continuing to support Py2.

machawk1 commented 6 years ago

Aside: if we can achieve the anticipated speedup using async/threading, it might be worth supporting only Python 3 in ipwb.

ipfs-shipyard / py-ipfs-http-client

Concurrent fetches are slower than sequential #131