mtzgroup / chemcloud-client

Python client for TeraChem Cloud
MIT License

[FEATURE] Handle arbitrarily long lists of inputs. #44

Open coltonbh opened 1 year ago

coltonbh commented 1 year ago

Multi-thread (or use asyncio) the submission and retrieval of results.

Use something like this:

import concurrent.futures

def compute(inp):
    ...

with concurrent.futures.ThreadPoolExecutor() as executor:
    results = list(executor.map(compute, inputs))

Is there really any reason to even use Celery groups if I implement this? Using them can reduce the number of HTTP calls quite dramatically, but if I multi-thread the submission and collection of results, perhaps the performance gains aren't that big of a deal, and then I have a single level of abstraction to worry about (multi-threading submission and collection of results) instead of nesting groups within that multi-threading.

Pros: I'd never get timeout issues when collecting results because the group ended up being so large.

Cons: Collection of results will be slower, as it IS faster to collect 100 results (the current default) in a single request.

EDIT: Chatted with Ethan. He has special code to handle group size issues when a group has too much data to download before timing out. This indicates to me that the batch size thing should be dispensed with entirely. Better to get your data with a few seconds of delay on a large batch submission than to have your results held hostage on the server because the group is too large to be downloaded and you have to resubmit with a smaller batch size in order to get your results back.
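A minimal sketch of what dropping server-side groups in favor of client-side threading could look like. Note that `submit_one` and `fetch_result` are stand-ins for single-task HTTP calls, not real chemcloud-client APIs:

```python
import concurrent.futures

def submit_one(inp):
    """Stand-in for submitting one program input over HTTP; returns a task id."""
    return f"task-{inp}"

def fetch_result(task_id):
    """Stand-in for downloading one task's result."""
    return task_id.upper()

inputs = list(range(8))

with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
    # Submit every input concurrently; no server-side group/batch needed.
    task_ids = list(executor.map(submit_one, inputs))
    # Collect each result individually, so there is no single giant
    # download that can time out because the group grew too large.
    results = list(executor.map(fetch_result, task_ids))
```

Each task's result travels in its own small request, which trades some throughput for never having results held hostage by an oversized group.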

For FutureResultGroup we could use concurrent.futures.as_completed(...) to collect results as they complete:

for result in future_result.as_completed():
    # do something with result
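Under the hood this could lean directly on `concurrent.futures.as_completed`, which yields futures in the order they finish rather than the order they were submitted. A sketch with a stub `compute` (the stub and its timings are assumptions, not chemcloud code):

```python
import concurrent.futures
import time

def compute(task_id):
    """Stub for a computation; sleeps a varying amount to simulate work."""
    time.sleep(0.01 * (task_id % 3))
    return task_id * task_id

with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = [executor.submit(compute, i) for i in range(5)]
    # as_completed yields each future as soon as it finishes,
    # regardless of submission order.
    completed = [f.result() for f in concurrent.futures.as_completed(futures)]
```

The completion order is nondeterministic, which is exactly the behavior a `FutureResultGroup.as_completed()` wrapper would expose.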

Could also have in order as-completed:

for result in future_result.collect():  # or some better method name
    # do something with result
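For the in-order variant, `ThreadPoolExecutor.map` already gives the right semantics: it always yields results in submission order, blocking per-item as needed, even when a later item finishes first. A small sketch (the `compute` stub is an assumption):

```python
import concurrent.futures
import time

def compute(task_id):
    # Make the first task finish last to show ordering is preserved.
    time.sleep(0.05 if task_id == 0 else 0.0)
    return task_id * 10

with concurrent.futures.ThreadPoolExecutor() as executor:
    # map() yields in submission order, so results stay aligned with inputs.
    ordered = list(executor.map(compute, range(4)))
```

So a `collect()`-style method could simply be a thin wrapper over `map` semantics.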

Could also "stream" results with:

for result in future_result.as_completed():
    # Process results in real time