lithops-cloud / lithops

A multi-cloud framework for big data analytics and embarrassingly parallel jobs that provides a universal API for building parallel applications in the cloud ☁️🚀
http://lithops.cloud
Apache License 2.0

ConnectTimeoutError when retrieving results from ibm-cos #271

Closed by LachlanStuart 4 years ago

LachlanStuart commented 4 years ago

When running our code from my home internet connection I get intermittent ConnectTimeoutErrors when calling pw.get_results(). Usually this happens when there are large numbers of actions or the action results are large.

Here is a stack trace of the error: https://gist.github.com/LachlanStuart/86dcde8502f3fab713e1cddbb0b26a36

This job invoked 285 parallel activations and returned a total of 1.2GB of data. Sometimes the job succeeds; other times it fails.

I believe the issue is caused by this code: https://github.com/pywren/pywren-ibm-cloud/blob/master/pywren_ibm_cloud/storage/backends/ibm_cos/ibm_cos.py#L68-L86 which sets timeouts for the storage backend. These aggressive timeouts make sense for code that runs in Actions, but they don't make sense for the host computer, which may have a lower-bandwidth or higher-latency network connection.
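To illustrate the idea, here is a minimal sketch of how the storage backend could pick different timeouts depending on where it runs. It is not the library's actual code: the names `TimeoutProfile`, `select_timeout_profile` and `to_client_config_kwargs` are hypothetical, the numeric values are placeholders, and detecting an Action through the `__OW_ACTIVATION_ID` environment variable is an assumption about the OpenWhisk runtime.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeoutProfile:
    connect_timeout: float  # seconds to wait while opening the connection
    read_timeout: float     # seconds to wait for response data
    max_attempts: int       # how many times a failed request is retried


# Placeholder values for illustration only; not taken from the library.
ACTION_PROFILE = TimeoutProfile(connect_timeout=1, read_timeout=5, max_attempts=5)
HOST_PROFILE = TimeoutProfile(connect_timeout=10, read_timeout=60, max_attempts=10)


def select_timeout_profile(environ=None):
    """Return aggressive timeouts inside an Action, relaxed ones on the host.

    Assumption: the Action runtime exposes __OW_ACTIVATION_ID in the
    environment, and the host computer does not.
    """
    env = os.environ if environ is None else environ
    if "__OW_ACTIVATION_ID" in env:
        return ACTION_PROFILE
    return HOST_PROFILE


def to_client_config_kwargs(profile):
    """Build keyword arguments in the shape a botocore-style Config accepts
    (connect_timeout, read_timeout, retries={'max_attempts': n})."""
    return {
        "connect_timeout": profile.connect_timeout,
        "read_timeout": profile.read_timeout,
        "retries": {"max_attempts": profile.max_attempts},
    }


if __name__ == "__main__":
    # On a laptop or workstation this selects the relaxed host profile.
    print(to_client_config_kwargs(select_timeout_profile()))
```

The resulting dictionary could then be unpacked into the client configuration object that the backend already builds, so only the values change between the host and the Actions.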

JosepSampe commented 4 years ago

I increased the connection timeout to 3 seconds in #270. Let's see if this is enough.