Closed: foster999 closed this issue 11 months ago
Hi @foster999. I found that the lithops joblib backend was no longer compatible with the latest versions of joblib. I submitted a newer version of the backend. Can you test it with the master branch?
Thanks @JosepSampe, the example above and more complex examples are working well with the master version.
Logging now seems to be much quieter and doesn't report on the separate processes. With the code above I get a single line:
2023-10-18 14:45:26,200 [INFO] config.py:141 -- Lithops v3.0.2.dev0 - Python3.11
Edit: I can see the other two lines in the output above have been moved to DEBUG. But no information is shown about the separate processes when the logger is set to DEBUG either, whereas previously lines were logged for each process to show their state changes.
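For anyone following along, the behaviour matches standard Python level filtering (which Lithops' logger appears to use); a minimal stdlib sketch, not Lithops-specific, showing why lines moved from INFO to DEBUG disappear unless the level is lowered:

```python
import io
import logging

# Records below the configured level are filtered out, so a line moved
# from INFO to DEBUG vanishes until the logger level is set to DEBUG.
stream = io.StringIO()
logger = logging.getLogger("level_filtering_demo")
logger.addHandler(logging.StreamHandler(stream))
logger.setLevel(logging.INFO)

logger.info("Lithops v3.0.2.dev0 - Python3.11")             # emitted at INFO
logger.debug("Loading configuration from .lithops_config")  # suppressed at INFO

logger.setLevel(logging.DEBUG)
logger.debug("Loading compute backend module: localhost")   # emitted at DEBUG

print(stream.getvalue())
```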
Great! Thanks for testing it with more complex scenarios.
Can you provide an example of what you expect? Please note that the actual function execution takes place in the cloud, so locally you can only see the Lithops logs. If you wish to view function logs, you should run 'lithops logs poll' in another terminal.
I'm currently executing and storing locally, rather than in the cloud. I think I was just surprised that fewer logs were produced. That said, it doesn't look like any of the missing info actually relates to the parallel processes. Before the change, the following logs were produced:
2023-10-18 16:30:06,009 [INFO] config.py:141 -- Lithops v3.0.1 - Python3.11
2023-10-18 16:30:06,009 [DEBUG] config.py:103 -- Loading configuration from .lithops_config
2023-10-18 16:30:06,018 [DEBUG] config.py:176 -- Loading compute backend module: localhost
2023-10-18 16:30:06,018 [DEBUG] config.py:235 -- Loading Storage backend module: localhost
2023-10-18 16:30:06,018 [DEBUG] localhost.py:36 -- Creating Localhost storage client
2023-10-18 16:30:06,018 [INFO] localhost.py:39 -- Localhost storage client created
2023-10-18 16:30:06,018 [DEBUG] localhost.py:49 -- Creating Localhost compute client
2023-10-18 16:30:06,018 [INFO] localhost.py:58 -- Localhost compute client created
2023-10-18 16:30:06,018 [DEBUG] invokers.py:93 -- ExecutorID f7157d-0 - Invoker initialized. Max workers: 1
2023-10-18 16:30:06,018 [DEBUG] executors.py:145 -- Function executor for localhost created with ID: f7157d-0
Now, on the master branch version:
2023-10-18 16:31:44,256 [INFO] config.py:141 -- Lithops v3.0.2.dev0 - Python3.11
2023-10-18 16:31:44,256 [DEBUG] config.py:103 -- Loading configuration from .lithops_config
2023-10-18 16:31:44,264 [DEBUG] config.py:176 -- Loading compute backend module: localhost
2023-10-18 16:31:44,264 [DEBUG] config.py:235 -- Loading Storage backend module: localhost
I don't think any further change is required, so I'm happy for you to close this. Thanks for the tip on lithops logs poll.
Found that cpu_count was not properly calculated for the localhost backend. I fixed it in #1176. Can you test again with the master branch?
The logs haven't changed from the master version above, so I can't see any details about the workers:
2023-10-19 09:07:26,850 [INFO] config.py:141 -- Lithops v3.0.2.dev0 - Python3.11
2023-10-19 09:07:26,850 [DEBUG] config.py:103 -- Loading configuration from .lithops_config
2023-10-19 09:07:26,861 [DEBUG] config.py:176 -- Loading compute backend module: localhost
2023-10-19 09:07:26,861 [DEBUG] config.py:235 -- Loading Storage backend module: localhost
Can you try running the command lithops clean -b localhost, and then running the sklearn script again?
No change in the sklearn script log output after cleaning, I'm afraid. The clean shows:
2023-10-19 10:22:02,171 [INFO] cli.py:85 -- Cleaning all Lithops information
2023-10-19 10:22:02,171 [INFO] config.py:141 -- Lithops v3.0.2.dev0 - Python3.11
2023-10-19 10:22:02,184 [INFO] localhost.py:39 -- Localhost storage client created
2023-10-19 10:22:02,184 [INFO] localhost.py:58 -- Localhost compute client created
2023-10-19 10:22:02,184 [INFO] utils.py:83 -- Deleting objects from bucket 'storage' and prefix 'lithops.runtimes/localhost'
2023-10-19 10:22:03,189 [INFO] utils.py:93 -- Finished deleting objects, total found: 2
2023-10-19 10:22:03,189 [INFO] utils.py:83 -- Deleting objects from bucket 'storage' and prefix 'lithops.jobs'
2023-10-19 10:22:04,198 [INFO] utils.py:93 -- Finished deleting objects, total found: 4
2023-10-19 10:22:04,701 [INFO] cli.py:119 -- All Lithops data cleaned
The clean command only deletes unnecessary lithops temporary data. Sometimes it can be good to run it if you see inconsistencies.
Can you place this script in the same directory where you have the sklearn script, run it, and copy/paste all the output here?
```python
from lithops.multiprocessing import Pool, cpu_count
from lithops.utils import setup_lithops_logger

setup_lithops_logger("DEBUG")

def hello(name):
    return 'Hello {}!'.format(name)

if __name__ == '__main__':
    print(cpu_count())
    with Pool() as pool:
        res = pool.apply(hello, ('World',))
        print(res)
```
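For reference, lithops.multiprocessing mirrors the standard library's multiprocessing API, so the equivalent stdlib version of this script (running purely with local processes, no Lithops involved) looks like:

```python
from multiprocessing import Pool, cpu_count

def hello(name):
    return 'Hello {}!'.format(name)

if __name__ == '__main__':
    print(cpu_count())                     # number of local CPUs
    with Pool() as pool:
        res = pool.apply(hello, ('World',))
        print(res)                         # Hello World!
```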
I get:
2023-10-19 11:16:43,672 [DEBUG] serialize.py:101 -- Modules to transmit: None
2023-10-19 11:16:43,672 [DEBUG] job.py:268 -- ExecutorID 5585cb-0 | JobID A000 - Uploading function and modules to the storage backend
2023-10-19 11:16:43,672 [DEBUG] job.py:294 -- ExecutorID 5585cb-0 | JobID A000 - Uploading data to the storage backend
2023-10-19 11:16:43,672 [INFO] invokers.py:171 -- ExecutorID 5585cb-0 | JobID A000 - Starting function invocation: cloud_process_wrapper() - Total: 1 activations
2023-10-19 11:16:43,672 [DEBUG] invokers.py:176 -- ExecutorID 5585cb-0 | JobID A000 - Worker processes: 10 - Chunksize: 10
2023-10-19 11:16:43,672 [DEBUG] localhost.py:136 -- ExecutorID 5585cb-0 | JobID A000 - Putting job into localhost queue
2023-10-19 11:16:43,672 [DEBUG] localhost.py:72 -- Staring localhost job manager
2023-10-19 11:16:43,673 [DEBUG] invokers.py:252 -- ExecutorID 5585cb-0 | JobID A000 - Job invoked (0.000s) - Activation ID: 5585cb-0-A000
2023-10-19 11:16:43,673 [DEBUG] localhost.py:386 -- ExecutorID 5585cb-0 | JobID A000 - Running 1 activations in the localhost worker
2023-10-19 11:16:43,673 [INFO] invokers.py:207 -- ExecutorID 5585cb-0 | JobID A000 - View execution logs at /private/var/folders/t4/hdpq5bd51tjfws2mq4kj7gqh0000gn/T/lithops-davidfoster/logs/5585cb-0-A000.log
2023-10-19 11:16:43,673 [DEBUG] monitor.py:381 -- ExecutorID 5585cb-0 - Starting Storage job monitor
2023-10-19 11:16:43,673 [INFO] wait.py:98 -- ExecutorID 5585cb-0 - Getting results from 1 function activations
2023-10-19 11:16:43,912 [DEBUG] localhost.py:85 -- ExecutorID 5585cb-0 | JobID A000 - Execution finished
2023-10-19 11:16:43,912 [DEBUG] localhost.py:90 -- Localhost job manager stopped
2023-10-19 11:16:45,682 [DEBUG] monitor.py:127 -- ExecutorID 5585cb-0 - Pending: 0 - Running: 0 - Done: 1
2023-10-19 11:16:45,682 [DEBUG] monitor.py:409 -- ExecutorID 5585cb-0 - Storage job monitor finished
2023-10-19 11:16:45,684 [DEBUG] future.py:270 -- ExecutorID 5585cb-0 | JobID A000 - Got status from call 00000 - Activation ID: ec9320ee8f74 - Time: 0.03 seconds
2023-10-19 11:16:45,684 [DEBUG] future.py:290 -- ExecutorID 5585cb-0 | JobID A000 - Got output from call 00000 - Activation ID: ec9320ee8f74
2023-10-19 11:16:45,684 [INFO] executors.py:593 -- ExecutorID 5585cb-0 - Cleaning temporary data
2023-10-19 11:16:45,690 [DEBUG] executors.py:503 -- ExecutorID 5585cb-0 - Finished getting results
Hello World!
2023-10-19 11:16:45,691 [DEBUG] pool.py:202 -- terminating pool
I think there are missing logs; you should see all of this:
2023-10-19 12:09:22,862 [INFO] config.py:141 -- Lithops v3.0.2.dev0 - Python3.10
2023-10-19 12:09:22,863 [DEBUG] config.py:103 -- Loading configuration from /home/josep/.lithops/config
2023-10-19 12:09:22,929 [DEBUG] config.py:176 -- Loading compute backend module: localhost
2023-10-19 12:09:22,929 [DEBUG] config.py:235 -- Loading Storage backend module: localhost
8
2023-10-19 12:09:22,934 [INFO] config.py:141 -- Lithops v3.0.2.dev0 - Python3.10
2023-10-19 12:09:22,934 [DEBUG] config.py:103 -- Loading configuration from /home/josep/.lithops/config
2023-10-19 12:09:22,944 [DEBUG] config.py:176 -- Loading compute backend module: localhost
2023-10-19 12:09:22,944 [DEBUG] config.py:235 -- Loading Storage backend module: localhost
2023-10-19 12:09:22,944 [DEBUG] localhost.py:36 -- Creating Localhost storage client
2023-10-19 12:09:22,944 [INFO] localhost.py:39 -- Localhost storage client created
2023-10-19 12:09:22,944 [DEBUG] localhost.py:49 -- Creating Localhost compute client
2023-10-19 12:09:22,944 [INFO] localhost.py:58 -- Localhost compute client created
2023-10-19 12:09:22,944 [DEBUG] invokers.py:93 -- ExecutorID 003544-0 - Invoker initialized. Max workers: 1
2023-10-19 12:09:22,944 [DEBUG] executors.py:145 -- Function executor for localhost created with ID: 003544-0
2023-10-19 12:09:22,944 [INFO] invokers.py:107 -- ExecutorID 003544-0 | JobID A000 - Selected Runtime: python3.10
2023-10-19 12:09:22,947 [DEBUG] storage.py:431 -- Runtime metadata found in local disk cache
2023-10-19 12:09:22,948 [DEBUG] job.py:234 -- ExecutorID 003544-0 | JobID A000 - Serializing function and data
2023-10-19 12:09:22,949 [DEBUG] module_dependency.py:66 -- Queuing module 'os'
2023-10-19 12:09:22,950 [DEBUG] module_dependency.py:110 -- Module 'os' is to be ignored, skipping
2023-10-19 12:09:22,950 [DEBUG] module_dependency.py:66 -- Queuing module 'lithops.multiprocessing.util'
2023-10-19 12:09:22,950 [DEBUG] module_dependency.py:110 -- Module 'lithops' is to be ignored, skipping
2023-10-19 12:09:22,950 [DEBUG] module_dependency.py:66 -- Queuing module 'traceback'
2023-10-19 12:09:22,950 [DEBUG] module_dependency.py:110 -- Module 'traceback' is to be ignored, skipping
2023-10-19 12:09:22,950 [DEBUG] module_dependency.py:66 -- Queuing module 'lithops.multiprocessing.process'
2023-10-19 12:09:22,950 [DEBUG] module_dependency.py:110 -- Module 'lithops' is to be ignored, skipping
2023-10-19 12:09:22,950 [DEBUG] serialize.py:84 -- Referenced modules: /home/josep/dev-worksapce/personal/lithops/lithops/multiprocessing/util.py, /home/josep/dev-worksapce/personal/lithops/lithops/multiprocessing/process.py, /usr/lib/python3.10/os.py, /usr/lib/python3.10/traceback.py
2023-10-19 12:09:22,950 [DEBUG] serialize.py:101 -- Modules to transmit: None
2023-10-19 12:09:22,951 [DEBUG] job.py:268 -- ExecutorID 003544-0 | JobID A000 - Uploading function and modules to the storage backend
2023-10-19 12:09:22,952 [DEBUG] job.py:294 -- ExecutorID 003544-0 | JobID A000 - Uploading data to the storage backend
2023-10-19 12:09:22,952 [INFO] invokers.py:171 -- ExecutorID 003544-0 | JobID A000 - Starting function invocation: cloud_process_wrapper() - Total: 1 activations
2023-10-19 12:09:22,952 [DEBUG] invokers.py:176 -- ExecutorID 003544-0 | JobID A000 - Worker processes: 8 - Chunksize: 8
2023-10-19 12:09:22,952 [DEBUG] localhost.py:136 -- ExecutorID 003544-0 | JobID A000 - Putting job into localhost queue
2023-10-19 12:09:22,952 [DEBUG] localhost.py:72 -- Staring localhost job manager
2023-10-19 12:09:22,953 [DEBUG] localhost.py:357 -- Starting Default Environment for /usr/bin/python3.10
2023-10-19 12:09:22,953 [DEBUG] localhost.py:361 -- Setting up Default environment
2023-10-19 12:09:23,298 [DEBUG] invokers.py:252 -- ExecutorID 003544-0 | JobID A000 - Job invoked (0.346s) - Activation ID: 003544-0-A000
2023-10-19 12:09:23,298 [INFO] invokers.py:207 -- ExecutorID 003544-0 | JobID A000 - View execution logs at /tmp/lithops-josep/logs/003544-0-A000.log
2023-10-19 12:09:23,299 [DEBUG] localhost.py:383 -- ExecutorID 003544-0 | JobID A000 - Running 1 activations in the localhost worker
2023-10-19 12:09:23,299 [DEBUG] monitor.py:381 -- ExecutorID 003544-0 - Starting Storage job monitor
2023-10-19 12:09:23,301 [INFO] wait.py:98 -- ExecutorID 003544-0 - Getting results from 1 function activations
2023-10-19 12:09:24,169 [DEBUG] localhost.py:85 -- ExecutorID 003544-0 | JobID A000 - Execution finished
2023-10-19 12:09:24,169 [DEBUG] localhost.py:90 -- Localhost job manager stopped
2023-10-19 12:09:25,352 [DEBUG] monitor.py:127 -- ExecutorID 003544-0 - Pending: 0 - Running: 0 - Done: 1
2023-10-19 12:09:25,352 [DEBUG] monitor.py:409 -- ExecutorID 003544-0 - Storage job monitor finished
2023-10-19 12:09:26,312 [DEBUG] future.py:270 -- ExecutorID 003544-0 | JobID A000 - Got status from call 00000 - Activation ID: c648eeeec4ef - Time: 0.09 seconds
2023-10-19 12:09:26,312 [DEBUG] future.py:290 -- ExecutorID 003544-0 | JobID A000 - Got output from call 00000 - Activation ID: c648eeeec4ef
2023-10-19 12:09:26,313 [INFO] executors.py:593 -- ExecutorID 003544-0 - Cleaning temporary data
2023-10-19 12:09:26,315 [DEBUG] executors.py:503 -- ExecutorID 003544-0 - Finished getting results
Hello World!
2023-10-19 12:09:26,315 [DEBUG] pool.py:202 -- terminating pool
Sorry, you're right, I missed the first part:
2023-10-19 11:24:49,301 [INFO] config.py:141 -- Lithops v3.0.2.dev0 - Python3.11
2023-10-19 11:24:49,301 [DEBUG] config.py:103 -- Loading configuration from /Users/davidfoster/repos/data_engineering_model/.lithops_config
2023-10-19 11:24:49,311 [DEBUG] config.py:176 -- Loading compute backend module: localhost
2023-10-19 11:24:49,311 [DEBUG] config.py:235 -- Loading Storage backend module: localhost
1
2023-10-19 11:24:49,312 [INFO] config.py:141 -- Lithops v3.0.2.dev0 - Python3.11
2023-10-19 11:24:49,312 [DEBUG] config.py:103 -- Loading configuration from .lithops_config
2023-10-19 11:24:49,313 [DEBUG] config.py:176 -- Loading compute backend module: localhost
2023-10-19 11:24:49,313 [DEBUG] config.py:235 -- Loading Storage backend module: localhost
2023-10-19 11:24:49,313 [DEBUG] localhost.py:36 -- Creating Localhost storage client
2023-10-19 11:24:49,313 [INFO] localhost.py:39 -- Localhost storage client created
2023-10-19 11:24:49,313 [DEBUG] localhost.py:49 -- Creating Localhost compute client
2023-10-19 11:24:49,313 [INFO] localhost.py:58 -- Localhost compute client created
2023-10-19 11:24:49,313 [DEBUG] invokers.py:93 -- ExecutorID dc65f6-0 - Invoker initialized. Max workers: 1
2023-10-19 11:24:49,313 [DEBUG] executors.py:145 -- Function executor for localhost created with ID: dc65f6-0
2023-10-19 11:24:49,313 [INFO] invokers.py:107 -- ExecutorID dc65f6-0 | JobID A000 - Selected Runtime: python
2023-10-19 11:24:49,313 [DEBUG] storage.py:431 -- Runtime metadata found in local disk cache
2023-10-19 11:24:49,314 [DEBUG] job.py:234 -- ExecutorID dc65f6-0 | JobID A000 - Serializing function and data
2023-10-19 11:24:49,315 [DEBUG] module_dependency.py:66 -- Queuing module 'os'
2023-10-19 11:24:49,315 [DEBUG] module_dependency.py:110 -- Module 'os' is to be ignored, skipping
2023-10-19 11:24:49,315 [DEBUG] module_dependency.py:66 -- Queuing module 'traceback'
2023-10-19 11:24:49,315 [DEBUG] module_dependency.py:110 -- Module 'traceback' is to be ignored, skipping
2023-10-19 11:24:49,315 [DEBUG] module_dependency.py:66 -- Queuing module 'lithops.multiprocessing.process'
2023-10-19 11:24:49,315 [DEBUG] module_dependency.py:110 -- Module 'lithops' is to be ignored, skipping
2023-10-19 11:24:49,315 [DEBUG] module_dependency.py:66 -- Queuing module 'lithops.multiprocessing.util'
2023-10-19 11:24:49,315 [DEBUG] module_dependency.py:110 -- Module 'lithops' is to be ignored, skipping
2023-10-19 11:24:49,315 [DEBUG] serialize.py:84 -- Referenced modules: venv/lib/python3.11/site-packages/lithops/multiprocessing/process.py, venv/lib/python3.11/site-packages/lithops/multiprocessing/util.py, .pyenv/versions/3.11.4/lib/python3.11/traceback.py, frozen
2023-10-19 11:24:49,315 [DEBUG] serialize.py:101 -- Modules to transmit: None
2023-10-19 11:24:49,315 [DEBUG] job.py:268 -- ExecutorID dc65f6-0 | JobID A000 - Uploading function and modules to the storage backend
2023-10-19 11:24:49,315 [DEBUG] job.py:294 -- ExecutorID dc65f6-0 | JobID A000 - Uploading data to the storage backend
2023-10-19 11:24:49,315 [INFO] invokers.py:171 -- ExecutorID dc65f6-0 | JobID A000 - Starting function invocation: cloud_process_wrapper() - Total: 1 activations
2023-10-19 11:24:49,315 [DEBUG] invokers.py:176 -- ExecutorID dc65f6-0 | JobID A000 - Worker processes: 10 - Chunksize: 10
2023-10-19 11:24:49,315 [DEBUG] localhost.py:136 -- ExecutorID dc65f6-0 | JobID A000 - Putting job into localhost queue
2023-10-19 11:24:49,315 [DEBUG] localhost.py:72 -- Staring localhost job manager
2023-10-19 11:24:49,315 [DEBUG] localhost.py:360 -- Starting Default Environment for venv/bin/python
2023-10-19 11:24:49,315 [DEBUG] localhost.py:364 -- Setting up Default environment
2023-10-19 11:24:49,411 [DEBUG] invokers.py:252 -- ExecutorID dc65f6-0 | JobID A000 - Job invoked (0.096s) - Activation ID: dc65f6-0-A000
2023-10-19 11:24:49,411 [DEBUG] localhost.py:386 -- ExecutorID dc65f6-0 | JobID A000 - Running 1 activations in the localhost worker
2023-10-19 11:24:49,412 [INFO] invokers.py:207 -- ExecutorID dc65f6-0 | JobID A000 - View execution logs at /private/var/folders/t4/hdpq5bd51tjfws2mq4kj7gqh0000gn/T/lithops-davidfoster/logs/dc65f6-0-A000.log
2023-10-19 11:24:49,412 [DEBUG] monitor.py:381 -- ExecutorID dc65f6-0 - Starting Storage job monitor
2023-10-19 11:24:49,412 [INFO] wait.py:98 -- ExecutorID dc65f6-0 - Getting results from 1 function activations
2023-10-19 11:24:49,689 [DEBUG] localhost.py:85 -- ExecutorID dc65f6-0 | JobID A000 - Execution finished
2023-10-19 11:24:49,690 [DEBUG] localhost.py:90 -- Localhost job manager stopped
2023-10-19 11:24:51,419 [DEBUG] monitor.py:127 -- ExecutorID dc65f6-0 - Pending: 0 - Running: 0 - Done: 1
2023-10-19 11:24:51,419 [DEBUG] monitor.py:409 -- ExecutorID dc65f6-0 - Storage job monitor finished
2023-10-19 11:24:51,421 [DEBUG] future.py:270 -- ExecutorID dc65f6-0 | JobID A000 - Got status from call 00000 - Activation ID: b4e41eb70766 - Time: 0.03 seconds
2023-10-19 11:24:51,421 [DEBUG] future.py:290 -- ExecutorID dc65f6-0 | JobID A000 - Got output from call 00000 - Activation ID: b4e41eb70766
2023-10-19 11:24:51,422 [INFO] executors.py:593 -- ExecutorID dc65f6-0 - Cleaning temporary data
2023-10-19 11:24:51,427 [DEBUG] executors.py:503 -- ExecutorID dc65f6-0 - Finished getting results
Hello World!
2023-10-19 11:24:51,428 [DEBUG] pool.py:202 -- terminating pool
Thanks. As I can see in the logs, you have 10 worker processes (CPUs):
2023-10-19 11:24:49,315 [DEBUG] invokers.py:176 -- ExecutorID dc65f6-0 | JobID A000 - Worker processes: 10 - Chunksize: 10
while the output of cpu_count() is 1:
2023-10-19 11:24:49,301 [INFO] config.py:141 -- Lithops v3.0.2.dev0 - Python3.11
2023-10-19 11:24:49,301 [DEBUG] config.py:103 -- Loading configuration from /Users/davidfoster/repos/data_engineering_model/.lithops_config
2023-10-19 11:24:49,311 [DEBUG] config.py:176 -- Loading compute backend module: localhost
2023-10-19 11:24:49,311 [DEBUG] config.py:235 -- Loading Storage backend module: localhost
1 **<--- cpu_count() output**
cpu_count() should be 10, so my feeling is that your master branch is not updated to the latest changes.
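For reference, on the localhost backend cpu_count() should match what the standard library reports for the machine; a quick stdlib cross-check, not Lithops-specific:

```python
import os

# Report the machine's core count via the stdlib; on the localhost
# backend lithops.multiprocessing.cpu_count() is expected to agree.
print(os.cpu_count())
```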
Thanks for this, I've now uninstalled and reinstalled with pip install git+https://github.com/lithops-cloud/lithops.git@master
cpu_count() is showing 10 now, and running the original example produces much more detailed logs. Apologies, the install from git must not have upgraded last time, as the version number was the same.
When using the lithops joblib backend, a grid search with RandomForestClassifier causes an error from joblib. The error complains that the system doesn't support forking, but macOS does. Running a very similar example with either a different classifier or the "loky" (default) joblib backend works without issue.
Please see the sklearn issue for full details; I was referred here as this is the likely source of the issue.
Reproducible example:
Raises this error: