lithops-cloud / lithops

A multi-cloud framework for big data analytics and embarrassingly parallel jobs, that provides an universal API for building parallel applications in the cloud ☁️🚀
http://lithops.cloud
Apache License 2.0
317 stars 105 forks source link

When using lithops multiprocessing it runs forever #1125

Closed oanamoc closed 1 year ago

oanamoc commented 1 year ago

I have a simple code that counts words and uses the multiprocessing.Pool function. I read that lithops has the same thing so I changed the first line of the code to: from lithops.multiprocessing import Pool And then I left the code as it was, using the Pool function. The original code works (I checked) and lithops works. When running the new code it gets stuck on the last line and will not continue. Is there some issue in my aws account with the lithops-default-runtime-v39?

oanamoc@pc-10-200 lithops % cd /Users/oanamoc/Desktop/Work/lithops ; /usr/bin/env /usr/local/bin/python3 /Users/oanamoc/.vscode/extensions/ms-p ython.python-2023.10.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher 64688 -- /Users/oanamoc/Desktop/Work/lithops/task1.py 2023-07-07 11:46:53,503 [INFO] config.py:141 -- Lithops v2.9.1.dev0 2023-07-07 11:46:53,838 [INFO] aws_s3.py:68 -- S3 client created - Region: eu-west-3 2023-07-07 11:46:54,770 [INFO] aws_lambda.py:94 -- AWS Lambda client created - Region: eu-west-3 2023-07-07 11:46:55,118 [INFO] invokers.py:108 -- ExecutorID 3c1776-0 | JobID M000 - Selected Runtime: lithops-default-runtime-v39 - 256MB

JosepSampe commented 1 year ago

@oanamoc Does this example of using a multiprocessing.pool work for you? https://github.com/lithops-cloud/lithops/blob/master/examples/multiprocessing/pool.py

oanamoc commented 1 year ago

No, it says this: Timed out! For the moment, the pool remains available for more work Now the pool is closed and no longer available

JosepSampe commented 1 year ago

Can you activate Debug logs and paste them here? setup_lithops_logger(logging.DEBUG)

JosepSampe commented 1 year ago

Well, the previous example in https://github.com/lithops-cloud/lithops/blob/master/examples/multiprocessing/pool.py is not a good example since with the "timeout" result it seems that something went wrong, but in fact is a good execution in this particular case.

To test the multiprocessing.Pool you just need to try this:

from lithops.multiprocessing import Pool
from lithops.utils import setup_lithops_logger
import logging

setup_lithops_logger(logging.DEBUG)

def square(x):
    return x * x

if __name__ == '__main__':
    with Pool() as pool:
        res = pool.map(square, [1, 2, 3, 4, 5])
        print(res)
oanamoc commented 1 year ago

The last one works. [1, 4, 9, 16, 25]

oanamoc commented 1 year ago

And about the debugging, it just says this. oanamoc@pc-10-200 lithops % setup_lithops_logger(logging.DEBUG) zsh: number expected

JosepSampe commented 1 year ago

Is your task1.py using a pipe or any other shared resource?

The setup_lithops_logger(logging.DEBUG) must be set inside the python file, as in the last example

oanamoc commented 1 year ago

I believe it doesn't. I will provide my code too, perhaps it might help.

from lithops.multiprocessing import Pool import time from lithops.utils import setup_lithops_logger import logging

setup_lithops_logger(logging.DEBUG)

def count_words(line): count = 0 line = line.rstrip() pieces = line.split() for initial_word in pieces: word = ''.join(c for c in initial_word if c.isalpha()) if word: # Skip empty words count = count + 1 return count

if name == 'main':

fname = 'notsobigtextfile_small.txt'
try:
    fhand = open (fname)
except:
    print ('File cannot be opened:', fname)
    quit()

start = time.perf_counter()

word_number = 0

with Pool() as p:
    line_numbers = p.map(count_words, fhand)

word_number = sum(line_numbers)

print ('Word Count: ', word_number)

finish = time.perf_counter()

print(f'Finished in {round(finish - start, 2)} second(s)')

The terminal stopped 5 minutes ago on the last line:

oanamoc@pc-10-200 lithops % cd /Users/oanamoc/Desktop/Work/lithops ; /usr/bin/env /usr/local/bin/python3 /Users/oanamoc/.vscode/extensions/ms-p ython.python-2023.10.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher 63759 -- /Users/oanamoc/Desktop/Work/lithops/task1.py 2023-07-10 12:44:26,807 [INFO] config.py:141 -- Lithops v2.9.1.dev0 2023-07-10 12:44:26,808 [DEBUG] config.py:103 -- Loading configuration from /Users/oanamoc/Desktop/Work/lithops/.lithops_config 2023-07-10 12:44:26,860 [DEBUG] config.py:193 -- Loading Serverless backend module: aws_lambda 2023-07-10 12:44:27,000 [DEBUG] config.py:235 -- Loading Storage backend module: aws_s3 2023-07-10 12:44:27,005 [DEBUG] aws_s3.py:37 -- Creating S3 client 2023-07-10 12:44:27,278 [INFO] aws_s3.py:68 -- S3 client created - Region: eu-west-3 2023-07-10 12:44:27,643 [DEBUG] aws_lambda.py:52 -- Creating AWS Lambda client 2023-07-10 12:44:27,644 [DEBUG] aws_lambda.py:65 -- Creating Boto3 AWS Session and Lambda Client 2023-07-10 12:44:28,493 [INFO] aws_lambda.py:94 -- AWS Lambda client created - Region: eu-west-3 2023-07-10 12:44:28,497 [DEBUG] invokers.py:94 -- ExecutorID 6124c2-0 - Invoker initialized. Max workers: 1000 2023-07-10 12:44:28,497 [DEBUG] invokers.py:293 -- ExecutorID 6124c2-0 - Serverless invoker created 2023-07-10 12:44:28,497 [DEBUG] executors.py:167 -- Function executor for aws_lambda created with ID: 6124c2-0 2023-07-10 12:44:28,892 [INFO] invokers.py:108 -- ExecutorID 6124c2-0 | JobID M000 - Selected Runtime: lithops-default-runtime-v39 - 256MB 2023-07-10 12:44:28,892 [DEBUG] storage.py:431 -- Runtime metadata found in local disk cache 2023-07-10 12:45:11,738 [DEBUG] job.py:234 -- ExecutorID 6124c2-0 | JobID M000 - Serializing function and data

JosepSampe commented 1 year ago

The example you posted works with master branch:

from lithops.multiprocessing import Pool
import time
from lithops.utils import setup_lithops_logger
import logging

setup_lithops_logger(logging.DEBUG)

def count_words(line):
    count = 0
    line = line.rstrip()
    pieces = line.split()
    for initial_word in pieces:
        word = ''.join(c for c in initial_word if c.isalpha())
        if word:  # Skip empty words
            count = count + 1
    return count

if __name__ == "__main__":
    fname = 'notsobigtextfile_small.txt'

    try:
        fhand = open(fname)
    except Exception:
        print('File cannot be opened:', fname)
        quit()

    start = time.perf_counter()

    word_number = 0
    with Pool() as p:
        line_numbers = p.map(count_words, fhand)

    word_number = sum(line_numbers)
    print('Word Count: ', word_number)
    finish = time.perf_counter()
    print(f'Finished in {round(finish - start, 2)} second(s)')

Can you try updating you local installation and try again:

pip install -U git+https://github.com/lithops-cloud/lithops.git@master
lithops clean -b aws_lambda -s aws_s3
oanamoc commented 1 year ago

In the end the program did finish. It took 33 minutes. I repasted the last line from my previous message and then what comes next: 2023-07-10 12:45:11,738 [DEBUG] job.py:234 -- ExecutorID 6124c2-0 | JobID M000 - Serializing function and data 2023-07-10 13:18:11,457 [DEBUG] module_dependency.py:66 -- Queuing module 'lithops.multiprocessing.process' 2023-07-10 13:18:11,457 [DEBUG] module_dependency.py:110 -- Module 'lithops' is to be ignored, skipping 2023-07-10 13:18:11,457 [DEBUG] module_dependency.py:66 -- Queuing module 'lithops.multiprocessing.util' 2023-07-10 13:18:11,457 [DEBUG] module_dependency.py:110 -- Module 'lithops' is to be ignored, skipping 2023-07-10 13:18:11,457 [DEBUG] module_dependency.py:66 -- Queuing module 'os' 2023-07-10 13:18:11,457 [DEBUG] module_dependency.py:110 -- Module 'os' is to be ignored, skipping 2023-07-10 13:18:11,457 [DEBUG] module_dependency.py:66 -- Queuing module 'traceback' 2023-07-10 13:18:11,457 [DEBUG] module_dependency.py:110 -- Module 'traceback' is to be ignored, skipping 2023-07-10 13:18:11,457 [DEBUG] serialize.py:84 -- Referenced modules: /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/traceback.py, /Users/oanamoc/Library/Python/3.9/lib/python/site-packages/lithops/multiprocessing/util.py, /Users/oanamoc/Library/Python/3.9/lib/python/site-packages/lithops/multiprocessing/process.py, /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/os.py 2023-07-10 13:18:11,458 [DEBUG] serialize.py:101 -- Modules to transmit: None 2023-07-10 13:18:12,869 [DEBUG] pool.py:202 -- terminating pool

oanamoc commented 1 year ago

I made the update and I'm trying again right now.

JosepSampe commented 1 year ago

The serializing function and data step should finish really quick. This is my trace:

2023-07-10 14:00:14,888 [DEBUG] storage.py:431 -- Runtime metadata found in local disk cache
2023-07-10 14:00:14,889 [DEBUG] job.py:234 -- ExecutorID 0dcdab-0 | JobID M000 - Serializing function and data
2023-07-10 14:00:14,895 [DEBUG] module_dependency.py:66 -- Queuing module 'lithops.multiprocessing.util'
2023-07-10 14:00:14,895 [DEBUG] module_dependency.py:110 -- Module 'lithops' is to be ignored, skipping
2023-07-10 14:00:14,895 [DEBUG] module_dependency.py:66 -- Queuing module 'os'
2023-07-10 14:00:14,895 [DEBUG] module_dependency.py:110 -- Module 'os' is to be ignored, skipping
2023-07-10 14:00:14,895 [DEBUG] module_dependency.py:66 -- Queuing module 'traceback'
2023-07-10 14:00:14,895 [DEBUG] module_dependency.py:110 -- Module 'traceback' is to be ignored, skipping
2023-07-10 14:00:14,895 [DEBUG] module_dependency.py:66 -- Queuing module 'lithops.multiprocessing.process'
2023-07-10 14:00:14,895 [DEBUG] module_dependency.py:110 -- Module 'lithops' is to be ignored, skipping
2023-07-10 14:00:14,895 [DEBUG] serialize.py:84 -- Referenced modules: /Users/josep/anaconda3/lib/python3.10/traceback.py, /Users/josep/dev-workspace/lithops/lithops/multiprocessing/util.py, /Users/josep/dev-workspace/lithops/lithops/multiprocessing/process.py, /Users/josep/anaconda3/lib/python3.10/os.py
2023-07-10 14:00:14,895 [DEBUG] serialize.py:101 -- Modules to transmit: None

This step is executed in you local laptop/pc, before executing any function. So wondering what might be wrong with that

oanamoc commented 1 year ago

This is where it stops now (10 mins ago), after the update:

oanamoc@pc-10-200 lithops % cd /Users/oanamoc/Desktop/Work/lithops ; /usr/bin/env /usr/local/bin/python3 /Users/oanamoc/.vscode/extensions/ms-p ython.python-2023.10.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher 63851 -- /Users/oanamoc/Desktop/Work/lithops/task1.py 2023-07-10 13:22:06,726 [INFO] config.py:141 -- Lithops v2.9.1.dev0 2023-07-10 13:22:06,726 [DEBUG] config.py:103 -- Loading configuration from /Users/oanamoc/Desktop/Work/lithops/.lithops_config 2023-07-10 13:22:06,778 [DEBUG] config.py:193 -- Loading Serverless backend module: aws_lambda 2023-07-10 13:22:06,926 [DEBUG] config.py:235 -- Loading Storage backend module: aws_s3 2023-07-10 13:22:06,931 [DEBUG] aws_s3.py:37 -- Creating S3 client 2023-07-10 13:22:07,166 [INFO] aws_s3.py:68 -- S3 client created - Region: eu-west-3 2023-07-10 13:22:07,399 [DEBUG] aws_lambda.py:52 -- Creating AWS Lambda client 2023-07-10 13:22:07,399 [DEBUG] aws_lambda.py:65 -- Creating Boto3 AWS Session and Lambda Client 2023-07-10 13:22:08,230 [INFO] aws_lambda.py:94 -- AWS Lambda client created - Region: eu-west-3 2023-07-10 13:22:08,232 [DEBUG] invokers.py:94 -- ExecutorID 22b745-0 - Invoker initialized. Max workers: 1000 2023-07-10 13:22:08,232 [DEBUG] invokers.py:293 -- ExecutorID 22b745-0 - Serverless invoker created 2023-07-10 13:22:08,232 [DEBUG] executors.py:167 -- Function executor for aws_lambda created with ID: 22b745-0 2023-07-10 13:22:08,930 [INFO] invokers.py:108 -- ExecutorID 22b745-0 | JobID M000 - Selected Runtime: lithops-default-runtime-v39 - 256MB 2023-07-10 13:22:08,930 [DEBUG] storage.py:438 -- Runtime metadata not found in local cache. Retrieving it from storage 2023-07-10 13:22:08,931 [DEBUG] storage.py:441 -- Trying to download runtime metadata from: aws_s3://lithops-data-yey/lithops.runtimes/aws_lambda/2.9.1.dev0/eu-west-3/lithops_v2-9-1-dev0_uxenlithops-default-runtime-v39_256MB.meta.json 2023-07-10 13:22:09,001 [DEBUG] storage.py:460 -- Runtime metadata not found in storage 2023-07-10 13:22:09,002 [INFO] invokers.py:116 -- Runtime lithops-default-runtime-v39 with 256MB is not yet deployed 2023-07-10 13:22:09,002 [INFO] aws_lambda.py:395 -- Deploying runtime: lithops-default-runtime-v39 - Memory: 256 - Timeout: 180 2023-07-10 13:22:09,002 [DEBUG] aws_lambda.py:304 -- Listing lambda layers 2023-07-10 13:22:09,196 [DEBUG] aws_lambda.py:308 -- Listed 0 layers 2023-07-10 13:22:09,197 [INFO] aws_lambda.py:194 -- Creating default lambda layer for runtime lithops-default-runtime-v39 2023-07-10 13:22:09,220 [DEBUG] aws_lambda.py:206 -- Creating "layer builder" function 2023-07-10 13:22:34,502 [DEBUG] aws_lambda.py:162 -- "lithops_v2-9-1-dev0_uxen_layer_builder_512MB" function is being deployed... (status: Pending) 2023-07-10 13:22:34,559 [DEBUG] aws_lambda.py:172 -- Ok --> function "lithops_v2-9-1-dev0_uxen_layer_builder_512MB" is active 2023-07-10 13:22:34,559 [DEBUG] aws_lambda.py:227 -- OK --> Created "layer builder" function lithops-default-runtime-v39 2023-07-10 13:22:34,560 [DEBUG] aws_lambda.py:237 -- Invoking "layer builder" function 2023-07-10 13:23:07,506 [DEBUG] aws_lambda.py:240 -- OK --> Layer lithops_v2-9-1-dev0_uxen_lithops-default-runtime-v39_layer built 2023-07-10 13:23:07,507 [DEBUG] aws_lambda.py:246 -- Trying to delete "layer builder" function 2023-07-10 13:23:07,779 [DEBUG] aws_lambda.py:254 -- Creating layer lithops_v2-9-1-dev0_uxen_lithops-default-runtime-v39_layer ... 2023-07-10 13:23:15,401 [DEBUG] aws_lambda.py:271 -- OK --> Layer lithops_v2-9-1-dev0_uxen_lithops-default-runtime-v39_layer created 2023-07-10 13:23:15,402 [DEBUG] utils.py:230 -- Creating function handler zip in lithops_lambda.zip 2023-07-10 13:23:43,354 [DEBUG] aws_lambda.py:162 -- "lithops_v2-9-1-dev0_uxenlithops-default-runtime-v39_256MB" function is being deployed... (status: Pending) 2023-07-10 13:23:43,428 [DEBUG] aws_lambda.py:172 -- Ok --> function "lithops_v2-9-1-dev0_uxenlithops-default-runtime-v39_256MB" is active 2023-07-10 13:23:43,428 [DEBUG] aws_lambda.py:449 -- OK --> Created lambda function lithops_v2-9-1-dev0_uxenlithops-default-runtime-v39_256MB 2023-07-10 13:23:43,428 [DEBUG] aws_lambda.py:716 -- Extracting runtime metadata from: lithops-default-runtime-v39 2023-07-10 13:23:44,882 [DEBUG] storage.py:471 -- Uploading runtime metadata to: aws_s3://lithops-data-yey/lithops.runtimes/aws_lambda/2.9.1.dev0/eu-west-3/lithops_v2-9-1-dev0_uxen__lithops-default-runtime-v39_256MB.meta.json 2023-07-10 13:23:45,184 [DEBUG] aws_s3.py:104 -- PUT Object lithops.runtimes/aws_lambda/2.9.1.dev0/eu-west-3/lithops_v2-9-1-dev0_uxenlithops-default-runtime-v39_256MB.meta.json - Size: 6.2KiB - OK 2023-07-10 13:23:45,185 [DEBUG] storage.py:477 -- Storing runtime metadata into local cache: /Users/oanamoc/.lithops/cache/lithops.runtimes/aws_lambda/2.9.1.dev0/eu-west-3/lithops_v2-9-1-dev0_uxenlithops-default-runtime-v39_256MB.meta.json 2023-07-10 13:24:27,165 [DEBUG] job.py:234 -- ExecutorID 22b745-0 | JobID M000 - Serializing function and data

oanamoc commented 1 year ago

It finished: 2023-07-10 13:57:11,713 [DEBUG] module_dependency.py:66 -- Queuing module 'os' 2023-07-10 13:57:11,713 [DEBUG] module_dependency.py:110 -- Module 'os' is to be ignored, skipping 2023-07-10 13:57:11,714 [DEBUG] module_dependency.py:66 -- Queuing module 'lithops.multiprocessing.util' 2023-07-10 13:57:11,714 [DEBUG] module_dependency.py:110 -- Module 'lithops' is to be ignored, skipping 2023-07-10 13:57:11,714 [DEBUG] module_dependency.py:66 -- Queuing module 'traceback' 2023-07-10 13:57:11,714 [DEBUG] module_dependency.py:110 -- Module 'traceback' is to be ignored, skipping 2023-07-10 13:57:11,714 [DEBUG] module_dependency.py:66 -- Queuing module 'lithops.multiprocessing.process' 2023-07-10 13:57:11,714 [DEBUG] module_dependency.py:110 -- Module 'lithops' is to be ignored, skipping 2023-07-10 13:57:11,714 [DEBUG] serialize.py:84 -- Referenced modules: /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/traceback.py, /Users/oanamoc/Library/Python/3.9/lib/python/site-packages/lithops/multiprocessing/util.py, /Users/oanamoc/Library/Python/3.9/lib/python/site-packages/lithops/multiprocessing/process.py, /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/os.py 2023-07-10 13:57:11,714 [DEBUG] serialize.py:101 -- Modules to transmit: None 2023-07-10 13:57:12,991 [DEBUG] pool.py:202 -- terminating pool

And it also gave me an exception.

Exception has occurred: Exception ExecutorID 22b745-0 | JobID M000 - Total data exceeded maximum size of 4.0MiB File "/Users/oanamoc/Desktop/Work/lithops/task1.py", line 32, in line_numbers = p.map(count_words, fhand) Exception: ExecutorID 22b745-0 | JobID M000 - Total data exceeded maximum size of 4.0MiB

JosepSampe commented 1 year ago

Maybe you can try a different python version, as your local python installation may contain something wrong that makes the serializer take too long.

oanamoc commented 1 year ago

The way I went about making the code was wrong