Open Aarif1430 opened 2 years ago
I'm also seeing this error with Python 3.8
[ERROR] OSError: [Errno 38] Function not implemented
Traceback (most recent call last):
....
  File "/var/task/opensearchpy/helpers/actions.py", line 469, in parallel_bulk
    pool = BlockingPool(thread_count)
  File "/var/lang/lib/python3.8/multiprocessing/pool.py", line 925, in __init__
    Pool.__init__(self, processes, initializer, initargs)
  File "/var/lang/lib/python3.8/multiprocessing/pool.py", line 196, in __init__
    self._change_notifier = self._ctx.SimpleQueue()
  File "/var/lang/lib/python3.8/multiprocessing/context.py", line 113, in SimpleQueue
    return SimpleQueue(ctx=self.get_context())
  File "/var/lang/lib/python3.8/multiprocessing/queues.py", line 336, in __init__
    self._rlock = ctx.Lock()
  File "/var/lang/lib/python3.8/multiprocessing/context.py", line 68, in Lock
    return Lock(ctx=self.get_context())
  File "/var/lang/lib/python3.8/multiprocessing/synchronize.py", line 162, in __init__
    SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx)
  File "/var/lang/lib/python3.8/multiprocessing/synchronize.py", line 57, in __init__
    sl = self._semlock = _multiprocessing.SemLock(
@jasongilman Did you get this error in a lambda or elsewhere?
@wbeckler It was in a lambda.
@jasongilman Yes it was in aws lambda.
Is anyone up for contributing a patch that addresses this issue when /dev/shm isn't available? There's a potential drop-in replacement for the multiprocessing library: https://pypi.org/project/lambda-multiprocessing/
At a high level, is this issue about adding Python 3.9 support (starting with CI)?
@Aarif1430 @jasongilman Is the bug still persisting?
CI with Python 3.9 was added in https://github.com/opensearch-project/opensearch-py/pull/336 and it currently passes. We need a test that reproduces this problem.
I'm able to reproduce the issue:
Create a Lambda with python3.9:
import json
from multiprocessing.pool import ThreadPool

def lambda_handler(event, context):
    print("Hello")
    pool = ThreadPool()
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }
It gives this error:
{
"errorMessage": "[Errno 38] Function not implemented",
"errorType": "OSError",
"stackTrace": [
" File \"/var/task/lambda_function.py\", line 6, in lambda_handler\n pool = ThreadPool()\n",
" File \"/var/lang/lib/python3.9/multiprocessing/pool.py\", line 927, in __init__\n Pool.__init__(self, processes, initializer, initargs)\n",
" File \"/var/lang/lib/python3.9/multiprocessing/pool.py\", line 196, in __init__\n self._change_notifier = self._ctx.SimpleQueue()\n",
" File \"/var/lang/lib/python3.9/multiprocessing/context.py\", line 113, in SimpleQueue\n return SimpleQueue(ctx=self.get_context())\n",
" File \"/var/lang/lib/python3.9/multiprocessing/queues.py\", line 341, in __init__\n self._rlock = ctx.Lock()\n",
" File \"/var/lang/lib/python3.9/multiprocessing/context.py\", line 68, in Lock\n return Lock(ctx=self.get_context())\n",
" File \"/var/lang/lib/python3.9/multiprocessing/synchronize.py\", line 162, in __init__\n SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx)\n",
" File \"/var/lang/lib/python3.9/multiprocessing/synchronize.py\", line 57, in __init__\n sl = self._semlock = _multiprocessing.SemLock(\n"
]
}
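For a test that does not need a real Lambda, one option is to simulate the failure locally. The sketch below is an assumption, not something from the opensearch-py test suite: it patches CPython's internal `_multiprocessing.SemLock` so it raises `ENOSYS`, which approximates what the traceback above shows on Lambda. The helper name `simulate_lambda_threadpool_error` is made up for illustration, and the approach relies on CPython internals (Python 3.8 or later), so it may need adjusting.

```python
import errno
import _multiprocessing  # CPython's C helper module behind multiprocessing locks
from multiprocessing.pool import ThreadPool
from unittest import mock


def simulate_lambda_threadpool_error():
    """Return the errno raised by ThreadPool() when semaphores are unavailable.

    Returns None if the pool was created without an error.
    """
    # Assumption: a SemLock that raises ENOSYS approximates Lambda's missing /dev/shm.
    fake_error = OSError(errno.ENOSYS, "Function not implemented")
    with mock.patch.object(_multiprocessing, "SemLock", side_effect=fake_error):
        try:
            pool = ThreadPool(1)
        except OSError as exc:
            return exc.errno
    # Only reached when the pool was created; clean up its worker threads.
    pool.terminate()
    return None


if __name__ == "__main__":
    print(simulate_lambda_threadpool_error() == errno.ENOSYS)
```

The same patch could wrap a call to `parallel_bulk` to check that a future fix no longer reaches `SemLock`.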
Looking at https://pypi.org/project/lambda-thread-pool/
You cannot use "multiprocessing.Queue" or "multiprocessing.Pool" within a Python Lambda environment because the Python Lambda execution environment does not support shared memory for processes.
This means we need to get rid of or be able to swap ThreadPool with LambdaThreadPool in https://github.com/opensearch-project/opensearch-py/blob/da436cbbe8dda34abd607f527d4f0bdacb9b30d8/opensearchpy/helpers/actions.py#L470.
For an immediate workaround you can copy-paste the parallel_bulk implementation, replace BlockingPool with LambdaThreadPool, and see if that works. For something maintainable, I would extract BlockingPool from this implementation by adding an abstract thread pool interface, implement another one for LambdaThreadPool, and add a configuration parameter to specify which thread pool to use. Does anyone want to give either a try?
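A rough sketch of what a pluggable pool could look like, to make the suggestion concrete. Everything here is hypothetical: `BulkPool`, `ExecutorPool`, `parallel_apply`, and the `pool_factory` parameter are illustrative names, not part of opensearch-py, and `ExecutorPool` uses the standard library's `concurrent.futures.ThreadPoolExecutor` as a stand-in for LambdaThreadPool, on the assumption that it does not touch multiprocessing semaphores.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, Iterator, Protocol


class BulkPool(Protocol):
    """Hypothetical minimal interface a bulk helper needs from a pool."""

    def imap(self, fn: Callable[[Any], Any], items: Iterable[Any]) -> Iterator[Any]: ...

    def close(self) -> None: ...


class ExecutorPool:
    """Pool backed by concurrent.futures threads instead of multiprocessing."""

    def __init__(self, thread_count: int = 4) -> None:
        self._executor = ThreadPoolExecutor(max_workers=thread_count)

    def imap(self, fn: Callable[[Any], Any], items: Iterable[Any]) -> Iterator[Any]:
        # Executor.map yields results in input order. Unlike a bounded
        # blocking pool, it consumes the whole input iterable up front.
        return self._executor.map(fn, items)

    def close(self) -> None:
        self._executor.shutdown(wait=True)


def parallel_apply(
    fn: Callable[[Any], Any],
    chunks: Iterable[Any],
    pool_factory: Callable[[int], BulkPool] = ExecutorPool,
    thread_count: int = 4,
) -> Iterator[Any]:
    """Stand-in for a parallel_bulk-style helper with a configurable pool."""
    pool = pool_factory(thread_count)
    try:
        yield from pool.imap(fn, chunks)
    finally:
        pool.close()


if __name__ == "__main__":
    # Each "chunk" is processed on a worker thread; here we just count items.
    print(list(parallel_apply(len, [[1, 2], [3], []])))
```

A caller on Lambda would pass a different `pool_factory`, while the default could keep the existing behavior.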
I renamed this to "parallel_bulk doesn't work in AWS lambda", is there anything else that doesn't?
Thank you, in my case the ThreadPool is used by some sdk and it wouldn't be ideal to change. We started getting the issue when upgrading from python3.7 to 3.9. We might just find an alternative solution instead of using the sdk.
It looks like:
synchronize.Lock doesn't work in lambda for any version of Python (lambda has no /dev/shm, and no write access to /dev in lambda - see: https://aws.amazon.com/blogs/compute/parallel-processing-in-python-with-aws-lambda )
ThreadPool is now using synchronize.Lock from version 3.9
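One way to check this in a given environment is to try creating a multiprocessing lock and see whether it fails. This is a minimal sketch; the helper name `multiprocessing_locks_available` is made up for illustration, and it assumes the failure surfaces as `OSError` (as in the tracebacks above) or as `ImportError` on platforms built without semaphore support.

```python
import multiprocessing


def multiprocessing_locks_available() -> bool:
    """Return True if multiprocessing.Lock() can be created here."""
    try:
        multiprocessing.Lock()
    except (OSError, ImportError):
        # OSError: e.g. [Errno 38] when /dev/shm is missing.
        # ImportError: platform lacks a working sem_open implementation.
        return False
    return True


if __name__ == "__main__":
    print(multiprocessing_locks_available())
```

A library could use a check like this at startup to decide whether to fall back to a pool that avoids multiprocessing primitives.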
To Reproduce
Steps to reproduce the behavior:
opensearch-py==1.0.0 to AWS Lambda
Expected behavior
The opensearch client should work, as it did with python3.6
Plugins
opensearch-py==1.0.0