Closed (omerb01 closed this issue 4 years ago)
The extra 0.5GiB used between memory1 and memory2 is probably caused by the temporary buffer used for pickling inside self._internal_storage.put_object. It's unfortunate that this temporary buffer is needed, but IIRC ibm_boto3 doesn't provide an API for streaming writes.
Python often holds on to memory, instead of releasing it back to the OS, so that it can quickly allocate memory again without having to call the OS to give it more memory. I think that's the most likely explanation for why those logs show memory increasing to 1.5GiB and then staying there.
@LachlanStuart this spare 0.5GiB could be invested in formula images to reduce the number of COS requests. Do you have a suggestion for using it better? How can we force Python to release this memory?
@omerb01 Python releasing the memory won't help. After the formula images are cleared and the temporary buffer for pickling is gone, Python just holds the empty memory so that it doesn't need to waste time returning it to the OS and then re-requesting it from the OS later.
The issue is that during saving (inside internal_storage.put_object), it currently needs memory for both the in-memory images and the temporary buffer for pickling. This probably isn't a firm requirement though - pickle can output in a streaming way using pickle.dump instead of pickle.dumps, which requires very little extra memory. The issue is that AFAICT ibm_boto3 doesn't have an easy API to receive the data in a streaming way and upload it to COS.
The ibm_s3transfer package, used internally by ibm_boto3, seems to have an UploadNonSeekableInputManager class for this purpose, which can be accessed by calling cos_client.upload_fileobj with a readable, non-seekable file-like object as input. I have no idea how to actually use it here though. pickle.dump wants some sort of writable file-like object and ibm_s3transfer.manager.TransferManager.upload wants some sort of readable file-like object. I can't see anything in the Python standard library for bridging a writer and a reader...
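One possible workaround would be to bridge them manually with an OS pipe and a worker thread. Below is an untested sketch (cos_client, bucket and key are placeholders, and error handling is omitted). Note that it only saves memory if the pickler actually writes incrementally - on Python 3.6 the C pickler builds the whole output before its single write call, so there it wouldn't help:

import os
import pickle
import threading

def streaming_pickle_upload(cos_client, bucket, key, obj):
    # os.pipe gives us a readable end and a writable end backed by a small kernel buffer
    read_fd, write_fd = os.pipe()
    reader = os.fdopen(read_fd, 'rb')
    writer = os.fdopen(write_fd, 'wb')

    def produce():
        try:
            pickle.dump(obj, writer)   # writes into the pipe as the uploader drains it
        finally:
            writer.close()             # signals EOF so the upload can complete

    producer = threading.Thread(target=produce)
    producer.start()
    try:
        # upload_fileobj accepts a readable, non-seekable file-like object and
        # performs a multipart upload from it
        cos_client.upload_fileobj(reader, bucket, key)
    finally:
        producer.join()
        reader.close()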
@LachlanStuart I see... at the beginning of the project we needed to use COS with file-like objects too, and I found an open-source GitHub repository which may help, called smart_open: https://github.com/RaRe-Technologies/smart_open. This package provides a wrapper around S3 clients so they can be used through file-like objects, with a special open() method that behaves much like the built-in open() for local files. Are you familiar with this package? I'm not sure if it will help in this case.
@omerb01 I haven't seen that before, but it looks like it would work. It seems to work by making a writable object that starts a multi-part upload and pushes a new part every time enough data has been written to it.
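Roughly, the mechanism would look something like this toy sketch (not smart_open's actual code; upload_part stands in for the real multipart-upload call, and min_part_size is an arbitrary value):

class MultipartWriter:
    def __init__(self, min_part_size=50 * 1024 ** 2):
        self.min_part_size = min_part_size
        self.buffer = bytearray()
        self.part_number = 1

    def write(self, data):
        # accumulate writes until there is enough data for a full part
        self.buffer.extend(data)
        if len(self.buffer) >= self.min_part_size:
            self._flush_part()
        return len(data)

    def _flush_part(self):
        if self.buffer:
            upload_part(self.part_number, bytes(self.buffer))  # hypothetical upload call
            self.part_number += 1
            self.buffer = bytearray()

    def close(self):
        # upload whatever is left; the caller then completes the multipart upload
        self._flush_part()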
@LachlanStuart it seems that this didn't solve the issue. I used smart_open inside save_images() of the manager's class:
import pickle
import ibm_boto3
from smart_open import open

ibm_cos_session = ibm_boto3.session.Session(aws_access_key_id='****',
                                            aws_secret_access_key='****')
transport_params = {
    'session': ibm_cos_session,
    'resource_kwargs': {'endpoint_url': 'https://s3.****.cloud-object-storage.appdomain.cloud'}
}
with open(f's3://{bucket}/{key}', 'wb', transport_params=transport_params) as data_stream:
    pickle.dump(self.formula_images, data_stream)
Example of an activation log (memory1 was measured before saving, memory2 was measured after clearing self.formula_images):
"2020-01-15T14:22:19.856601Z stdout: ---------------------- FUNCTION LOG ----------------------",
"2020-01-15T14:22:19.856606Z stdout: Reading centroids segment metabolomics/tmp/centroids_segments/2341.msgpack",
"2020-01-15T14:22:19.856954Z stderr: /action/pywren_ibm_cloud/runtime/function_handler/jobrunner.py:253: FutureWarning: The read_msgpack is deprecated and will be removed in a future version.",
"2020-01-15T14:22:19.856960Z stderr: It is recommended to use pyarrow for on-the-wire transmission of pandas objects.",
"2020-01-15T14:22:19.856964Z stderr: result = function(**data)",
"2020-01-15T14:22:19.870137Z stdout: Reading dataset segments 54-55",
"2020-01-15T14:22:24.386184Z stdout: max_formula_images_mb: 412",
"2020-01-15T14:22:58.793623Z stdout: memory1: 764.7MiB",
"2020-01-15T14:22:58.793657Z stdout: Saving 196 images",
"2020-01-15T14:23:10.943005Z stdout: memory2: 1.2GiB",
"2020-01-15T14:23:32.471742Z stdout: memory1: 1.2GiB",
"2020-01-15T14:23:32.471892Z stdout: Saving 173 images",
"2020-01-15T14:23:45.249771Z stdout: memory2: 1.2GiB",
"2020-01-15T14:24:04.823081Z stdout: memory1: 1.2GiB",
"2020-01-15T14:24:04.823113Z stdout: Saving 171 images",
"2020-01-15T14:24:18.239402Z stdout: memory2: 1.2GiB",
"2020-01-15T14:24:23.985985Z stdout: memory1: 1.2GiB",
"2020-01-15T14:24:23.986018Z stdout: Saving 44 images",
"2020-01-15T14:24:26.765940Z stdout: Centroids segment metabolomics/tmp/centroids_segments/2341.msgpack finished",
"2020-01-15T14:24:26.777186Z stdout: ----------------------------------------------------------",
@omerb01 I dug a bit deeper and found out a few frustrating things about Python's pickler:
- pickle.dump never flushes its temporary output buffer - instead it just makes a single call to write with the entire file. This means it always allocates a large temporary buffer (approximately the same size as the input data) to hold the output.
- The pure-Python implementation (pickle._dump) doesn't have this problem - it does MANY small writes instead, which is a bit slower but memory efficient.
- Numpy arrays (including the ones inside each coo_matrix) create a temporary copy of their data when they're pickled, effectively doubling their memory usage while that copy exists. Pickle implementations "memoize" this temporary copy, meaning that it can't be GC'd until the pickling process is finished.
Here's my test code in case you want to try it:
import pickle, resource, numpy as np
from scipy.sparse import coo_matrix

class BlackHole:
    def __init__(self):
        self.cnt = 0
        self.biggest = 0
        self.total = 0

    def __del__(self):
        print(f'writes: {self.cnt} biggest: {self.biggest} total: {self.total}')

    def write(self, bytes):
        self.cnt += 1
        self.biggest = max(self.biggest, len(bytes))
        self.total += len(bytes)

# big_dict = dict((i, coo_matrix(np.arange(10000))) for i in range(10000)) # coo_matrixes
big_dict = dict((i, np.arange(10000)) for i in range(10000)) # numpy arrays
# big_dict = dict((i, list(range(10000))) for i in range(10000)) # pure Python objects

print(f'Max memory usage before: {resource.getrusage(resource.RUSAGE_SELF).ru_maxrss} kiB')

# Uncomment one of the below implementations

# normal pickle (C implementation)
# pickle.dump(big_dict, BlackHole())

# Python implementation
# pickle._dump(big_dict, BlackHole())

# C implementation ("fast mode")
# p = pickle.Pickler(BlackHole())
# p.fast = True
# p.dump(big_dict)
# del p  # needed to trigger BlackHole.__del__

# Python implementation ("fast mode")
# p = pickle._Pickler(BlackHole())
# p.fast = True
# p.dump(big_dict)
# del p  # needed to trigger BlackHole.__del__

print(f'Max memory usage after: {resource.getrusage(resource.RUSAGE_SELF).ru_maxrss} kiB')
Note that the "Max memory usage" metric can't be reset, so Python should be restarted every time after this test.
Currently the code needs memory for 3x the size of formula_images: 1x for the original formula_images instance, 1x for the output buffer due to the "never flushing the output buffer" bug, and 1x due to the "numpy copies" bug. There are a few options for fixing the "never flushing the output buffer" problem, and a few options for fixing the "numpy copies" problem.
For these steps forward, I think we also need to decide which solutions should be implemented in PyWren's put_object, and which should be implemented specifically in this code. I'm sure other PyWren users would benefit from being able to pickle with less memory usage, but there's a question of how hacky we're willing to go...
The "numpy copies" problem has actually already been reported to Python, but it unfortunately seems to be stuck in PR: https://github.com/python/cpython/pull/13036
@LachlanStuart based on your script above, I created a script that prints the actual memory peak when doing an operation, for example, pickling objects: https://gist.github.com/JosepSampe/25d2f1bdf8250ec56f4e739d8c2b4e6e
Based on the results, it seems that using fast mode in Python 3.7 & Python 3.8 (in either the C or the Python implementation) does not have extra memory consumption:
Python3.6
Allocating source data...
=> peak memory usage: 1.660 GB
Dumping (C)...
writes: 1 biggest: 1602049329 total: 1602049329
done in 1.669s
=> peak memory usage: 4.836 GB
Dumping (Python)...
writes: 880076 biggest: 79997 total: 1602049327
done in 3.208s
=> peak memory usage: 3.300 GB
Dumping (C Fast mode)...
writes: 1 biggest: 1604829770 total: 1604829770
done in 1.287s
=> peak memory usage: 3.274 GB
Dumping (Python fast mode)...
writes: 1150023 biggest: 79997 total: 1604829768
done in 3.830s
=> peak memory usage: 1.670 GB
Python3.7
Allocating source data...
=> peak memory usage: 1.661 GB
Dumping (C)...
writes: 20001 biggest: 80381 total: 1602049329
done in 1.114s
=> peak memory usage: 3.281 GB
Dumping (Python)...
writes: 890076 biggest: 79992 total: 1602049327
done in 3.065s
=> peak memory usage: 3.300 GB
Dumping (C Fast mode)...
writes: 20001 biggest: 80493 total: 1604829770
done in 0.680s
=> peak memory usage: 1.662 GB
Dumping (Python fast mode)...
writes: 1160023 biggest: 79992 total: 1604829768
done in 3.656s
=> peak memory usage: 1.662 GB
Python3.8
Allocating source data...
=> peak memory usage: 1.665 GB
Dumping (C)...
writes: 30001 biggest: 80264 total: 1601390005
done in 1.120s
=> peak memory usage: 3.283 GB
Dumping (Python)...
writes: 60003 biggest: 80253 total: 1601390003
done in 2.897s
=> peak memory usage: 3.296 GB
Dumping (C Fast mode)...
writes: 30001 biggest: 80356 total: 1604849779
done in 0.716s
=> peak memory usage: 1.687 GB
Dumping (Python fast mode)...
writes: 60003 biggest: 80347 total: 1604849777
done in 3.908s
=> peak memory usage: 1.687 GB
@LachlanStuart This is not the same as what you stated in the previous comment, so can you confirm it? In my case pickle.dump() performs only one write in Python 3.6 and many more writes in Python 3.7 & Python 3.8.
@JosepSampe I don't have the Python3.7 environment that I used for that experiment anymore. It's possible I wasn't on the latest sub-version (3.7.6) because I just grabbed an existing environment that was already set up.
I've re-tested on 3.7.6 and I got the same results as you.
@omerb01 what is the status of it?
@gilv solving this issue requires moving to Python 3.7:
https://github.com/metaspace2020/pywren-annotation-pipeline/blob/095b4dcce9141b0f530e94fd163fe3bf1447ea52/annotation_pipeline/image.py#L39
I haven't succeeded in figuring out why yet, but it seems that something is wrong with the memory manager when it clears the formula_images dict. I know that Python's garbage collector runs from time to time and frees unreachable objects, so I even tried to trigger it explicitly (see the snippet below), and it still shows the same output.
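The explicit trigger would presumably be the standard gc call:

import gc
gc.collect()  # force a full collection of unreachable objects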
Example activation log of annotate() ("memory1" is the action's memory before the data is cleared and "memory2" is the action's memory after the data is cleared, both measured with pywren_ibm_cloud.utils.get_current_memory_usage()): in the example, all "memory2" records should be around 1GB instead of 1.5GB.