ronny-rentner / UltraDict

Synchronized, streaming Python dictionary that uses shared memory as a backend
Apache License 2.0

Shared memory not always cleared #16

Closed joelsdc closed 2 years ago

joelsdc commented 2 years ago

Hi,

I'm using UltraDict to share data between a master process and several subprocesses.

I have auto_unlink=True on all declarations, but sometimes when the script fails (a bug in the code or an unexpected error), the shared memory is not cleared. On the next run, when the master process creates the "new" UltraDict object, it reuses the data from the previous execution, because the UltraDict names are predefined.

Is there a way to clear the memory of previous executions without having to reboot the server?

Thanks.

ronny-rentner commented 2 years ago

Hey, yes, you have several options.

First of all, please note that in the dev branch, the cleanup code is improved, especially when using recurse=True, but also in some other cases.

Please also note that you should not need to set auto_unlink=True yourself. It is set automatically for the first process, the one that creates an UltraDict, and it should only be enabled for that one process: once this process ends and unlinks the shared memory, no further connections can be made (although existing connections continue to work on Linux).
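The unlink semantics described above can be reproduced with Python's standard-library `multiprocessing.shared_memory` module. This is a minimal sketch, assuming UltraDict's blocks behave like ordinary named POSIX shared memory segments on Linux; `demo_unlink_semantics` and the block names are hypothetical, and this is not UltraDict's own code.

```python
from multiprocessing import shared_memory


def demo_unlink_semantics(name: str) -> dict:
    """Hypothetical demo: what happens to readers after the creator unlinks a block."""
    # The creating process owns the block (the role auto_unlink plays in UltraDict).
    creator = shared_memory.SharedMemory(name=name, create=True, size=16)
    creator.buf[:5] = b"hello"

    # A second handle attaches while the name still exists.
    existing = shared_memory.SharedMemory(name=name)

    # The creator detaches and removes the name (the /dev/shm entry on Linux).
    creator.close()
    creator.unlink()

    # The already-attached handle keeps its mapping and can still read.
    result = {"existing_reads": bytes(existing.buf[:5])}

    # New connections by name are no longer possible.
    try:
        late = shared_memory.SharedMemory(name=name)
        late.close()
        result["new_attach"] = "succeeded"
    except FileNotFoundError:
        result["new_attach"] = "FileNotFoundError"

    existing.close()
    return result
```

On Linux, the existing handle should still read `b'hello'` while the late attach fails with `FileNotFoundError`, which is why only one process should be responsible for unlinking.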

On Unix/Linux, the shared memory lives in /dev/shm. You can manually delete the files that correspond to the name of your UltraDict, e.g. when you use UltraDict(name='foo') you'll have the files /dev/shm/foo and /dev/shm/foo_memory.
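A small script can do this manual cleanup. The sketch below assumes the Linux layout described above (`/dev/shm/<name>` and `/dev/shm/<name>_memory`); the function name and the `shm_dir` parameter are hypothetical, and it should only be run when no process is still using the dictionary.

```python
from pathlib import Path


def remove_ultradict_files(name: str, shm_dir: str = "/dev/shm") -> list:
    """Delete the two shared memory files that UltraDict(name=name) uses.

    Returns the paths that were actually removed; missing files are skipped.
    """
    removed = []
    for filename in (name, f"{name}_memory"):
        path = Path(shm_dir) / filename
        if path.exists():
            path.unlink()
            removed.append(str(path))
    return removed
```

For example, `remove_ultradict_files('foo')` would remove `/dev/shm/foo` and `/dev/shm/foo_memory` if a crashed run left them behind, and return an empty list otherwise.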

Another option to handle this automatically is to explicitly unlink whatever is there before creating the new instance:

    UltraDict(name='foo').unlink(force=True)
    new_ultra_dict = UltraDict(name='foo')

You need to set force=True because otherwise UltraDict only really unlinks the shared memory when unlink() is called by the process that created it; in that case it unlinks without having to force it. Make sure not to use the UltraDict instance after calling unlink(); create a new one instead.
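The same unlink-then-recreate idea can be sketched with the standard-library `multiprocessing.shared_memory` module, which also makes the failure mode visible: creating a block with `create=True` raises `FileExistsError` when a previous run left one behind. This is an illustrative stdlib sketch under the assumption that UltraDict's blocks are plain named shared memory segments; it is not UltraDict's implementation, and `create_fresh_block` is a hypothetical helper.

```python
from multiprocessing import shared_memory


def create_fresh_block(name: str, size: int) -> shared_memory.SharedMemory:
    """Create a new shared memory block, discarding any stale one with the same name."""
    try:
        return shared_memory.SharedMemory(name=name, create=True, size=size)
    except FileExistsError:
        # A crashed run left this block behind: attach to it, unlink it,
        # then create a new, zero-filled block under the same name.
        stale = shared_memory.SharedMemory(name=name)
        stale.close()
        stale.unlink()
        return shared_memory.SharedMemory(name=name, create=True, size=size)
```

Because unlinking does not disturb processes that are already attached, a helper like this belongs in the master process before any subprocess connects.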

Another option is to use UltraDict.unlink_by_name(). Note that this is not a static method in the main branch, but it is in the dev branch. You can do:

    UltraDict.unlink_by_name('foo')
    UltraDict.unlink_by_name('foo_memory')

This throws an exception if the memory does not exist, which you can catch and ignore. Make sure to unlink both shared memory blocks used by UltraDict: if you use name="foo", there will be "foo" for storing control information and "foo_memory" for the streaming buffer.
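For a dependency-free cleanup step at the start of the master process, both blocks can also be unlinked through `multiprocessing.shared_memory`, catching the exception for blocks that do not exist. This assumes UltraDict's `foo` and `foo_memory` blocks are ordinary named `SharedMemory` segments (as the `/dev/shm` file names suggest); the helper names are hypothetical and the sketch does not call UltraDict's `unlink_by_name()`.

```python
from multiprocessing import shared_memory


def unlink_block(name: str) -> bool:
    """Unlink a named shared memory block; return False if it does not exist."""
    try:
        block = shared_memory.SharedMemory(name=name)
    except FileNotFoundError:
        return False  # nothing left over from a previous run
    block.close()
    block.unlink()
    return True


def unlink_ultradict_blocks(name: str) -> list:
    """Unlink both blocks of UltraDict(name=name): control data and streaming buffer."""
    return [block for block in (name, f"{name}_memory") if unlink_block(block)]
```

Calling it on a name with no leftovers returns an empty list, so it can run unconditionally at startup, provided no other process is still using the dictionary.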

joelsdc commented 2 years ago

Hey @ronny-rentner, thanks for all this great info. I've tested several of the proposed solutions, and they all work for my use case.

Let me know if you need help in the future testing stuff as I'll be happy to do so.