Just a cursory glance, but it looks like you are sharing a single `concatenated_embeddings` variable by passing it as `args` to each `Process`. This might be the root cause of the issue. I'm not sure how Python shares a variable that isn't serializable across processes, but I would imagine it is a straight byte-for-byte memory copy of the object. I would not share a Magnitude variable like that between processes, as it contains things that should not be memory-copied that way (like database references).
Instead, it's better to instantiate Magnitude within each process (i.e. call the `Magnitude()` constructor in each process and, in your case, also concatenate in each process). Magnitude uses memory maps, so even though you are instantiating a Magnitude object in multiple processes, it will avoid duplicating memory where it can, and your application should not take a performance hit.
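A minimal sketch of that pattern (the file paths and the word list are placeholders, not anything from this thread): each worker builds and concatenates its own Magnitude objects instead of receiving one through `args`.

```python
from multiprocessing import Process
from pymagnitude import Magnitude

def worker(words):
    # Instantiate (and concatenate) inside the process instead of
    # passing a pre-built Magnitude object through `args`.
    glove = Magnitude("glove.magnitude")        # placeholder path
    fasttext = Magnitude("fasttext.magnitude")  # placeholder path
    concatenated_embeddings = Magnitude(glove, fasttext)
    vectors = concatenated_embeddings.query(words)
    # ... do work with `vectors` here ...

if __name__ == "__main__":
    processes = [Process(target=worker, args=(["hello", "world"],)) for _ in range(4)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
```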
Let me know if that solves the issue; if not, I'll keep digging.
Sorry for the delayed reply – that worked!
@AjayP13 if you have a moment, could I please ask you to say a little bit about why copying an object across multiprocessing processes might cause a sqlite db to become corrupted? Any insights you can offer on this question would be super helpful!
Hi Doug,
Yes, you should not copy it across multiple processes. This is because it will copy the SQLite connection/cursor along with it, which is not safe to do.
However, Magnitude does support multiple processes. Just instantiate a new Magnitude object in each process. They will try to share memory through memory mapping so as not to duplicate resources.
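A hedged sketch of the same advice using a `multiprocessing.Pool`: each worker process builds its own Magnitude instance once (in the pool initializer) and reuses it, relying on the memory-mapped files to avoid duplicating data. The path is a placeholder.

```python
from multiprocessing import Pool
from pymagnitude import Magnitude

_vectors = None  # per-process global, set by the initializer

def init_worker():
    global _vectors
    _vectors = Magnitude("glove.magnitude")  # placeholder path

def embed(word):
    # Each worker queries its own Magnitude instance.
    return _vectors.query(word)

if __name__ == "__main__":
    with Pool(processes=4, initializer=init_worker) as pool:
        results = pool.map(embed, ["cat", "dog", "fish"])
```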
Amen, thanks very much @AjayP13!
Hi @AjayP13, I was curious if you had any examples of how you've used this with multiprocessing previously. I'm bumping into a pysqlite error when I try to run with multiprocessing:
I've tried reloading the `.magnitude` files as well as setting `blocking=True`, but can't seem to get around it. Any ideas? Thanks!