leoschwarz opened this issue 1 month ago
Hi @leoschwarz, thanks for the report. I've transferred this over to the vl-convert repo which implements the image export logic.
vl-convert bundles the Deno JavaScript runtime, which only supports running on a single thread, but my understanding is that the loky backend uses separate processes, so I'm not certain that's the issue. Do you have the same problem using the multiprocessing API?
Thank you for transferring the issue. With multiprocessing I cannot reproduce this issue, i.e. the following works:
```python
from multiprocessing import freeze_support
import os

import altair as alt
import joblib
import pandas as pd

os.environ["RUST_BACKTRACE"] = "full"


def write_chart(filename):
    df = pd.DataFrame({"x": [2, 3, 4], "y": [5, 5, 3]})
    chart = alt.Chart(df).mark_point().encode(x="x", y="y")
    chart.save(filename)


if __name__ == "__main__":
    freeze_support()
    filenames = [f"chart{i}.pdf" for i in range(2)]
    joblib.Parallel(n_jobs=10, backend="multiprocessing")(
        joblib.delayed(write_chart)(filename) for filename in filenames
    )
```
Taken from the loky README:

> All processes are started using fork + exec on POSIX systems. This ensures safer interactions with third party libraries. On the contrary, multiprocessing.Pool uses fork without exec by default, causing third party runtimes to crash (e.g. OpenMP, macOS Accelerate...).
So my understanding is that they use different fork models, but in this case the default multiprocessing backend works whereas loky does not. I'm not enough of an expert on the details of multiprocessing to understand how this relates to Deno's runtime.
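The relevant difference between start methods can be seen with a small stdlib-only sketch (no vl-convert involved): a child started with the `"fork"` method inherits a copy of the parent's in-memory state, while a `"spawn"` child starts from a fresh interpreter. An embedded runtime like Deno is exactly the kind of parent-process state that may not survive being copied by a bare fork.

```python
import multiprocessing as mp

STATE = "set at import time"


def report_state():
    # Returns the module-level global as seen inside the child process.
    return STATE


def state_seen_by_child(start_method):
    ctx = mp.get_context(start_method)
    with ctx.Pool(1) as pool:
        return pool.apply(report_state)


if __name__ == "__main__":
    STATE = "mutated by the parent"
    # A "spawn" child re-imports the module and sees the import-time value;
    # a "fork" child (POSIX only) inherits the parent's mutated copy.
    print(state_seen_by_child("spawn"))
    if "fork" in mp.get_all_start_methods():
        print(state_seen_by_child("fork"))
```

This only illustrates state inheritance in general; it does not reproduce the Deno crash itself.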
Thanks for the investigation @leoschwarz. Documentation is probably the best first step, just to let people know that the multiprocessing backend works but loky backend does not.
One thing we might be able to do is expose an alternative API that doesn't rely on a global instance of the Rust object that wraps Deno. We could expose a `VlConverter` class that wraps a dedicated instance of the Deno runtime and has methods mirroring each of the global `vl_convert.*` functions. The hope is that if you create and use this from within the forked process, everything would work, since the global Deno instance wouldn't be forked.
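To illustrate the proposed pattern generically (this is a hypothetical sketch, not the actual vl-convert API; the class name, method, and behavior below are illustrative only): each worker constructs its own converter after the fork instead of touching a module-level global.

```python
import os


class PerProcessConverter:
    """Hypothetical stand-in for a per-process converter: each instance
    would own its own embedded runtime instead of sharing a module global."""

    def __init__(self):
        # The real class would start a dedicated Deno runtime here; this
        # sketch only records the creating process, so that use of an
        # instance inherited across a fork is detectable.
        self._owner_pid = os.getpid()

    def convert(self, spec):
        # Refuse to run if this instance was created in another process,
        # e.g. copied into a forked child.
        if os.getpid() != self._owner_pid:
            raise RuntimeError("converter was created in a different process")
        return f"converted:{spec}"


def worker(spec):
    # Create the converter inside the worker, after any fork, so no runtime
    # state is inherited from the parent process.
    return PerProcessConverter().convert(spec)
```

The key design point is that the runtime's lifetime is tied to the process that created it, rather than to module import.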
I'm not sure that would fully resolve the problem. My workflow is basically joblib distributing tasks, each of which executes the plotting in a new subprocess with its own Python interpreter (largely to avoid this type of problem), so I suspect the problem lies in a native extension doing something unusual with memory. I'm looking into creating a better example for this.
What happened?
Dear developers, I'm not sure if this is well known, but the following code results in an error (full message below).
I'm reporting it here since I triggered it with altair, and it would be nice to address with a fix or documentation, but maybe the issue originates in another project and is beyond the scope of this issue tracker. If you think this would fit better in the loky or Deno tracker, I'm happy to move it there.
What would you like to happen instead?
The loop should work without an error, which is the case if you set either of:

- `n_jobs=1`
- `backend="multiprocessing"` and add `freeze_support()`

The latter in particular is interesting and is what I am using as a workaround now.
Which version of Altair are you using?
5.4.0