scicloj / wolframite

An interface between Clojure and Wolfram Language (the language of Mathematica)
https://scicloj.github.io/wolframite/
Mozilla Public License 2.0

Wolfram Engineering: How to pass large data between JVM and Wolfram efficiently? #113

Open holyjak opened 2 months ago

holyjak commented 2 months ago

How can large data be passed between the JVM and Wolfram efficiently? The Clojure-Python bridge uses direct memory sharing to pass large primitive data efficiently. Converting 100k rows of data into J/Link expressions and back is likely very inefficient by comparison. Is there a better way, through shared memory or some efficient shared file format? (E.g. Parquet, though Wolfram doesn't seem to support it yet?)
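One low-tech option for the file-format route is a raw binary column: the JVM writes a `double[]` as headerless little-endian IEEE 754 doubles, and the Wolfram side reads it back with something like `BinaryReadList[file, "Real64", ByteOrdering -> -1]`. The sketch below covers only the Java half. The class and method names are hypothetical, and the exact Wolfram call is an assumption to check against the `BinaryReadList` documentation. Note that this still copies the data once on each side, so it is not the zero-copy sharing the Python bridge does.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical helper: exchanges one numeric column as raw "Real64" data.
class Real64Export {
    // Writes the array as raw little-endian doubles, 8 bytes each, no header.
    static void writeReal64(Path file, double[] data) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(data.length * Double.BYTES)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        // The DoubleBuffer view inherits the little-endian order set above.
        buf.asDoubleBuffer().put(data);
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
        }
    }

    // Reads the same format back, e.g. for data produced on the Wolfram side.
    static double[] readReal64(Path file) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(Files.readAllBytes(file))
                                   .order(ByteOrder.LITTLE_ENDIAN);
        double[] out = new double[buf.remaining() / Double.BYTES];
        buf.asDoubleBuffer().get(out);
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("column", ".bin");
        writeReal64(tmp, new double[] {1.5, -2.0, 3.25});
        // Assumed Wolfram side: BinaryReadList["<path>", "Real64", ByteOrdering -> -1]
        System.out.println(tmp + ": " + Files.size(tmp) + " bytes");
        System.out.println(Arrays.toString(readReal64(tmp))); // prints [1.5, -2.0, 3.25]
        Files.deleteIfExists(tmp);
    }
}
```

Writing whole columns avoids building one expression per row; a multi-column table would become one such file per column.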

The J/Link guide says, in the section "Speeding Up Sending Large Arrays":

You can send and receive arrays of most "primitive" Java types (e.g. byte, short, int, float, double) nearly as fast as in a C-language program. The set of types that can be passed quickly corresponds to the set of types for which the WSTP C API has single functions to put arrays. The Java types long (these are 64 bits), boolean, and String do not have fast WSTP functions, and so sending or receiving these types is much slower. Try to avoid using extremely large arrays of these types (say, more than 100,000 elements) if possible.
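Given that note, one way to prepare row-oriented data is to transpose it into one primitive array per field and narrow `long` fields to `int` where the values allow, so every column lands on a fast WSTP array path. A minimal sketch, assuming a hypothetical `Row` record with a 64-bit id and a `double` value (Java 16+ for records); actually sending the resulting arrays through J/Link is not shown.

```java
import java.util.Arrays;
import java.util.List;

class ColumnarPack {
    // Hypothetical row type: a 64-bit id plus a numeric measurement.
    record Row(long id, double value) {}

    // Column-oriented result: one primitive array per field.
    record Columns(int[] ids, double[] values) {}

    static Columns toColumns(List<Row> rows) {
        int n = rows.size();
        int[] ids = new int[n];
        double[] values = new double[n];
        for (int i = 0; i < n; i++) {
            Row r = rows.get(i);
            // long arrays have no fast WSTP path, so narrow to int;
            // Math.toIntExact throws ArithmeticException instead of truncating.
            ids[i] = Math.toIntExact(r.id());
            values[i] = r.value();
        }
        return new Columns(ids, values);
    }

    public static void main(String[] args) {
        Columns c = toColumns(List.of(new Row(1L, 0.5), new Row(2L, 1.5)));
        System.out.println(Arrays.toString(c.ids()));    // prints [1, 2]
        System.out.println(Arrays.toString(c.values())); // prints [0.5, 1.5]
    }
}
```

Failing loudly on out-of-range ids is a deliberate choice; for data that does not fit in `int`, converting to `double` is an alternative, but it is exact only up to 2^53.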

Setting $RelaxedTypeChecking might also be relevant for return values (Wolfram → JVM), perhaps.

Note: arrays appear to always be passed by copy:

Unlike the C WSTP API, there are no methods for "releasing" strings or arrays because this is not necessary. When you read a string or array off the link, your program gets its own copy of the data, so you can write into it if you desire (although Java strings are immutable).

light-matters commented 6 days ago

Comment for posterity: it would be amazing if we could use dtype-next here, either for copying or, ideally, for zero-copy transfer.