igraph / rigraph

igraph R package
https://r.igraph.org

Ensure igraph can be installed in webR #1284

Open krlmlr opened 8 months ago

krlmlr commented 8 months ago

@georgestagg: Is there anything we can do to help? I'd like to create a webR demo for igraph and dm.

georgestagg commented 8 months ago

Also reported at https://github.com/r-wasm/webr/issues/341.

I had another look at this today. The main blocker is the Fortran library arpack. The Fortran compiler that we are using, a patched version of LLVM Flang, does not yet support generating WebAssembly output for Fortran COMMON blocks (a form of global variables). Supporting COMMON blocks will require changes to LLVM itself.

The COMMON blocks in question are used in arpack-ng for reporting both debugging information and numerical statistics, defined in debug.h and stat.h.

If you aren't actually using the debugging functionality in arpack-ng, I think you can comment the COMMON blocks out without loss of any other functionality. I have never used {igraph} before, so I don't know how much, if at all, the debug.h and stat.h features are actually used.

I just tried it for myself, commenting out the blocks:

https://github.com/igraph/rigraph/blob/4f4fce51d5dc81185bf7a4d1bb92c7d03e51213e/src/vendor/arpack/debug.h#L12-L16

and

https://github.com/igraph/rigraph/blob/4f4fce51d5dc81185bf7a4d1bb92c7d03e51213e/src/vendor/arpack/stat.h#L16-L21

by prepending a c character in the first column of those lines.

With this change, the R package is able to be compiled for WebAssembly.


Once compiled, however, the package still fails to load, with this error:

> library(igraph)
Error: package or namespace load failed for ‘igraph’ in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '/usr/lib/R/library/igraph/libs/igraph.so':
  Could not load dynamic lib: /usr/lib/R/library/igraph/libs/igraph.so
LinkError: WebAssembly.Instance(): Import #226 module="env" function="dgesv_": imported function does not match the expected type

This is a WebAssembly issue that occurs when loading a Lapack symbol. Unlike on most systems, under WebAssembly a function's signature must be declared consistently everywhere the symbol is used. R's built-in Lapack implementation declares Fortran subroutines as returning void, but it looks like the source in this package declares Lapack subroutines as returning int:

https://github.com/igraph/rigraph/blob/4f4fce51d5dc81185bf7a4d1bb92c7d03e51213e/src/vendor/cigraph/src/linalg/lapack_internal.h#L147-L153

In addition, extra so-called "hidden" Fortran character length arguments are missing from some of the function signatures in that file. Normally, these differences do not matter much, but under WebAssembly they do.

I continued to experiment, switching int to void and adding the extra length arguments to lapack_internal.h and where those functions are called, and recompiled the igraph package.
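To illustrate the kind of signature change described above, here is a minimal sketch in C. It does not reproduce the actual declarations in lapack_internal.h; `demo_uplo_` and `call_demo_uplo` are hypothetical stand-ins for a Fortran subroutine with one CHARACTER argument and for a C call site. The assumption is that the Fortran side is a void function whose string length is appended as a trailing `size_t` argument, which is how R's own Lapack headers declare such routines on current toolchains.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a Fortran subroutine with one CHARACTER
 * argument. It is declared as returning void (not int), and the length of
 * the character argument is passed as a hidden trailing argument. */
void demo_uplo_(const char *uplo, int *info, size_t uplo_len) {
    assert(uplo != NULL && info != NULL);
    /* info is 0 when the flag is a single 'U' or 'L', and -1 otherwise. */
    *info = (uplo_len == 1 && (uplo[0] == 'U' || uplo[0] == 'L')) ? 0 : -1;
}

/* Hypothetical C call site: the hidden length must be passed explicitly so
 * that the call matches the declared signature exactly. */
int call_demo_uplo(char flag) {
    int info = 1;
    demo_uplo_(&flag, &info, (size_t)1);
    return info;
}
```

Calling `call_demo_uplo('U')` returns 0 and `call_demo_uplo('X')` returns -1. On most native platforms a call that omits the trailing length, or that treats the routine as returning int, would usually still work by accident; under WebAssembly the import type is checked when the module is instantiated, so the declaration and every call have to agree.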

After these further changes, the package loads under webR. I am not very familiar with how to use igraph, but basic functionality seems to work:

[Screenshot, 2024-03-06: igraph running in webR]

I have updated the igraph package on the webR binary Wasm package repository, so you can try this version out for yourself at https://webr.r-wasm.org/latest/ now using webr::install("igraph"). You might be able to trigger problems that I have not seen.

And, a summary of the changes I've made is here: https://github.com/igraph/rigraph/compare/main...r-wasm:rigraph:webr.

While these workarounds seem to get things up and running for Wasm, I don't really know how safe they are for the more traditional R systems. Applying them as-is means editing vendored source code, and might even break the package for normal Linux, Windows and macOS users.

Nevertheless, this should give you an idea of what the path looks like in the long term for webR compatibility. For formal changes to the C source code, it's possible to gate Wasm-specific changes behind #ifdef __EMSCRIPTEN__ blocks if they are problematic on other systems. I've used this technique before to make packages work on webR without affecting other systems, and it works pretty well.
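As a rough sketch of that gating approach, the return type of a Fortran-style routine could be selected by a macro. The macro names `DEMO_FORTRAN_RET` and `DEMO_FORTRAN_RETURN` and the routine `demo_scale_` are hypothetical and do not come from the igraph or webR sources; the sketch only assumes that `__EMSCRIPTEN__` is defined when compiling with Emscripten.

```c
#include <stddef.h>

/* Hypothetical macros: use void under Emscripten, so the symbol type
 * matches a void declaration elsewhere, and keep the f2c-style int
 * return on every other platform. */
#ifdef __EMSCRIPTEN__
#  define DEMO_FORTRAN_RET void
#  define DEMO_FORTRAN_RETURN return
#else
#  define DEMO_FORTRAN_RET int
#  define DEMO_FORTRAN_RETURN return 0
#endif

/* Hypothetical Fortran-style routine: scales x[0..n-1] by alpha in place.
 * All arguments are passed by pointer, as a Fortran caller would do. */
DEMO_FORTRAN_RET demo_scale_(const int *n, double *x, const double *alpha) {
    for (int i = 0; i < *n; i++) {
        x[i] *= *alpha;
    }
    DEMO_FORTRAN_RETURN;
}
```

Callers that ignore the return value compile unchanged in both configurations, which keeps the Wasm-specific difference confined to the declaration and the return statement.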

Another option might be to simply wait until our version of the LLVM Flang compiler better supports COMMON blocks, rather than hacking around the issue by commenting them out. Unfortunately, I don't have an idea of the timescale for that.

In the meantime, I am happy to maintain the fork at https://github.com/r-wasm/rigraph/tree/webr; we already do so for some other R packages that require patches for Wasm. That might be the simplest solution in the short term.

szhorvat commented 8 months ago

With the C core we bundle an older version of ARPACK that was translated to C with f2c. This would probably trigger compiler warnings on CRAN, so it's not an option there, but could it be used with webR?

georgestagg commented 8 months ago

> With the C core we bundle an older version of ARPACK that was translated to C with f2c [...] could it be used with webR?

Perhaps. Can you please send me a link to one of the f2c-converted ARPACK source files? I'll take a look.

However, note that the f2c converted version of Lapack in this repo also sets subroutines to have an integer return type, inconsistent with R's version of Lapack symbols:

https://github.com/igraph/rigraph/blob/4f4fce51d5dc81185bf7a4d1bb92c7d03e51213e/src/vendor/cigraph/vendor/lapack/dgeev.c#L209-L212

So, we'd still have to deal with the int -> void mapping when compiling under Emscripten.

szhorvat commented 8 months ago

I'm sorry, you are correct. The int return type is still there. The translated LAPACK and ARPACK source files are all here: https://github.com/igraph/igraph/tree/master/vendor/lapack. An example of an ARPACK one is dgetv0.c.

The lack of a standard interface between C and Fortran (with older Fortran) tends to be an issue ...

krlmlr commented 8 months ago

Should we use the f2c translation also for the R package? Any downsides?

szhorvat commented 8 months ago

Many downsides: worse performance, likely compiler warnings that are not realistic to fix manually, and an outdated ARPACK version. Upsides: no need for a Fortran compiler, no issues with calling conventions (as discussed here), and anyone can compile the igraph C core with minimal technical experience (which is important given our userbase).

The upsides don't all apply for the R interface ...