pkgw / rubbl

Rust + Hubble = astrophysics in Rust
Other
16 stars 5 forks source link

rubbl_casacore possibility of accessing Dysco compressed MeasurementSets? #412

Open tikk3r opened 1 month ago

tikk3r commented 1 month ago

First of all, great to see a crate enabling interaction with the monstrosity that is a MeasurementSet from Rust in the first place.

I might be overlooking something, but is there perhaps a way to interface with MeasurementSets that are compressed with dysco? LOFAR datasets use this by default. I have it built and on LD_LIBRARY_PATH or forced to load it with LD_PRELOAD, but alas it keeps running into

Error: Casacore(CasacoreError("Table DataManager error: Data Manager class DyscoStMan is not registered\n  Check (DY)LD_LIBRARY_PATH matches the libraries used during the build of DyscoStMan"))

Is there a way I could get the rubbl_casatables to pick up on dysco, or would this require more effort?

pkgw commented 1 month ago

A Rubbl code change would be needed to integrate the C++ casacore code that adds Dysco support. I didn't include this code in the recent update because it would require a dependency on the external Dysco library, which I wanted to avoid. But I think that such a dependency could be added pretty easily as a Cargo feature, if someone like you had interest in it.

Would you potentially be interested in putting in a PR to add this? I can walk you through the pieces that I think would be needed to get this working; I don't expect that it would require any substantial new code to be written.

tikk3r commented 1 month ago

I'd have to preface it with that I have no C++ programming experience (coming from just building our software stacks and programming in Python for my day to day stuff otherwise) and am only a novice at Rust still, but I'd be happy to take a stab at it if you think with some walking through it's manageable.

tikk3r commented 1 month ago

Alright, I fiddled around a bit by looking how the current casacore code is included and handled in build.rs, and copying over the relevant casacore source files related to dysco into the casacore that comes with rubbl_casatabels. Using the example in rubbl-rxpackage I seem to have something working to first order now with a Dysco compressed set:

(pyenv-py3) RUSTFLAGS="-lgsl -lcblas" cargo run
   Compiling rubbl-rxpackage v0.0.0-dev.0 (/data1/tikk3r/rust_ms/rubbl-rxpackage)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 4.59s
warning: the following packages contain code that will be rejected by a future version of Rust: nom v5.1.2
note: to see what the problems were, use the option `--future-incompat-report`, or run `cargo report future-incompatibilities --id 1`
     Running `target/debug/rubbl-rxpackage`
[src/main.rs:38:9] main_data = [[Complex {
    re: 139.75493,
    im: 26.690851,
}, Complex {
    re: -44.5641,
    im: -44.5641,
}, Complex {
    re: -33.135002,
    im: -18.953083,
}, Complex {
    re: 112.04767,
    im: 21.399227,
}],
 [Complex {
    re: 151.26367,
    im: 44.43459,
}, Complex {
    re: -14.967233,
    im: -36.277115,
}, Complex {
    re: -47.003956,
    im: -33.466644,
}, Complex {
    re: 121.267784,
    im: 35.623123,
}],

I'm having a bit of trouble getting cargo to pick up on the BLAS and GSL libraries, which the RUSTFLAGS can fix ad-hoc, but that doesn't sound like a long-term solution to me. I can fork and open a PR with what I have so far however, and go from there.

pkgw commented 1 month ago

Great! That's probably half the battle right there.

The proper way to link with system libraries like BLAS or GSL is also through the Cargo build system. Various printouts in the build.rs script can tell Cargo how to link with such libraries; and for many system libraries there exist -sys crates on Crates.io that encapsulate the necessary steps for you. I see a blas-sys on Crates.io as well as a GSL-sys, which may suffice. If for some reason those are insufficient, Crates like pkg-config can be used in the build.rs script to discover external dependencies and emit the proper linkage information.

I am a little curious about where these dependencies are coming from, though. Does Dysco support require linking with the separate libdyscostman shared library as well? Or is it maybe loaded at runtime with dlopen()? Either way, transitive dependencies like GSL or BLAS should be automatically pulled in if the Dysco shared library has proper metadata. There should only be a need to link the Rubbl crates with GSL or BLAS if the new C++ code introduces direct dependencies on those libraries.

Once that's all straightened out, the final step would be to make all of this support conditional by use of a Cargo feature. That should be a fairly straightforward bit of plumbing.

tikk3r commented 4 weeks ago

Cool. I have opened a draft PR #413 to make it more tractable and easier to follow along (hopefully). I'll have another go at getting Cargo to find the dependencies on its own, and check out those -sys crates for guidance.

I am a little curious about where these dependencies are coming from, though. Does Dysco support require linking with the separate libdyscostman shared library as well? Or is it maybe loaded at runtime with dlopen()? Either way, transitive dependencies like GSL or BLAS should be automatically pulled in if the Dysco shared library has proper metadata. There should only be a need to link the Rubbl crates with GSL or BLAS if the new C++ code introduces direct dependencies on those libraries.

I don't think Dysco is explicitely linked against, as it used to be separate and in my LOFAR containers still I build casacore with -DBUILD_DYSCO=OFF and then separately build and install Dysco. I don't see an explicit dlopen() in casacore from a quick search, but I guess it must be dynamically loaded somehow then? Dysco does directly depend on GSL if I understand the terminology there correctly, in the CMake files it searches for GSL and the code uses it explicitely. I'm not sure why it also struggled with BLAS, as that does not seem to be a direct dependency.

pkgw commented 4 weeks ago

Awesome, thank you. Let's plan to shift the discussion to the PR thread.