sxs-collaboration / sxs

Python code for manipulating data from the SXS collaboration
MIT License
26 stars 18 forks source link

Enhancement: Consider adding support to load data from RIT/Maya/etc. data #64

Open duetosymmetry opened 2 years ago

duetosymmetry commented 2 years ago

For example, if somebody calls

sxs.load('RIT:BBH:0123/rPsi4')

then based on the string before the colon, look in the RIT catalog instead of the SXS catalog. Is this an appropriate interface? Should this be within the sxs package, even though it is others' data? Or should it instead be in a different package? Ping @moble request for comment.

moble commented 2 years ago

I think this would be entirely appropriate. Since sxs already has so much code written around capabilities that would probably be common to all groups, I personally would find it exhausting to suggest a different interface. But it would require a little work.

First, we would need to get their catalog data into a compatible format — at least some subset of the metadata included in our catalog.json — so that sxs would know where to find their data. (There may also be some other assumptions built into sxs about what sort of metadata is contained in the catalog entries, which could require rewriting bits of sxs, but I wouldn't think there would be too much.)

Second, we would need load (and optionally save) functions written to handle their formats, so they would result in WaveformModes objects. I think this for RIT and something in here for GT should just about work — though they'd probably need tweaks for their respective flavors of those formats.

Third, we would need logic in sxs.load that could figure out which load function to call for a given file. At least to start with, sxs.load could probably just parse the strings to try to figure out the format.

So, not too much work, but not trivial.

duetosymmetry commented 2 years ago

Pinging @akhadse who has expressed interest in working on this.

Ideally the caching mechanism should work for others' waveforms, not just for SXS waveforms. I have not looked in detail at how the caching works, so I don't know how difficult that is!

Since others do not provide catalog.json files, I suggested the following idea to Akshay. The BeautifulSoup code Mike wrote for parsing the RIT catalog would provide a good current snapshot of the web page. I don't think we should attempt to re-scrape the page every time somebody tries to load an RIT waveform. Instead, we include a static rit_catalog.json file within this package itself, along with the code for generating it from scraping. Hopefully this encourages them to provide a json catalog in the future, and then we can remove the scraping code and static catalog file.

Re: logic to figure out which load function to call: Given the current sxs_loader and sxs_handler functions, do you imagine adding more cases to them, or adding additional rit_loader and rit_handler etc. functions, and having logic higher up to decide if it should call the RIT or SXS functions?

akhadse commented 2 years ago

Thank you. I am working on the initial steps and will keep you updated. Re. logic for load function: I think having the additional rit_loader and rit_handler functions might be easier but I will know better when I get there.

duetosymmetry commented 1 year ago

I'd like to point out that @prayush, @md-arif-shaikh, and @adivijaykumar have done the legwork and written prayush/nr-catalog-tools. @moble any opinions on if you'd still like this included in sxs?

prayush commented 1 year ago

Yeah, the context is that I had some old code to integrate RIT / GT data into gw data analysis frameworks since the time we were working on this analysis of gw150914, but it had gotten quite outdated. So, when we recently wanted to look at those catalogs again, @md-arif-shaikh and @adivijaykumar wrote a fresh set of utilities to work with all catalogs. Then we thought it would be good idea to write/maintain a unified interface for all 3 catalogs, and be done once and for all!

It wasn't immediately obvious to me if the sxs package or the 'sxs-collaboration' org would be the most intuitive place to put this code though, since it would be for non-sxs data specifically (and I hadn't read this issue). As I was familiar with the careful work already put into sxs, I thought the next best thing would be to have a separate wrapper package that is designed to inherit maximally from the sxs's classes for handling catalogs and waveform modes, but with the capability to uniformly handle other catalogs, as well as have some useful extras to allow for easy downstream interfacing with data analysis toolkits like pycbc , lalsuite and bilby.

There is a basic POC here now. It, for eg., calls into sxs for all mode/polarizations operations. For catalog handler classes, inheriting from sxs.Catalog helps a lot with the GT catalog, the RIT catalog needs to be scraped from their web directory and cached. We can port nr-catalog-tools under the sxs-collaboration org anytime (I was thinking of doing it myself), but why do we think the "sxs" package would be better for this code as opposed to it being a closely related thin wrapper with identical interface to sxs (say, sxs-collaboration/catalogs), containing only the non-"sxs" pieces essentially. I guess my issue is mostly that a package with name "sxs" implies tools-to-work-with-sxs-data to users, but I'm open to change. @moble and others, what do you think?

moble commented 1 year ago

That looks great, Prayush! I'm happy to see any improvement in the situation — whether it lives in sxs or elsewhere.

IMO, this sort of capability would fit very nicely into sxs, since we don't just produce data, but also tools to work with data. I don't see any reason that shouldn't involve other people's data.

Obviously, as you guys are the ones who've done all the work, I'll defer to your judgment about where it would best fit, but I'd certainly be happy to help it fit into this package. For example, it would be easy to add methods like get_mode, get_td_waveform, to_pycbc, etc., directly to WaveformModes. Even if you don't want to fully incorporate your efforts into sxs, this would probably make things easier on everyone — and avoid standards proliferation. :)

(On a related note, I've long had a stub for a WaveformSignal class that I meant to be what you call the td_waveform. That's the sort of thing that could certainly be expanded.)