Open duetosymmetry opened 2 years ago
I think this would be entirely appropriate. Since sxs
already has so much code written around capabilities that would probably be common to all groups, I personally would find it exhausting to suggest a different interface. But it would require a little work.
First, we would need to get their catalog data into a compatible format — at least some subset of the metadata included in our catalog.json
— so that sxs
would know where to find their data. (There may also be some other assumptions built into sxs
about what sort of metadata is contained in the catalog entries, which could require rewriting bits of sxs
, but I wouldn't think there would be too much.)
Second, we would need load
(and optionally save
) functions written to handle their formats, so they would result in WaveformModes
objects. I think this for RIT and something in here for GT should just about work — though they'd probably need tweaks for their respective flavors of those formats.
Third, we would need logic in sxs.load
that could figure out which load
function to call for a given file. At least to start with, sxs.load
could probably just parse the strings to try to figure out the format.
So, not too much work, but not trivial.
Pinging @akhadse who has expressed interest in working on this.
Ideally the caching mechanism should work for others' waveforms, not just for SXS waveforms. I have not looked in detail at how the caching works, so I don't know how difficult that is!
Since others do not provide catalog.json
files, I suggested the following idea to Akshay. The BeautifulSoup code Mike wrote for parsing the RIT catalog would provide a good current snapshot of the web page. I don't think we should attempt to re-scrape the page every time somebody tries to load an RIT waveform. Instead, we include a static rit_catalog.json
file within this package itself, along with the code for generating it from scraping. Hopefully this encourages them to provide a json catalog in the future, and then we can remove the scraping code and static catalog file.
Re: logic to figure out which load function to call: Given the current sxs_loader
and sxs_handler
functions, do you imagine adding more cases to them, or adding additional rit_loader
and rit_handler
etc. functions, and having logic higher up to decide if it should call the RIT or SXS functions?
Thank you. I am working on the initial steps and will keep you updated.
Re. logic for load function: I think having the additional rit_loader
and rit_handler
functions might be easier but I will know better when I get there.
I'd like to point out that @prayush, @md-arif-shaikh, and @adivijaykumar have done the legwork and written prayush/nr-catalog-tools. @moble any opinions on if you'd still like this included in sxs
?
Yeah, the context is that I had some old code to integrate RIT / GT data into gw data analysis frameworks since the time we were working on this analysis of gw150914, but it had gotten quite outdated. So, when we recently wanted to look at those catalogs again, @md-arif-shaikh and @adivijaykumar wrote a fresh set of utilities to work with all catalogs. Then we thought it would be good idea to write/maintain a unified interface for all 3 catalogs, and be done once and for all!
It wasn't immediately obvious to me if the sxs
package or the 'sxs-collaboration' org would be the most intuitive place to put this code though, since it would be for non-sxs data specifically (and I hadn't read this issue). As I was familiar with the careful work already put into sxs
, I thought the next best thing would be to have a separate wrapper package that is designed to inherit maximally from the sxs
's classes for handling catalogs and waveform modes, but with the capability to uniformly handle other catalogs, as well as have some useful extras to allow for easy downstream interfacing with data analysis toolkits like pycbc
, lalsuite
and bilby
.
There is a basic POC here now. It, for eg., calls into sxs
for all mode/polarizations operations. For catalog handler classes, inheriting from sxs.Catalog
helps a lot with the GT catalog, the RIT catalog needs to be scraped from their web directory and cached. We can port nr-catalog-tools
under the sxs-collaboration org anytime (I was thinking of doing it myself), but why do we think the "sxs" package would be better for this code as opposed to it being a closely related thin wrapper with identical interface to sxs
(say, sxs-collaboration/catalogs), containing only the non-"sxs" pieces essentially. I guess my issue is mostly that a package with name "sxs" implies tools-to-work-with-sxs-data to users, but I'm open to change. @moble and others, what do you think?
That looks great, Prayush! I'm happy to see any improvement in the situation — whether it lives in sxs
or elsewhere.
IMO, this sort of capability would fit very nicely into sxs
, since we don't just produce data, but also tools to work with data. I don't see any reason that shouldn't involve other people's data.
Obviously, as you guys are the ones who've done all the work, I'll defer to your judgment about where it would best fit, but I'd certainly be happy to help it fit into this package. For example, it would be easy to add methods like get_mode
, get_td_waveform
, to_pycbc
, etc., directly to WaveformModes
. Even if you don't want to fully incorporate your efforts into sxs
, this would probably make things easier on everyone — and avoid standards proliferation. :)
(On a related note, I've long had a stub for a WaveformSignal
class that I meant to be what you call the td_waveform
. That's the sort of thing that could certainly be expanded.)
For example, if somebody calls
then based on the string before the colon, look in the RIT catalog instead of the SXS catalog. Is this an appropriate interface? Should this be within the
sxs
package, even though it is others' data? Or should it instead be in a different package? Ping @moble request for comment.