gwnrtools / nr-catalog-tools

A unified interface to various catalogs of Numerical Relativity simulations of compact binary mergers.
https://github.com/gwnrtools/nr-catalog-tools
GNU General Public License v3.0

Long SXS catalog loading times #39

Open Akash-Maurya-0899 opened 4 months ago

Akash-Maurya-0899 commented 4 months ago

The following code snippet takes a long time to execute every time I run it:

from nrcatalogtools import SXSCatalog
sxscatalog = SXSCatalog.load()

(Just to be clear, it's the second line that takes a long time to execute.)

I already have catalog.zip stored in my ~/.cache/sxs directory, and still it takes a lot of time to load. I also tried to explicitly disable the downloading like so:

sxscatalog = SXSCatalog.load(download=False)

and it still takes long to execute.

Can this be cured or is this some "fundamental" I/O speed limitation in reading the catalog.zip file itself?
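One way to tell whether the time goes into reading catalog.zip or into Python-side processing is to profile the call. Below is a minimal sketch using only the standard library; `profile_call` is a hypothetical helper name, not part of nrcatalogtools or sxs.

```python
import cProfile
import io
import pstats
import time


def profile_call(func, *args, top=10, **kwargs):
    """Run func(*args, **kwargs) under cProfile.

    Returns (result, elapsed_seconds, report), where report is a text
    table of the `top` entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    start = time.perf_counter()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()
    elapsed = time.perf_counter() - start

    # Render the profile into a string instead of printing to stdout.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, elapsed, buffer.getvalue()


# Intended use (assumes nrcatalogtools is installed):
#   from nrcatalogtools import SXSCatalog
#   catalog, seconds, report = profile_call(SXSCatalog.load, download=False)
#   print(f"load took {seconds:.1f} s")
#   print(report)
```

If the top entries in the report are file or zip reads, the bottleneck is probably I/O; if they are functions inside sxs or nrcatalogtools, the time is more likely spent building Python objects from the catalog.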

adivijaykumar commented 2 months ago

I hit this issue earlier today, and I find it quite concerning. This is also causing the tests to be slow, so we should try to see if there is a solution.

adivijaykumar commented 2 months ago

Wondering if this is related to the following warning from the sxs package:

        You have called a function that uses the `Catalog` class,
        which, as of `sxs` version 2024.0.0, has been deprecated in
        favor of the `Simulations` interface.  See the documentation
        for more information.

adivijaykumar commented 2 months ago

OK, yes, indeed that is the issue. sxs.Catalog is deprecated, and we might have to refactor our entire code to take care of this change :(

CC: @prayush

anuj137 commented 3 weeks ago

I use the following hack to avoid "infinitely" long waiting times: instead of loading the catalog directly through nrcatalogtools, one can supply the catalog (read from its path) when constructing the object of the class. I notice that it takes significantly less time this way. Furthermore, instead of rebuilding it every time, one can simply save the nrcatalogtools.sxs.SXSCatalog object as a pickle file and reload that on later runs. Please find the code below:

import sxs
from nrcatalogtools.sxs import SXSCatalog
from glob import glob
from subprocess import call
import json
import pickle

# Define the path to the SXS cache directory using the sxs library
sxs_cache_dir = str(sxs.sxs_directory("cache"))

# Check if the SXS catalog file is available in the cache directory
# If it exists, load the catalog.json file from the cache
try:
    sxs_catalog = sxs.load(location=glob(sxs_cache_dir + "/catalog.json")[0])

# If the file is not found (glob returns an empty list, so [0] raises IndexError)
# or cannot be loaded, download the catalog.json file from the SXS website
# and save it in the cache directory
except Exception:
    call(["wget", "https://data.black-holes.org/catalog.json", "-P", sxs_cache_dir])
    sxs_catalog = sxs.load(location=glob(sxs_cache_dir + "/catalog.json")[0])

# Define the path to the nrcatalogtools SXSCatalog pickle file in the cache directory.
# This is a plain string so that it is valid even before the file exists.
nrcatalogtools_sxscatalog_path = sxs_cache_dir + "/nrcatalogtools_sxscatalog.pkl"

# If the nrcatalogtools.sxs.SXSCatalog object is not saved in the cache, create it from the catalog.json
# (glob returns an empty list when the pickle file does not exist yet)
if len(glob(nrcatalogtools_sxscatalog_path)) == 0:
    print(
        "Loading SXS catalog through `nrcatalogtools.sxs.SXSCatalog`. This will take some time."
    )

    # Load the catalog.json data.
    with open(sxs_cache_dir + "/catalog.json", "r") as f:
        sxs_catalog_json = json.load(f)

    # Create the SXSCatalog object using the loaded JSON data
    nrcatalogtools_sxscatalog = SXSCatalog(catalog=sxs_catalog_json)

    # Save the SXSCatalog object to a pickle file in the cache directory for future use
    with open(nrcatalogtools_sxscatalog_path, "wb") as f:
        pickle.dump(nrcatalogtools_sxscatalog, f)

# If the SXSCatalog object is already saved in the cache (i.e., the pickle file exists),
# load the object from the cache to avoid recomputing it
else:
    print(f"Loading the `nrcatalogtools.sxs.SXSCatalog` object from cache directory: {nrcatalogtools_sxscatalog_path}")
    with open(nrcatalogtools_sxscatalog_path, "rb") as f:
        nrcatalogtools_sxscatalog = pickle.load(f)
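The load-or-build pattern above can also be factored into a small reusable helper. This is a minimal sketch, not part of nrcatalogtools: `load_or_build` is a hypothetical name, and `build` is any zero-argument callable that constructs the expensive object.

```python
import pickle
from pathlib import Path


def load_or_build(cache_path, build):
    """Return the object pickled at cache_path, building and saving it if absent."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        with cache_path.open("rb") as f:
            return pickle.load(f)

    obj = build()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file first, then rename it into place, so that an
    # interrupted run does not leave a truncated pickle behind.
    tmp_path = cache_path.with_name(cache_path.name + ".tmp")
    with tmp_path.open("wb") as f:
        pickle.dump(obj, f)
    tmp_path.replace(cache_path)
    return obj


# Intended use with the variables from the snippet above:
#   catalog = load_or_build(
#       sxs_cache_dir + "/nrcatalogtools_sxscatalog.pkl",
#       lambda: SXSCatalog(catalog=sxs_catalog_json),
#   )
```

Note that a pickle is tied to the class definitions that produced it, so the cached file should probably be deleted after upgrading sxs or nrcatalogtools, and only pickle files you created yourself should be loaded.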