Open Akash-Maurya-0899 opened 5 months ago
I hit this issue earlier today, and I find it quite concerning. This is also causing the tests to be slow, so we should try to see if there is a solution.
I wonder whether this is related to the following warning from the sxs package:
You have called a function that uses the `Catalog` class,
which, as of `sxs` version 2024.0.0, has been deprecated in
favor of the `Simulations` interface. See the documentation
for more information.
OK, yes, that is indeed the issue. sxs.Catalog is deprecated, and we might have to refactor our entire codebase to take care of this change :(
CC: @prayush
I use the following hack to avoid "infinitely" long waiting times: instead of loading the catalog directly through nrcatalogtools, one can supply the path to the catalog when defining the object of the class. I notice that it takes significantly less time this way. Furthermore, instead of reloading it every time, one can simply save the nrcatalogtools.sxs.SXSCatalog object as a pickle file and load that on later runs. Please find the code below:
import json
import pickle
import subprocess
from pathlib import Path

import sxs
from nrcatalogtools.sxs import SXSCatalog

# Define the path to the SXS cache directory using the sxs library
sxs_cache_dir = Path(sxs.sxs_directory("cache"))
catalog_json_path = sxs_cache_dir / "catalog.json"

# If catalog.json is missing from the cache directory, download it
# from the SXS website and save it there
if not catalog_json_path.exists():
    subprocess.run(
        ["wget", "https://data.black-holes.org/catalog.json", "-P", str(sxs_cache_dir)],
        check=True,
    )
# Load the catalog.json file from the cache
sxs_catalog = sxs.load(location=str(catalog_json_path))

# Define the path to the nrcatalogtools SXSCatalog pickle file in the cache directory
nrcatalogtools_sxscatalog_path = sxs_cache_dir / "nrcatalogtools_sxscatalog.pkl"

# If the nrcatalogtools.sxs.SXSCatalog object is not saved in the cache, create it from catalog.json
if not nrcatalogtools_sxscatalog_path.exists():
    print(
        "Loading SXS catalog through `nrcatalogtools.sxs.SXSCatalog`. This will take some time."
    )
    # Load the catalog.json data
    with open(catalog_json_path, "r") as f:
        sxs_catalog_json = json.load(f)
    # Create the SXSCatalog object using the loaded JSON data
    nrcatalogtools_sxscatalog = SXSCatalog(catalog=sxs_catalog_json)
    # Save the SXSCatalog object to a pickle file in the cache directory for future use
    with open(nrcatalogtools_sxscatalog_path, "wb") as f:
        pickle.dump(nrcatalogtools_sxscatalog, f)
# If the SXSCatalog object is already saved in the cache (i.e., the pickle file exists),
# load the object from the cache to avoid recomputing it
else:
    print(f"Loading the `nrcatalogtools.sxs.SXSCatalog` object from cache directory: {nrcatalogtools_sxscatalog_path}")
    with open(nrcatalogtools_sxscatalog_path, "rb") as f:
        nrcatalogtools_sxscatalog = pickle.load(f)
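The build-once-then-unpickle pattern in the workaround above could be factored into a small reusable helper. The following is only a sketch using the standard library; `load_or_build` and its parameters are hypothetical names, not part of sxs or nrcatalogtools.

```python
import pickle
from pathlib import Path


def load_or_build(cache_path, build):
    """Return the object pickled at cache_path, building and saving it first if needed.

    `build` is a zero-argument callable that creates the expensive object.
    """
    cache_path = Path(cache_path)
    if cache_path.exists():
        # Fast path: the object was pickled on an earlier run
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    # Slow path: build the object once
    obj = build()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file and rename it afterwards, so an
    # interrupted run never leaves a truncated pickle behind
    tmp_path = cache_path.with_name(cache_path.name + ".tmp")
    with open(tmp_path, "wb") as f:
        pickle.dump(obj, f)
    tmp_path.replace(cache_path)
    return obj
```

With the names from the workaround, this might be called as `load_or_build(sxs_cache_dir / "nrcatalogtools_sxscatalog.pkl", lambda: SXSCatalog(catalog=sxs_catalog_json))`. Note that the pickle is never refreshed automatically: if catalog.json is updated, or the class definition changes, the cached file has to be deleted by hand.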
The following code snippet takes a long time to execute every time I run it:
(Just to be clear, it's the second line that takes a long time to execute.)
I already have catalog.zip stored in my ~/.cache/sxs directory, and it still takes a long time to load. I also tried to explicitly disable the downloading like so:
and it still takes a long time to execute.
Can this be cured, or is this some "fundamental" I/O speed limitation in reading the catalog.zip file itself?
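One way to check whether raw I/O is the bottleneck is to time reading the archive separately from parsing its contents. This is a rough sketch using only the standard library; `time_zip_json` is a hypothetical helper, and the member name `catalog.json` inside catalog.zip is an assumption that should be checked against the actual archive.

```python
import json
import time
import zipfile


def time_zip_json(zip_path, member="catalog.json"):
    """Time reading `member` from a zip archive separately from parsing it as JSON."""
    t0 = time.perf_counter()
    with zipfile.ZipFile(zip_path) as zf:
        raw = zf.read(member)  # disk read plus decompression
    t1 = time.perf_counter()
    json.loads(raw)  # JSON parsing only, no disk access
    t2 = time.perf_counter()
    return {"read_s": t1 - t0, "parse_s": t2 - t1, "n_bytes": len(raw)}
```

For example, `time_zip_json(os.path.expanduser("~/.cache/sxs/catalog.zip"))` (with `import os`) reports both timings in seconds. If both are small compared with the observed wait, the time is probably being spent after the file is read, for instance in building objects from the parsed data, rather than in file I/O.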