intake / intake-astro

Astronomical data sources for Intake
BSD 2-Clause "Simplified" License

header/wcs from remote catalog? #2

Open timothydmorton opened 5 years ago

timothydmorton commented 5 years ago

If I have a FITS file on disk locally, I see that I can access the header/WCS info if I read it in first; e.g.,

source = intake.open_fits_array('/Users/tdm/Downloads/ACTPol_148_D6_PA1_S1_1way_I.fits')
arr = source.read()
source.wcs

and that gives me

WCS Keywords

Number of WCS axes: 2
CTYPE : 'RA---CEA'  'DEC--CEA'  
CRVAL : 0.0  0.0  
CRPIX : 5832.0  1302.0  
PC1_1 PC1_2  : 1.0  0.0  
PC2_1 PC2_2  : 0.0  1.0  
CDELT : -0.008333333333333333  0.008333333333333333  
NAXIS : 3521  1505

However, I'd like to access this data via a YAML catalog; for example (actpol.yaml):

sources: 
    ACTPol_148_D6_PA1_S1_1way_I:
        driver: fits_array
        cache:
          - argkey: urlpath
            type: file
        args:
            url: https://lambda.gsfc.nasa.gov/data/suborbital/ACT/actpol_2016_maps/ACTPol_148_D6_PA1_S1_1way_I.fits
            ext: 0
        direct_access: force

I then define the catalog

cat = intake.open_catalog('actpol.yaml')

and I can read in the data array, e.g.,

arr = cat.ACTPol_148_D6_PA1_S1_1way_I.read()

but how do I access the header or WCS? The .arr attribute (as well as header, wcs, etc.) is not set for this remote source after read, unlike for the local one. Am I misunderstanding how remote (or catalog-defined) sources work, or is this a bug in intake-astro?

timothydmorton commented 5 years ago

Also (though somewhat unrelatedly), caching doesn't seem to work for me for this example either. Nothing is written to .intake/cache, and subsequent reads are not fast.

martindurant commented 5 years ago

You need

source = cat.ACTPol_148_D6_PA1_S1_1way_I()
arr = source.read()
source.wcs

(note that you will also have source.header, if you need other keys)

martindurant commented 5 years ago

caching doesn't seem to work

Everyone has been confused by this, and I am trying to improve the situation. In your particular case, it could be because the cache entry says urlpath instead of url, which is the actual name of the argument.
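In other words, the argkey in the cache block should match the driver argument name. A corrected stanza for the catalog above might look like this (a sketch, assuming the fits_array argument really is named url as stated):

```yaml
sources:
    ACTPol_148_D6_PA1_S1_1way_I:
        driver: fits_array
        cache:
          - argkey: url   # must match the name of the argument below
            type: file
        args:
            url: https://lambda.gsfc.nasa.gov/data/suborbital/ACT/actpol_2016_maps/ACTPol_148_D6_PA1_S1_1way_I.fits
            ext: 0
```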

timothydmorton commented 5 years ago

Thanks! For the cache issue, I tried changing to url instead of urlpath and that didn't seem to change anything.

martindurant commented 5 years ago

Ah, actually intake-astro has simply not been set up to use caching, as it predates that feature. I don't know when I would get around to adding it; honestly, it should not be up to the data-source code to know about caching and how to use it.

timothydmorton commented 5 years ago

I'm happy to take a crack at it if you can give me some pointers to what to model on.

martindurant commented 5 years ago

The following seems to do the trick for the array source; something very similar would work for the table source

--- a/intake_astro/array.py
+++ b/intake_astro/array.py
@@ -52,11 +52,12 @@ class FITSArraySource(DataSource):
         from dask.bytes import open_files
         import dask.array as da
         from dask.base import tokenize
+        url = self._get_cache(self.url)[0]
         if self.arr is None:
-            self.files = open_files(self.url, **self.storage_options)
+            self.files = open_files(url, **self.storage_options)
             self.header, self.dtype, self.shape, self.wcs = _get_header(
                 self.files[0], self.ext)
-            name = 'fits-array-' + tokenize(self.url, self.chunks, self.ext)
+            name = 'fits-array-' + tokenize(url, self.chunks, self.ext)
             ch = self.chunks if self.chunks is not None else self.shape
             chunks = []
             for c, s in zip(ch, self.shape):
martindurant commented 4 years ago

It has been a while... You can now get local caching of files using the "filecache:" or "blockcache:" protocols from fsspec, without any change to this code: https://filesystem-spec.readthedocs.io/en/latest/features.html#caching-files-locally
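For example, the original catalog could be rewritten along these lines (a sketch based on the fsspec docs linked above; the storage_options pass-through and the /tmp cache directory are assumptions, not tested against this driver):

```yaml
sources:
    ACTPol_148_D6_PA1_S1_1way_I:
        driver: fits_array
        args:
            # Chain the filecache protocol in front of the remote URL
            url: filecache::https://lambda.gsfc.nasa.gov/data/suborbital/ACT/actpol_2016_maps/ACTPol_148_D6_PA1_S1_1way_I.fits
            ext: 0
            storage_options:
                filecache:
                    # Where fsspec stores the locally cached copy
                    cache_storage: /tmp/intake-cache
```

On first read the file is downloaded to cache_storage; subsequent reads hit the local copy.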