simonsobs / sotodlib

Simons Observatory: Time-Ordered Data processing library.
MIT License

Function to just load metadata from a single ManifestDb entry #908

Status: Open (kmharrington opened this issue 4 months ago)

kmharrington commented 4 months ago

I'm running into this a lot when we make ManifestDbs with extra axes that end up breaking metadata loading through the context.get_meta function (side note: I would love to fix that too). The data in there is still extremely useful for meta-analyses, so I'd just like to load it.

Effectively, given an entry from the list generated by ManifestDb.inspect(), load just that dataset in the "correct" way.
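A minimal sketch of the requested helper might look like this. It assumes each inspect() entry carries 'filename' and 'dataset' keys (as in the workaround later in this thread) and that relative filenames resolve against the directory holding the database file. load_manifest_entry is a hypothetical name, not an existing sotodlib function, and the loader is passed in (for example core.AxisManager.load) so the sketch runs without sotodlib.

```python
import os
from typing import Any, Callable, Mapping


def load_manifest_entry(
    entry: Mapping[str, Any],
    db_path: str,
    loader: Callable[[str, str], Any],
) -> Any:
    """Load the single dataset described by one ManifestDb.inspect() entry.

    Hypothetical helper: `loader` is expected to behave like
    core.AxisManager.load(path, dataset).
    """
    # Assumption: relative filenames are relative to the database's directory.
    # os.path.join returns the second argument unchanged if it is absolute.
    base_dir = os.path.dirname(os.path.abspath(db_path))
    path = os.path.join(base_dir, entry['filename'])
    return loader(path, entry['dataset'])


# Possible usage with sotodlib installed (not run here):
# proc = core.metadata.ManifestDb(db_path)
# entry = proc.inspect({'obs:obs_id': obs_id})[0]
# aman = load_manifest_entry(entry, db_path, core.AxisManager.load)
```

Injecting the loader keeps path resolution separate from the file format, so the same helper could serve datasets that are not AxisManager HDF5 groups.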

mhasself commented 4 months ago

I know it's not quite "from the list generated by ManifestDb.inspect()", but is https://sotodlib.readthedocs.io/en/latest/context.html#sotodlib.core.metadata.load_metadata good enough for your present purposes?

kmharrington commented 4 months ago

I was running into the issue where the context couldn't load the TOD with the manifest included. I'll test using a context without the preprocessing and see how it works.

kmharrington commented 4 months ago

This still has the same issue as the others when the ManifestDb isn't fully formed for loading into an AxisManager. Some of the preprocess databases have extra fields that don't map cleanly onto axes. For example:

meta = ctx.get_meta(obs_id)
test = core.metadata.load_metadata(
    meta, 
    {
        'db':'/global/cfs/cdirs/sobs/users/msilvafe/preprocess/satp3_240712/process_archive.sqlite', 
        'unpack':'preprocess'
    }
)

leads to

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[5], line 1
----> 1 core.metadata.load_metadata(
      2     meta, 
      3     {
      4         'db':'/global/cfs/cdirs/sobs/users/msilvafe/preprocess/satp3_240712/process_archive.sqlite', 
      5         'unpack':'preprocess'
      6     }
      7 )

File /global/common/software/sobs/perlmutter/conda_envs/soconda_20240517_0.1.3/lib/python3.10/site-packages/sotodlib/core/metadata/loader.py:1150, in load_metadata(tod, spec, unpack)
   1148     request[f'obs:{k}'] = v
   1149 spec = MetadataSpec.from_dict(spec)
-> 1150 item = loader.load_one(spec, request, det_info)
   1151 if not unpack or spec.det_info:
   1152     return item

File /global/common/software/sobs/perlmutter/conda_envs/soconda_20240517_0.1.3/lib/python3.10/site-packages/sotodlib/core/metadata/loader.py:322, in SuperLoader.load_one(self, spec, request, det_info)
    320     result = results[0]
    321 else:
--> 322     result = results[0].concatenate(results)
    323 return result

File /global/common/software/sobs/perlmutter/conda_envs/soconda_20240517_0.1.3/lib/python3.10/site-packages/sotodlib/core/axisman.py:481, in AxisManager.concatenate(items, axis, other_fields)
    479 # Call class-specific concatenation if needed.
    480 if isinstance(keepers[0], AxisManager):
--> 481     new_data[name] = AxisManager.concatenate(
    482         keepers, axis=ax_dim, other_fields=other_fields)
    483 elif isinstance(keepers[0], np.ndarray):
    484     new_data[name] = np.concatenate(keepers, axis=ax_dim)

File /global/common/software/sobs/perlmutter/conda_envs/soconda_20240517_0.1.3/lib/python3.10/site-packages/sotodlib/core/axisman.py:534, in AxisManager.concatenate(items, axis, other_fields)
    532         raise ValueError(err_msg)
    533     elif not np.all([np.array_equal(i[k], items[0][k], equal_nan=True) for i in items]):
--> 534         raise ValueError(err_msg)
    536     output.wrap(k, items[0][k].copy(), axis_map)
    538 elif other_fields == 'fail':

ValueError: The field 'medians' does not share axis 'dets'; medians is not identical across all items pass other_fields='drop' or 'first' or else remove this field from the targets.
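
The traceback shows the mechanism: when SuperLoader.load_one gets more than one result it calls AxisManager.concatenate, and a field that does not carry the 'dets' axis ('medians' here) must then be identical in every item unless other_fields says otherwise. The following is a simplified, pure-Python model of that rule, written only to illustrate the failure. It is not sotodlib's implementation: items are plain dicts, nesting is ignored, 'fail' is omitted, and the semantics of 'exact', 'first' and 'drop' are assumptions based on the error message and the traceback.

```python
from typing import Any, Dict, List, Optional, Tuple

# Each item maps field name -> (axis name or None, value).
# Values on the concatenation axis are lists with one entry per detector.
Item = Dict[str, Tuple[Optional[str], Any]]


def concatenate_items(items: List[Item], axis: str = 'dets',
                      other_fields: str = 'exact') -> Item:
    """Toy model of concatenating along `axis` (not the sotodlib code)."""
    if other_fields not in ('exact', 'first', 'drop'):
        raise ValueError(f"unsupported other_fields={other_fields!r}")
    out: Item = {}
    for name, (ax, _) in items[0].items():
        if ax == axis:
            # Fields that share the axis are joined end to end.
            joined: List[Any] = []
            for item in items:
                joined.extend(item[name][1])
            out[name] = (ax, joined)
            continue
        values = [item[name][1] for item in items]
        identical = all(v == values[0] for v in values)
        if other_fields == 'drop':
            continue  # discard fields that lack the axis
        if other_fields == 'first' or identical:
            out[name] = (ax, values[0])  # keep the first item's copy
        else:
            raise ValueError(
                f"The field {name!r} does not share axis {axis!r}; "
                f"{name} is not identical across all items")
    return out
```

In this model a per-observation value such as a median that differs between entries can never survive the strict check, which is why loading a single entry (where load_one skips concatenation) avoids the error.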
kmharrington commented 4 months ago

The way I've found to do this is:

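import os
from sotodlib import core

obs_id = 'obs_1719203318_satp3_1111111'
base_dir = '/global/cfs/cdirs/sobs/users/msilvafe/preprocess/satp3_240712/'

# Open the ManifestDb directly instead of going through the Context.
proc = core.metadata.ManifestDb(os.path.join(base_dir, 'process_archive.sqlite'))
# List the entries (filename, dataset, ...) that match this observation.
entry_list = proc.inspect({'obs:obs_id': obs_id})
entry = entry_list[0]
# In this archive, filenames are stored relative to base_dir.
path = os.path.join(base_dir, entry['filename'])
test = core.AxisManager.load(path, entry['dataset'])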
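
Because inspect() returns a list, one observation can match several entries, and loading each of them on its own avoids the concatenate step that raised the ValueError above. A minimal sketch extending the workaround, assuming db behaves like ManifestDb (an inspect(request) method returning dicts with 'filename' and 'dataset') and loader behaves like core.AxisManager.load; load_entries_separately is a hypothetical name.

```python
import os
from typing import Any, Callable, Dict, Mapping, Tuple


def load_entries_separately(
    db: Any,
    request: Mapping[str, Any],
    base_dir: str,
    loader: Callable[[str, str], Any],
) -> Dict[Tuple[str, str], Any]:
    """Load every matching ManifestDb entry without concatenating them.

    Results are keyed by (filename, dataset) so entries stored in
    different files never overwrite each other.
    """
    results: Dict[Tuple[str, str], Any] = {}
    for entry in db.inspect(dict(request)):
        path = os.path.join(base_dir, entry['filename'])
        results[(entry['filename'], entry['dataset'])] = loader(path, entry['dataset'])
    return results


# Possible usage with the objects from the workaround above (requires sotodlib):
# tods = load_entries_separately(proc, {'obs:obs_id': obs_id}, base_dir,
#                                core.AxisManager.load)
```

Keeping the results separate leaves it to the analysis code to decide how (or whether) to combine fields like 'medians' that do not share the 'dets' axis.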