sdss / lvmdrp

Local Volume Mapper (LVM) Data Reduction Pipeline
BSD 3-Clause "New" or "Revised" License

Metadata caching runs slowly (especially at Utah) #123

Closed ajmejia closed 6 days ago

ajmejia commented 1 month ago

Caching the header metadata is slow in general, but it gets worse when header fixes are applied. At Utah this can take ~1 min per camera frame.

ajmejia commented 1 week ago

Hi @havok2063, can you take a look at this one? I think the bottleneck is around the call to apply_hdrfix here:

https://github.com/sdss/lvmdrp/blob/master/python/lvmdrp/utils/metadata.py#L579

I think we don't need to read the header fixes every time, especially if we have already passed in a header object.

havok2063 commented 1 week ago

If there are fixes to the header that need to be applied, then it does need to be read in every time we run the reductions. The header fix needs to be applied once at the beginning of the reduction, to the header of the raw sdR file, before the header gets propagated down to the other products.

Because you're also extracting raw sdR header information into the raw_metadata file, which we also use in the pipeline, the header fix needs to be applied here as well.

ndrory commented 1 week ago

Brian, that may be the case (although we could do some smarter caching of values), but we can't have this step take 45s per frame at Utah, while it takes about 0.3s per frame on our laptops. Something in there is just not optimal.

N.

havok2063 commented 1 week ago

Then can you provide more context? Which MJDs are you testing with? Is it the same for every MJD? Does the number of raw exposures taken on a given MJD make a difference? Is it this slow for MJDs that don't have header fix files but do have a large number of raw frames? This will help determine what kind of solution we might need.

ajmejia commented 1 week ago

This happens with all MJDs that have header fixes. Just to test, I turned off the header fixes and the caching of metadata went from ~40 s/frame to ~5 frames/s.

I think a possible solution would be to change how we apply the header fixes during metadata caching. Instead of reading and resolving header fixes for each camera frame header, we could instead read header fixes for the target MJD, resolve all the camera frames that need fixes to be applied and feed that as input to the metadata caching routine. This way you only read header fixes once per MJD. This is more compatible with the "per MJD" design of the HdrFix files.
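
A rough sketch of that idea, not the actual lvmdrp code: `read_hdrfix`, `resolve_fixes`, `cache_metadata`, the `fileroot`/`keyword`/`value` rule layout, and the in-memory `FAKE_HDRFIX_FILES` table are all hypothetical stand-ins, and headers are plain dicts rather than FITS headers. It only illustrates reading the fixes once per MJD and resolving them for every frame before the per-frame loop.

```python
from fnmatch import fnmatch

# Hypothetical stand-in for the per-MJD header-fix file on disk.
FAKE_HDRFIX_FILES = {
    60500: [
        {"fileroot": "sdR-*-b1-00001234", "keyword": "IMAGETYP", "value": "arc"},
        {"fileroot": "sdR-*-*-00001235", "keyword": "EXPTIME", "value": 900.0},
    ],
}
READ_COUNT = 0  # counts simulated file reads


def read_hdrfix(mjd):
    """Simulate the (slow) read of the header-fix file for one MJD."""
    global READ_COUNT
    READ_COUNT += 1
    return FAKE_HDRFIX_FILES.get(mjd, [])


def resolve_fixes(mjd, frame_names):
    """Read the fixes once, then map each frame name to its keyword fixes."""
    rules = read_hdrfix(mjd)
    resolved = {}
    for name in frame_names:
        for rule in rules:
            if fnmatch(name, rule["fileroot"]):
                resolved.setdefault(name, {})[rule["keyword"]] = rule["value"]
    return resolved


def cache_metadata(mjd, headers):
    """Build metadata rows from {frame name: header dict}, applying fixes."""
    fixes = resolve_fixes(mjd, headers)  # single read for the whole MJD
    rows = []
    for name, hdr in headers.items():
        fixed = {**hdr, **fixes.get(name, {})}  # fixed copy, input untouched
        rows.append({"name": name, **fixed})
    return rows


headers = {
    "sdR-s-b1-00001234": {"IMAGETYP": "object", "EXPTIME": 10.0},
    "sdR-s-r1-00001235": {"IMAGETYP": "object", "EXPTIME": 10.0},
    "sdR-s-z1-00001236": {"IMAGETYP": "object", "EXPTIME": 10.0},
}
rows = cache_metadata(60500, headers)
```

With this layout the number of fix-file reads scales with the number of MJDs rather than the number of camera frames, which is where the per-frame cost seems to come from.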

We could do something similar when applying header fixes during reductions. I suspect it is not critical for single-MJD reductions, but for long reductions this optimization could make a noticeable difference in speed.
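
For the multi-MJD case, one possible approach (again only a sketch under assumptions) would be to memoize the per-MJD read with `functools.lru_cache`, so the existing per-frame call pattern can stay as it is. `load_hdrfix`, `apply_fixes`, and the `FAKE_FIXES` table below are hypothetical names, not lvmdrp functions, and headers are plain dicts.

```python
from functools import lru_cache

# Hypothetical per-MJD fix table standing in for the HdrFix files on disk.
# Each fix is (frame name, header keyword, corrected value).
FAKE_FIXES = {
    60500: (("sdR-s-b1-00001234", "IMAGETYP", "arc"),),
    60501: (),
}
DISK_READS = []  # records every simulated file read


@lru_cache(maxsize=None)
def load_hdrfix(mjd):
    """Simulated expensive read; the body runs once per distinct MJD."""
    DISK_READS.append(mjd)
    return FAKE_FIXES.get(mjd, ())  # tuple, so the cached value is immutable


def apply_fixes(mjd, name, header):
    """Return a fixed copy of the header; the input dict is left untouched."""
    fixed = dict(header)
    for target, keyword, value in load_hdrfix(mjd):
        if target == name:
            fixed[keyword] = value
    return fixed


# Frames from several MJDs, interleaved as in a long reduction.
frames = [
    (60500, "sdR-s-b1-00001234", {"IMAGETYP": "object"}),
    (60500, "sdR-s-r1-00001234", {"IMAGETYP": "object"}),
    (60501, "sdR-s-b1-00002000", {"IMAGETYP": "object"}),
    (60500, "sdR-s-z1-00001234", {"IMAGETYP": "object"}),
]
fixed_headers = [apply_fixes(mjd, name, hdr) for mjd, name, hdr in frames]
```

One caveat with this design: if a fix file can change while a long reduction is running, the cache would need to be invalidated, for example with `load_hdrfix.cache_clear()`.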