Closed: astronasko closed this 4 months ago
Sorry for not getting to this sooner; I was on leave until recently.
I'm not sure we can do this. The reason we return a np.array is that otherwise the HDU is left open, which is a really bad idea and leads to memory leaks...
A better way would be to internally use something like:
```python
with fits.open(xxxx) as hdu:
    # do stuff
    # copy image using np.array
```
That way we know the HDU is closed and not left floating until the program exits (this is especially bad when we multi-process: you can get a lot of files stuck open).
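A minimal, self-contained sketch of that pattern (the function name and file are placeholders, not LBL code):

```python
import numpy as np
from astropy.io import fits

def load_image(filename: str) -> np.ndarray:
    """Return a copy of the primary HDU data so the file handle can close."""
    with fits.open(filename) as hdu:
        # np.array(...) copies the (possibly memory-mapped) data, so nothing
        # references the file once the with-block closes it
        image = np.array(hdu[0].data)
    return image
```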
I'll try to figure something out in regards to this and make a patch.
Turns out your solution above also removes the built-in ability of `fits.getdata` to "guess" the extension.
So I think we will have to go with a solution that just verifies on opening the header and, in the case where no extnum or extname is given, assumes it is the primary extension.
So this is my solution for just the load_header function:
```python
from typing import Optional, Union

from astropy.io import fits

# LblException is defined elsewhere in the LBL codebase


def load_header(filename: str,
                kind: Union[str, None] = None,
                extnum: Optional[int] = None,
                extname: Optional[str] = None) -> fits.Header:
    """
    Standard way to load fits file header only

    :param filename: str, the filename
    :param kind: the kind (for error message)
    :param extnum: int, the extension number (if not given uses first)
    :param extname: the extension name (if not given uses extnum)

    :return: FITS header, the header
    """
    # deal with no kind
    if kind is None:
        kind = 'fits file'
    # try to load fits file
    try:
        if extnum is not None:
            with fits.open(filename) as hdu:
                hdu.verify('silentfix')
                raw_hdr = hdu[extnum].header
                header = raw_hdr.copy()
        elif extname is not None:
            with fits.open(filename) as hdu:
                hdu.verify('silentfix')
                raw_hdr = hdu[extname].header
                header = raw_hdr.copy()
        else:
            with fits.open(filename) as hdu:
                hdu.verify('silentfix')
                raw_hdr = hdu[0].header
                header = raw_hdr.copy()
    except Exception as e:
        emsg = 'Cannot load {0}. Filename: {1} \n\t{2}: {3}'
        eargs = [kind, filename, type(e), str(e)]
        raise LblException(emsg.format(*eargs))
    # header was already copied inside the context manager
    return header
```
I don't have a file with a bad header; I'd like to test on one and make sure the `load_fits` function still works (from your report it seems it should, since it crashed only in `load_header`). I'll also need to test whether `hdu.verify('silentfix')` slows down reading; if it does, I think you'll have to fix and re-save your files externally.
Okay, sorry to spam this issue, but it does seem to be 3 to 4 times slower to open a header:

```
In [7]: %timeit load_header(path)
9.84 ms ± 33.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [8]: %timeit load_header_old(path)
2.89 ms ± 6.37 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
where `load_header_old` is the original function. So I'll have to make sure there isn't a place where we load many headers (I don't think this is ever the case), but again I'd like to try on a "bad" file; maybe it will be even slower if it has to fix the header.
@astronasko could you send me a file that has a "bad" header?
Hi Neil, apologies for the radio silence -- I will have very limited internet connection until the end of this week; from then on, I will tackle the task you assigned to me.
Hi @njcuk9999,
I tested it on ~10³ spectra of ~15 different stars, and it works like a charm. The HDU verification is expected to take time because all columns need to be checked against the standard (iirc), but I don't think LBL's performance will suffer much. I had a look at the `io.load_header` calls throughout LBL's source -- it seems to me that you run this function once per spectral file. Taking your benchmark into account, this would correspond to an additional overhead of <10 s per thousand spectra.
I now revert to your solution, with a small change: I call `silentfix` inside a single context manager, since you open one in all three cases anyway.
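For reference, a sketch of what that change looks like (`LblException` is replaced with a plain `Exception` here so the snippet is self-contained):

```python
from typing import Optional, Union

from astropy.io import fits

def load_header(filename: str,
                kind: Union[str, None] = None,
                extnum: Optional[int] = None,
                extname: Optional[str] = None) -> fits.Header:
    """Load a FITS header, verifying inside a single context manager."""
    if kind is None:
        kind = 'fits file'
    try:
        with fits.open(filename) as hdu:
            # one silentfix covers all three extension cases
            hdu.verify('silentfix')
            if extnum is not None:
                ext = extnum
            elif extname is not None:
                ext = extname
            else:
                ext = 0  # assume the primary extension
            header = hdu[ext].header.copy()
    except Exception as e:
        emsg = 'Cannot load {0}. Filename: {1} \n\t{2}: {3}'
        raise Exception(emsg.format(kind, filename, type(e), str(e)))
    return header
```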
Actually this is not quite correct: for other instruments we have to load headers from wave files, blaze files, and other files, so it's probably nearer 5 header calls per object for some instruments -- this is not desirable.
Also, this benchmark is on a file that did not need fixing -- I would still need a file that needs fixing to really know how long it would take.
In fact we actually lose some extra functionality of `fits.getdata` and `fits.getheader`, where they "auto select" the extension -- it would be a real pain to write the auto-select feature just for this problem, or indeed to go through every line of code for every instrument and make sure we were getting the correct HDU.
I'm actually more tempted to put this down to "bad input data", rather than slow down everyone else for a few files.
Could a solution be to just fix the headers before using LBL? i.e. could you apply this function before adding the data to LBL?
I would hope that if you applied `hdu.verify('silentfix')` and then re-saved the fits file, it would then work in LBL?
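Something along these lines should do it (a sketch, not LBL code; `output_verify` re-applies the fix on write so the saved copy conforms):

```python
from astropy.io import fits

def fix_and_resave(filename: str, output: str) -> None:
    """Silently fix non-standard cards and write a conforming copy."""
    with fits.open(filename) as hdu:
        hdu.verify('silentfix')
        # output_verify repeats the silent fix when the file is written out
        hdu.writeto(output, output_verify='silentfix', overwrite=True)
```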
If that's the case, I agree with you -- it would be better to fix bad headers beforehand, and not at runtime. Thank you for your attention to this issue.
If you work with FITS files that do not conform to the standard, astropy throws a similar error during tellucleaning:
In this case some of my spectra have an unparsable card (`TNG EXP_METER_B EXP CENTROID`), which causes the fatal error. However, Astropy has some verification procedures that can "fix" the FITS file, so that LBL accepts it. This PR aims to implement that.

From what I understand, the main IO functions in the project are `load_fits` and `load_header` in `lbl/core/io.py`. They rely on astropy's `fits.getdata` and `fits.getheader` respectively, but astropy's verification works only on high-level class objects. The only solution I found so far is to define a common function that loads the HDU first, corrects it, and gets wrapped by `load_header` and `load_data`:

It looks a bit ugly. Additionally, I am not sure whether `kind`, `extnum`, and `extname` will be carried over correctly to `fits.open` -- I dug into astropy's source code but could not figure it out. My tellucleaning works without error, and I am rather inclined to think that the astropy verification won't affect the science (unless LBL looks for very specific keywords).

The global diff is quite convoluted; I suggest looking at the commit diffs one at a time, if you are interested.