Closed pllim closed 7 years ago
I have never touched this code base, only updated the docs. I believe @mcara would be better assigned to this issue. Especially since it sounds like a regression.
I think it is important to note that:
The error
os.remove(blctmp_name) OSError(2, 'No such file or directory')
has nothing to do with bitmasks and rather it is due to code trying to delete a file that does not exist. Judging by name blctmp_name
it appears that the code is attempting to delete a temporary file produced by hstcal
(acs_destripe
does run hstcal
isn't it? @pllim you worked on that part of the code).
I checked the repository around May 9, 2017 and did not find any changes to neither acstools
nor stsci.tools
(where bitmasking stuff leaves). The only change to a package that seems relevant to this issue appears to be https://github.com/spacetelescope/hstcal/commit/732d69ce8ae39224bc0c4a58a9fe75586fca97c0 and a couple of other changes earlier changes related to Gen2 CTE.
@jamienoss I think you should re-assign this issue to yourself 😈
For the very least we need to understand how did changes in hstcal
affected acstools
and then we can figure out how to "fix" acstools
if the changes in hstcal
are expected and were planned.
I've looked at jc8o04gh2.tra
and it is not really "looping": it is simply accumulating logs since circa 2014.
Begin 13-Dec-2014 04:05:41 EST
Begin 14-Dec-2014 00:20:00 EST
.................................
It seems to me that the essential code in acstools
is here: https://github.com/spacetelescope/acstools/blob/master/acstools/acs_destripe_plus.py#L513 (actually from L512 to L518). Essentially it is like this:
if cte_correct:
acs2d.acs2d(blctmp_name)
os.remove(blctmp_name) # <-- code fails here
It seems to me that acs2d.acs2d(blctmp_name)
is not making the blctmp_name
file and then remove()
is trying to delete something that does not exist.
What has changed in hstcal
or acs2d
that it is not producing the files anymore...
@pllim acs_destripe_plus()
has a clobber
parameter which is False
by default. The regression test is not setting clobber=True
(I do not know why - ask the author of the wfc_destripe_plus
regression test) and therefore acs_destripe_plus()
is appending to the *.tra
instead of overwriting it.
@mcara awesome, thanks for all the detective work. I did wonder if the trailer file was just being appended to since the stone ages rather than having something loop when it shouldn't.
So if it's failing because the *blc_temp.fits is missing this is obviously because it wasn't created and it was probably not created because it is reading in the old (gen1) PCTETAB
file causing it to use the gen1 CTE correction which triggers the deprecation error. I imagine all that needs to be updated is the test to use --ctegen 1
(which doesn't exist c.f. #31) or the PCTETAB
file and reference.
Why isn't the pandokia log or acstools printing the output from hstcal? I'll look into this next week.
Why isn't the pandokia log or acstools printing the output from hstcal?
Because it is printed to trailer files (*.tra
)?
You need special pandokia incantations to capture screen output. I didn't bother to do that in acstools as I did for hstcal. Usually, when something is fixed on hstcal side, acstools is automatically fixed as well. But not this time.
I'm shocked that calacs silently appends to trailer files. The expected behavior is exiting with "output file already exists" error. Or is that only for image outputs? Something to investigate.
BTW, thanks for the investigative work, @mcara !
You are welcome!
I'm shocked that calacs silently appends to trailer files. The expected behavior is exiting with "output file already exists" error. Or is that only for image outputs? Something to investigate.
My feeling is that various other (WFPC2, etc.) calibration tools had treated *.tra
files like astrodrizzle
treats *.log
: they would append new log to the existing file. This definitely needs either further investigation or expert advice of veterans such as @philhodge
@pllim Also, instead of setting clobber=True
you might want to consider deleting existing *.tra
files at the beginning of the test.
On 07/24/2017 03:21 PM, Mihai Cara wrote:
My /feeling/ is that various /other/ (WFPC2, etc.) calibration tools had treated |.tra| files like |astrodrizzle| treats |.log|: they would append new log to the existing file. This definitely needs either further investigation of expert advice of veterans @philhodge https://github.com/philhodge
calcos appends to the trailer file. DMS creates the trailer file, after all; you wouldn't want a calxxx to delete what DMS has written. calstis does not explicitly write to a trailer file; it writes to stdout, and DMS appends stdout to their trailer file.
Yes. Deleting trailer file at the beginning of the test is the solution I would go with. Thanks for the advice. I'll look into that when I return to work if no one get to it first.
Since this test does not print the output to the pandokia log, the only log of what happened is the trailer file and therefore, I would not delete this pre or post run. Doing so would render the tests (such as this one) useless.
How about renaming the trailer file with a name that includes a date/time stamp, at the end of processing?
Since this test does not print the output to the pandokia log, the only log of what happened is the trailer file and therefore, I would not delete this pre or post run. Doing so would render the tests (such as this one) useless.
Not true: if calibrated data files are identical to the reference files, usually we consider that test passes. Normally we do not look at log files such as astrodrizzle.log
(however, if you now consider that log is important - feel free to modify the tests accordingly). I can't imagine what are you going to do with these log files? read them every day? Same goes for @philhodge 's idea of storing trailer file every day! Better let it grow to 10MB...
Of course trailer should not be deleted post-run but removing old log before creating a new one seems OK.
I agree with Mihai. It's not difficult to capture screen output onto pandokia log either. We can do both.
How ever we need to do it doesn't change that it needs to be done and what that is is the persistence of test output. If this becomes volatile as you want then you will have to do the very thing that you suggest sounds 'silly' - looking at the output every day because that's all you will be able to do as that will be its lifetime. If the data persists then this can be looked back at from the future, i.e. not on a daily basis.
The useful situation of a regression test is not to tell you that a test passed but that it failed. The idea of a regression test isn't just to tell you that something regressed but to tell you what and I don't just mean which exe and input file. To do this you would need all the info available to you, i.e. the logs - whatever it is that you would initially see from subsequent dev tests, otherwise, how do you know that you have reproduced the same situation? In the worse case scenario, it may not even be so easily reproduced and the actual output from the initial regressed test may be the only info we have to work with.
I don't think keeping old trailer log is necessary but you do. So I'm trying to compromise here. Anyway, I can't do anything until next week.
The useful situation of a regression test is not to tell you that a test passed but that it failed. The idea of a regression test isn't just to tell you that something regressed but to tell you what and I don't just mean which exe and input file.
Are you just saying this in general or specifically referring to the acs_destripe_plus
test? I fail to see what this has to do with acs_destripe_plus
. Specifically,
acs_destripe_plus
..., the test failed and the log pointed to the line that was trying to remove a file (which did not exist). With a bit of detective work we found the reason. Were you expecting the test to point to the commit that broke the code/test?acs_destripe_plus
is not designed to test calacs
(other tests should test calacs
): calacs
is simply a step in acs_destripe_plus
.jc8o04gh2.tra
and scroll back from the end of the file. Did anything (besides dates) change since the test started failing? Does it tell you anything useful about current failure?I am not saying we should not keep the last trailer and/or other log files. But I fail to see usefulness of keeping all past log files. (To the best of my knowledge) We do not do this for any of the tests and I do not think there is a convincing reason to start doing it now.
@mcara Sorry, I mean a bit of both - in general, but more specifically in this case. I'm talking about us needing some sort of log of what actually happened in the test, e.g. the output from hstcal. This would be better if it were in the pandokia logs such as it is for some of the other tests (hstcal again). Since this is not currently the case for these acstools tests (wfc_destripe_plus.py & wfc_acscte.py specifically) we need to keep the trailer files as this contains that very information. For the above specific tests looking at the trailer files tells you exactly what happened for them to error - it gives you the error.
PCTECORR PERFORM
PCTEFILE /grp/hst/cdbs/jref/xa81724cj_cte.fits
Trying to open /grp/hst/cdbs/jref/xa81724cj_cte.fits...
Read in Primary header from /grp/hst/cdbs/jref/xa81724cj_cte.fits...
(pctecorr) Generation 1 PCTETAB file auto-detected.
ERROR: (pctecorr) Gen1 algorithm detected in PCTETAB (CTE_NAME). Default is gen2, use '--ctegen 1' to override.
ERROR: Error processing jc8o04gh2_blv_tmp.fits.
ERROR: Error processing jc8o04gh2_blv_tmp.fits.
ERROR: Calibration file(s) missing.
I'm not saying that we keep trailer files for all of time just until we have a better replacement. The idea is that if the tests fail but it takes anyone a few days to get around to look at them, more info other than the tests erroring (or failing) needs to be available. The trailer files provide this for these tests but won't if they get deleted. Especially since the email reminder only goes out at 1000 and the next run is 1030 that day - giving you a whopping 30mins to take a look at the trailer files before they are deleted.
Of course trailer should not be deleted post-run but removing old log before creating a new one seems OK.
Removing the old log before the next pre-run is deleting the log post run - only that you are delaying doing so until the next run. It's pretty much the same thing.
What does yesterday's trailer file provides in addition to what today's file provides? Take a look at jc8o04gh2.tra and scroll back from the end of the file. Did anything (besides dates) change since the test started failing? Does it tell you anything useful about current failure?
I think you have misunderstood the critical part of a regression test. The files don't differ from day to day when the tests run, however, they do when something goes wrong and this is the principle of the tests. Yes the tests don't change day to day after they have started to fail but they might - this is the whole point. If we knew in advance how regressions are to be introduced they would be fixed there and then and such tests would not even be needed.
(To the best of my knowledge) We do not do this for any of the tests and I do not think there is a convincing reason to start doing it now.
The convincing reason should be that, currently, these trailer files are the richest source of information relating to what happened during these test.
Look, I think it is OK to treat log files just like any other image file and keep a "reference log file" from a passing test and compare the log of each daily test to that reference file as we already do with fits images [assuming we can skip comparison of date/time stamps]. But storing multiple log files (like what you've been suggesting), is pointless.
I think you have misunderstood the critical part of a regression test. The files don't differ from day to day when the tests run, however, they do when something goes wrong and this is the principle of the tests. Yes the tests don't change day to day after they have started to fail but they might - this is the whole point.
No, I did not. However, I do not see how appending daily trailer files to a file would benefit anyone. Keeping a "good" log for reference is OK but keeping all log files (by appending them to an existing file, or renaming them by date, etc.) - is too much without any gain in information. In fact, the log file of a failing task all by itself should be revealing enough. For example, the trailer file of the failing acs_destripe_plus
shows:
CALACSBEG*** ACSCTE -- Version 9.1.0 (21-Feb-2017) ***
Begin 10-May-2017 10:41:37 EDT
Input jc8o04gh2_blv_tmp.fits
Output jc8o04gh2_blc_tmp.fits
...............
PCTECORR PERFORM
PCTEFILE /grp/hst/cdbs/jref/xa81724cj_cte.fits
Trying to open /grp/hst/cdbs/jref/xa81724cj_cte.fits...
Read in Primary header from /grp/hst/cdbs/jref/xa81724cj_cte.fits...
(pctecorr) Generation 1 PCTETAB file auto-detected.
ERROR: (pctecorr) Gen1 algorithm detected in PCTETAB (CTE_NAME). Default is gen2, use '--ctegen 1' to override.
ERROR: Error processing jc8o04gh2_blv_tmp.fits.
ERROR: Error processing jc8o04gh2_blv_tmp.fits.
ERROR: Calibration file(s) missing.
This seems informative enough in itself.
I have fixed the test. It will now use Gen 2 CTE, as it should be. It will also only retain the log of the last run, as Mihai suggested. It is expected to fail because the "truth" is now outdated. I will open a separate issue when I get the result tomorrow. Thank you for all your inputs!
@pllim I had already done this by swapping the reference files over - check the SVN log. Only leaving the new file to be created and then ok'd.
Please investigate ASAP since ACS Team is fond of this task.
From the log at https://ssb.stsci.edu/pandokia/pandokia.cgi?query=detail&key_id=106065924 , you can't tell much. But when you go in the test machine, you will see that
jc8o04gh2.tra
is a whooping 5.5 MB and the log shows CALACS went into a loop many, many times.Why is this happening? What has changed since it last worked?
c/c @mcara since he wrote the masking stuff