Closed jlconlin closed 8 months ago
It actually does allow for duplicate adjacent values. There's the Material test you pointed out that has duplicate points and it passes that test. I also took the additional unit test you want to add for the TabulationRecord and added it to the current code (without your change in the is_sorted_until) and that test also runs properly.
Now, I'm not saying there isn't an issue.
The following does cause an error:
" 0.000000+0 0.000000+0 33 0 1 49228 1460 438\n"
" 4 4 9228 1460 439\n"
" 1.000000+1 1.725000+1 1.500001+1 1.850000+1 15.0000100 1.975000+09228 1460 440\n"
" 2.700000+1 1.605000+1 9228 1460 441\n";
because the first value is evaluated to 1.5000010000000001E+01 and the second one to 1.5000010000000000E+01 by disco, which technically is not sorted.
Oh, so this is a disco issue, not an ENDF issue.
I just tried your fix with the above ENDF snippet, it still fails.
A solution might be to truncate/round the values coming out of disco. That way, the tail gets cut off and the values become the same again. We'll have to look into this in more detail.
Does this mean we need to sigfig
?
... runs away screaming ...
Seriously now: we need to look into this in more detail, and be sure that truncation is the right way to go.
I don't like the idea of truncation here; I'd rather understand why disco has this behavior than use the "quick fix".
That said, I don't think there is a use case for disco that will use 16 digits of precision, so I can't imagine truncating is a real problem.
On another note, as I've previously discussed privately with @whaeck, ENDFtk's verification checking is too pedantic in general. It's a design philosophy that would be incredibly difficult to overcome at this stage in development, but something to think about as we move forward.
I appreciate that ENDFtk is pedantic when used to create ENDF data.
I do not appreciate that ENDFtk is pedantic when reading existing data from file/string. I'd much rather get a warning along with a data structure that I can query than an error. Not all ENDF data is perfect, but it may still be useable if there weren't a fatal error.
@nathangibson14 I definitely do not want to rush a solution to fix this until we fully understand what's going on. For now, it rarely occurs and can be fixed manually if need be - even though this can become cumbersome in certain situations.
As a side note: if not for ENDFtk's pedantic checking, we wouldn't have known about this - and there's probably something that would have gone wrong down the line.
BTW: I hate floating point comparison
We can fix the comparison to include an epsilon but that doesn't fix the underlying issue. The number will technically still be out of order
I did some digging in disco and was able to recreate this behavior in a simple C++ script. The meaningful pieces are:
int base = 1;
int fraction = 50001;
int exponent = 1;
double a =
( base*pow(10, 5) + fraction ) *
std::pow(10., exponent-5 );
std::cout << a << std::endl;
double b =
( base + fraction * pow(10., -5) ) *
pow(10., exponent);
std::cout << b << std::endl;
a
uses the operation in disco, which gives the undesired value of 1.5000100000000002e+01
. b
is an equally rational way to do it and it gives the desired value of 1.5000100000000000e+01
.
This is definitely a case where the imprecision of floating point arithmetic is at fault. My change is by no means going to absolve all of that imprecision -- it simply works better for the values in this example. I'll do some more testing to see if one option is better than the other more of the time.
So here is an example where this became a problem. I have a perfectly valid PENDF file (attached) for 123>Te, processed using modern RECONR. See line 191 of the file:
0.68256000 158.729179 0.68256000 158.729179 6.825600-1 158.7291815234 3 1
When loading this with Python ENDFtk, I get this error:
[error] X-values are not sorted
[info] X-value [232]: 0.68256
[info] X-value [233]: 0.68256
[info] Error while reading TAB1 ordered pairs
[info] Error encountered while parsing TAB1 record
[info] Encountered error while reading section 1 of file 3 of material 5234
I believe the error comes—as @nathangibson14 has pointed out—because of the conversion of the strings 0.68256000
and 6.825600-1
to floating point numbers. The second string has fewer significant digits and disco doesn't do a good job of making sure that all the extra digits in a floating point number are zeroed out.
I'll have some thoughts on possible changes to disco and whether they could help this issue later today.
But in the meantime, two big questions come to mind--
0.6825600
being written to RECONR? Didn't @whaeck 's updates to disco's write make that default to scientific? The zero at the end means that it doesn't need the extra precision from floating point.I'll have some thoughts on possible changes to disco and whether they could help this issue later today.
But in the meantime, two big questions come to mind--
- What is causing RECONR to put three consecutive points at the same energy? Two of them even have the same cross section!
- Why is
0.6825600
being written to RECONR? Didn't @whaeck 's updates to disco's write make that default to scientific? The zero at the end means that it doesn't need the extra precision from floating point.
I just sent a long email with the conclusion that std::stod
doesn't exhibit this problem, but both alternatives (including what's in disco) have issues. I'm a bit concerned about speed in making a switch to stod
, (mostly because adding e
into strings is a pain) but at least it's a viable option. More work to do. @whaeck has some old code that does this, too.
Regarding item 2, I'm quite sure disco defaults to exponential unless it needs extra digits. I bet the underlying values in that set of three have some rounding that causes the disco write logic to do something a little differently than I'd expect.
The ENDF format allows for duplicate x values in a TAB1 (and other places). There is a
verifyXValuesAreSorted
function called during theTabulationRecord
(i.e., TAB1) constructor. It's a little too strict and doesn't allow adjacent, duplicate x values.