trilinos / Trilinos

Primary repository for the Trilinos Project
https://trilinos.org/
Other
1.19k stars 565 forks source link

Teuchos questions about behavior of AnyNumberParameterEntryValidator #612

Closed kddevin closed 7 years ago

kddevin commented 7 years ago

@trilinos/teuchos

The Teuchos documentation says the following about AnyNumberParameterEntryValidator:

"Standard implementation of a ParameterEntryValidator that accepts numbers from a number of different formats and converts them to numbers in another format."

I wrote a test program to understand the behavior of this validator, and the results of the test did not match my expectations.

  1. I expected it to throw an error if I tried to set a parameter to a non-numeric string, but the set worked without error. E.g.,
    pl.set(parameterName, "bogus", "parameterDoc", anyNumVal); Why doesn't the validator complain about "bogus"?
  2. I also expected that if I did provide a number-like string, I would be able to get actual numbers from getValue. E.g., pl.set(parameterName, "0.5", "parameterDoc", anyNumVal); double dd = pl.getEntry(parameterName).getValue(&dd); In this case, the set works but getValue throws an error, saying the cast failed.

Attached is a program that demonstrates the behavior. To see the thrown errors, some lines need to be un-commented.

Are my expectations (based on the documentation) incorrect? Or is there a bug? Thanks.

paramKDD.cpp.txt

@MicheldeMessieres @krcb

bartlettroscoe commented 7 years ago

@kddevin,

It has been many years since I looked at this stuff at all but I poked around a little to see if there was anything obvious that I could see looking at your example and the source code for this.

First let me apologize for the lack of automated testing for a lot of this code but I wrote it before my "conversion" in 2007-2008 to a a die-hard TDD unit tester. Therefore, we can't just look at the unit tests to determine the correct behavior :-(

For the first case:

pl.set(parameterName, "bogus", "parameterDoc", anyNumVal);

the reason that that does not throw is that it just calls (unchecked) functions std::atoi() and std::atof(). See:

I think at the time I did not have a good way validate such conversions but I think today this could be replaced with Teuchos::asSafe<int>() and Teuchos::asSafe<double>() (provided by @mhoemmen after this code was written).

As for:

double dd = pl.getEntry(parameterName).getValue(&dd);

I would have to look at this case to see if that is supposed to work but from looking at the code, it should since the default preferred type is double.

If you are stuck on this, you might try to replace this with:

double dd = anyNumVal->getDouble(pl.getEntry(parameterName));

and see if that works? You can see this usage in the test/example code at:

However, the primary use case for this class is that the client object expects say double or int but they want to allow the user to set the value as either a double, value, or even a string (if the user is that sloppy) and then it gets converted to the correct type internally. That is really how this class is meant to be used. For that use case, the usage of this class is transparent to both the user and the client. If that is what you are trying to do and that is not working, then this is a defect.

Obviously we need better documentation and examples for this code. If I write some documentation, will you review it?

Otherwise, we just need to add some unit tests for the case of an invalid string and fix this with asSafe<>() calls.

How urgent is this to get fixed (assuming the workarounds don't work)?

krcb commented 7 years ago

Is there a reason that the referenced Teuchos methods don't use std::stoi, std::stod? See e.g.

http://www.cplusplus.com/reference/string/stoi/ http://www.cplusplus.com/reference/string/stod/

These include throwing exceptions for a range of cases which could presumably be wrapped in an appropriate try/catch? One reason could be that it is useful / necessary to be able to extract any number from a string, as demonstrated in the example at:

http://en.cppreference.com/w/cpp/string/byte/atoi

If this is the case, it may be worth implementing another method that only will accept strings that solely contain int/doubles.

mhoemmen commented 7 years ago

Is there a reason that the referenced Teuchos methods don't use std::stoi, std::stod?

Those are C++11 functions. They did not exist at the time. Furthermore, software other than Trilinos uses Teuchos and does not want to require C++11, so Teuchos itself may not have a required C++11 dependency.

krcb commented 7 years ago

Can we wrap some additional capabilities in a HAVE_CXX11 ifdef, much as is being done for the thread-safe RCP work?

kddevin commented 7 years ago

I did a quick check:

Indeed, std::stod works to resolve the problem (1) when used in place of std::atof in the validator, but I understand the c++11 restrictions.

Teuchos::as() does not work, as the argument is const char* and there are no specializations for const char _. Thus, the code does not compile: Trilinos/packages/teuchos/core/src/Teuchosas.hpp:2864:67: required from 'TypeTo Teuchos::asSafe(const TypeFrom&) [with TypeTo = double; TypeFrom = const char]'

Neither of these solutions addresses the second problem, where the conversion isn't done. There, it appears getValue tries to cast rather than convert the characters to double, and the cast throws an error.

Regarding timeframe: We'll need to resolve these problems this month, as the contract for the work is expiring. Allowing users to provide strings is convenient, as then their applications can read parameters from a file and pass them to us without having to interpret and convert them. But if we can't make that work, we'll consider other validators that require ints or doubles.

kddevin commented 7 years ago

Also, regarding the second problem, double dd = anyNumVal->getDouble(pl.getEntry(parameterName)); does, indeed, work, as it calls the conversion in the validator.

However, refactoring all our code to fetch validators and retrieve values through them rather than use getValue<> would be significant work. Can pl.getEntry(parameterName).getValue<> do that work (that is fetch its own validator and call the appropriate function rather than try to just cast the value)?

bartlettroscoe commented 7 years ago

Indeed, std::stod works to resolve the problem (1) when used in place of std::atof in the validator, but I understand the c++11 restrictions.

We can use ifdefs for HAVE_TEUCHOSCORE_CXX11. There are lots of examples in Teuchos. We just need to add formal unit tests for this in Teuchos for the expected behavior then change the code. That should be easy. I am assuming because Zoltan2 needs this then having this require C++11 is fine?

However, refactoring all our code to fetch validators and retrieve values through them rather than use getValue<> would be significant work. Can pl.getEntry(parameterName).getValue<> do that work (that is fetch its own validator and call the appropriate function rather than try to just cast the value)?

Not sure. I will have to look at the details some.

Can someone show me the concrete use case in code that they are trying to get to work (e.g. in a branch)? If that branch is easy to pull and try out (i.e with a failing unit test), then I can likely see how to address the problem. I may have some time on Friday to look at this. Otherwise, I will be in SNL/NM and we can sit down and look at this together?

kddevin commented 7 years ago

In pull request #497 test AllParameters.cpp demonstrates the first problem. It accepts a bad value. The test problem I attached demonstrates both problems (with a bit of uncommenting). It could be the start of a Teuchos test, if you like. I will be around next week and would be happy to chat. Thanks.

kddevin commented 7 years ago
  1. Indeed, we should use the C++11 functions in the validator in place of atof and atoi (with #ifdef for C++11). @bartlettroscoe says we can allow the current behavior to persist for non-C++11.
  2. My expectations of the set() function and the validator were not quite correct. The valid parameter set() determines the type of the parameter, so in Zoltan2 we need to use the proper type there. Then the user can set() a string type and Teuchos will attempt to convert it to the type in the valid parameter set.

3 (new). We may add a bool validator for true/false parameters to Teuchos. @bartlettroscoe says it can go directly in Teuchos in the same file as AnyNumberParameterEntryValidator.

I attach a new test program that demonstrates the bad behavior existing in (1), as well as the correct initializer expected in (2). This program can be the basis of a test program for the corrected validators.

paramKDD.cpp.txt

mhoemmen commented 7 years ago

Indeed, we should use the C++11 functions in the validator in place of atof and atoi (with #ifdef for C++11). @bartlettroscoe says we can allow the current behavior to persist for non-C++11.

I'm OK with that, as long as Teuchos builds and passes tests with C++11 disabled. (I'm mainly fighting for Dakota here, which uses Teuchos and does not require C++11. Life is easier for them if they don't have to add another patch.)

MicheldeMessieres commented 7 years ago

@kddevin I’ve added std::stod and std::stoi for HAVE_TEUCHOSCORE_CXX11. Also added a Teuchos unit test for AnyNumberParameterEntryValidator to test the various settings and highlight the differences in behavior with and without HAVE_TEUCHOSCORE_CXX11. Verified all Teuchos tests pass without HAVE_TEUCHOSCORE_CXX11. I’ve made the changes in our zoltan2ParameterLists branch - I can split up the Teuchos stuff to a separate branch if preferred but for now it seems easiest to work with this together as a batch.

Some std::stod and std::stoi notes: These don’t detect all bad formats - for example: std::stod( “1.1x” ) will read as 1.1 and not throw std::stoi( “1.1” ) will read as 1 and not throw Added notes regarding this to the AnyNumberParameterEntryValidator getDouble and getInt methods. However true bad strings such as “invalid” will properly detect now in the AllParameters unit test.

EnhancedNumberValidator A few parameters are using EnhancedNumberValidator so we can specify min/max ranges. I’ve fixed the bool parameters to have a validator which accepts strings (will update #650 accordingly). But to complete this picture it seems we want to have EnhancedNumberValidator accept strings. Alternatively AnyNumberParameterEntryValidator could have min/max range options. This is probably something we should discuss before proceeding.

paramKDD.cpp example code: I added that example code to AllParameters.cpp as a second test called testForIssue612(). I can remove this on a future update but thought it might be useful as a reference to be sure we are hitting the targets.

mhoemmen commented 7 years ago

From reading the various pull requests, it looks like the plan is to combine these Teuchos changes with the Zoltan2 ParameterList changes; is that right?

MicheldeMessieres commented 7 years ago

@mhoemmen I think whether we do this as a single branch or separate PRs is still TBD. At the moment I've kept it all together for testing convenience.

bartlettroscoe commented 7 years ago

@MicheldeMessieres, it might be nice to have the Teuchos changes on a separate branch so that we can review and approve them independently. I don't think these changes to Teuchos will be breaking backward compatibility in any bad way so I think we can merge the Teuchos changes before the Zoltan2 changes. But if the Teuchos changes will not be fully validated until the Zoltan2 changes are complete, then we might as well keep them together and then just push them at the same time.

Now some specific comments:

I’ve added std::stod and std::stoi for HAVE_TEUCHOSCORE_CXX11. Also added a Teuchos unit test for AnyNumberParameterEntryValidator to test the various settings and highlight the differences in behavior with and without HAVE_TEUCHOSCORE_CXX11. Verified all Teuchos tests pass without HAVE_TEUCHOSCORE_CXX11.

All of that sounds good to me.

std::stod( “1.1x” ) will read as 1.1 and not throw

If that is what std::stod() does, then I guess that is logical behavior? This is an example of std::stod() being more "robust" and less "correct"? See:

std::stoi( “1.1” ) will read as 1 and not throw

Interesting. This is another example of the C++11 standard library being "robust". If the algorithm wants and int (and documents that) then should this truncate or return an error? Note that if this was raw C++code, the compiler would allow it:

int var = 1.1;  // Gets truncated to 1 (with or without warning depending on warning levels)

I tried this with GCC 5.4.0 with -Wall -pendantic and it did not even give a warning about the truncation. (This is one of the behaviors of C that was adopted by C++ that I hate. C/C++ is very sloppy about built-in type conversions.) Based on that I guess this is logical behavior for a C++ standard library function?

But to complete this picture it seems we want to have EnhancedNumberValidator accept strings. Alternatively AnyNumberParameterEntryValidator could have min/max range options. This is probably something we should discuss before proceeding.

Ranges and conversions are orthogonal concepts. That sounds like we need to use the DECORATOR or COMPOSITE design pattern? That way, an algorithm could apply multiple (orthogonal) validators to a single parameter. I think that a COMPOSITE would make sense where you would apply AnyNumberParameterEntryValidator to get the right target type first and then EnhancedNumberValidator to check its range. It should be a pretty easy validator subclass to write. The only complex part would be the conversion of the COMPOSITE validator two and from XML. Does that make sense?

MicheldeMessieres commented 7 years ago

General Status:

Note that I’ve currently got a PR with changes to EnhancedNumberValidator so it will accept strings. (#497) It sounds like changing to a composite pattern will be better. If we want to merge these changes as a single PR, I think the current changes might not be problematic for Teuchos temporarily. Then perhaps as a phase II, we would roll back the EnhancedNumberValidator changes, create the composite class, and make the minor Zoltan2 changes for implementing it. This might be a good way to get the work broken up into smaller chunks.

In summary I think we’ve got the desired behavior we want for Zoltan2 but not the internal implementation in Teuchos.

std::stod and std::stop:

For std::stod and std::stoi I had just noticed now there is an optional idx parameter which can let us convert to more precise behavior - something I wish I’d noted earlier. This will give information about where the number ends.

So we can set up:

size_t idx = 0;

Then call:

double test = std::stoi("1", &idx); // idx is set to 1 double test = std::stoi("1.1", &idx); // idx is set to 1 double test = std::stoi("1x", &idx); // idx is set to 1

double test = std::stod(“1.1", &idx); // idx is set to 3 double test = std::stoi("1.1x", &idx); // idx is set to 3

Then a throw error if (idx != str.length()) will be sufficient to catch those errors and give us the precise behavior we would prefer.

Composite Class

I think the composite idea sounds like it would work well. To validate understanding I expect we would:

Make CompositeClass (name TBD) derived from ParameterEntryValidator. Make CompositeClass a container for both AnyNumberParameterEntryValidator and EnhancedNumberValidator. Then pass through whatever needs to the sent to the base ParameterEntryValidator class and get the XML serialization taken care of.

Perhaps a bit awkward will be handling the templating of EnhancedNumberValidator. AnyNumberParameterEntryValidator handles conversions and is only int or or double type, so I think it makes sense that EnhancedNumberValidator would internally be set up with an appropriate templated type of int or double determined by AnyNumberParameterEntryValidator. So CompositeClass would not be templated.

CompositeClass would work well if it allowed direct communication with the two validators so we don’t have to make copies of all the API calls. However, we would like to keep the two validators in proper synch with each other which might require making them private. This would become more clear once an initial implementation was prototyped.

kddevin commented 7 years ago

Hi @MicheldeMessieres . I don't need this work to turn into a major effort. Other packages seem happy with the validators as is. So if we now have what is needed for Zoltan2, we should probably wrap up this task rather than do extensive refactoring of the validators.

As I understand your comments,

If the above is correct, I suggest making the proposed changes to your use of stod and stoi, and then calling this task completed. There are more important tasks on the to-do list than continued refactoring.

If an important feature is still missing, let me know. Thanks for your efforts.

MicheldeMessieres commented 7 years ago

@kddevin That is all correct. I will make that improvement for the stod and stoi and leave the validators as they are until further notice.

bartlettroscoe commented 7 years ago

I will make that improvement for the stod and stoi and leave the validators as they are until further notice.

I am fine with pushing what is there (after some review) but can we at least agree to create a task to clean this up using a proper validator COMPOSITE subclass as part of the continuing contract work? This is creating duplicate code that represents a maintenance problem going forward and and other problems as well about how to extend this type of software.

MicheldeMessieres commented 7 years ago

@bartlettroscoe I've created issue #686 to keep track of this.

kddevin commented 7 years ago

Pull request #497 addressed this immediate issue. Thus, I will close this issue. @MicheldeMessieres discusses broader issues in #686.