konne / RserveCLI2

A fork of RServeCLI

List of NAs is returned as SexpArrayBool instead of SexpArrayDouble #31

Open hebu opened 9 years ago

hebu commented 9 years ago

I faced this issue today: I have an R algorithm that should return a list of doubles. Whenever it returns a list that contains only NAs, the result is decoded as a SexpArrayBool and .AsDoubles throws a NotSupportedException.

So, now I am doing this workaround:

public IList<double> GetDoublesVector(string symbolOrExpression)
{
    try
    {
        return _rConnection.Eval(symbolOrExpression).AsDoubles.ToList();
    }
    catch (NotSupportedException e)
    {
        return ((ICollection<Sexp>) _rConnection.Eval(symbolOrExpression).Values).Select<Sexp, double>(s =>
        {
            if (s.IsNa)
            {
                return Double.NaN;
            }
            else
            {
                return s.AsDouble;
            }
        }).ToList();
    }
}

hebu commented 9 years ago

Is this an issue with RserveCLI2 or an issue with R's SEXPREC type? As RDotNet shows similar behaviour (it returns a boolean vector with all values being "true"), I suspect R itself.

(If my question seems weird: Sorry, I am new to R and RServe)

Using R 3.1.2

hebu commented 9 years ago

In the meantime I have understood how R determines the best-matching type of a vector, which explains why an all-NA vector is interpreted as a boolean vector.
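For reference, this coercion rule can be checked in a plain R session. The following is a minimal sketch using base R only (the variable names are illustrative, not from the code under discussion): a bare NA is a logical constant, so a vector built only from NA stays logical, while a single double element promotes the whole vector.

```r
# NA on its own is a logical constant, so an all-NA vector is logical.
all_na <- c(NA, NA, NA)
typeof(all_na)    # "logical"

# One double element coerces the whole vector to double.
mixed <- c(1.0, NA, NA)
typeof(mixed)     # "double"

# A typed NA constant keeps the vector double without any real values.
typed_na <- c(NA_real_, NA_real_)
typeof(typed_na)  # "double"
```

This is why only the all-NA case arrives on the .NET side as a boolean array.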

In R.NET, there's a way to force the vector into a numeric one, using GetSymbol("mySymbol").AsNumeric

I did not find a similar method in RServe, so I am still stuck with this workaround. But maybe I missed something...

SurajGupta commented 9 years ago

hi hebu - can you provide an example of some real data that is not being marshaled into the right .net type? you can wrap it in a dput() and paste the results here.

hebu commented 9 years ago

As I had to learn, it's actually being marshalled into the right type. The issue only concerns all-NA vectors, for example c(NA, NA, NA, NA, NA). R returns such a vector as a boolean vector, because it always picks the best-matching data type. I did not expect that, because I assumed I would always get a double vector. You could try something like this:

_rConnection.Eval("test1 <- c(1.0, NA, NA, NA, NA)");
var r1 = _rConnection.Eval("test1").AsDoubles;
_rConnection.Eval("test2 <- c(NA, NA, NA, NA, NA)");
var r2 = _rConnection.Eval("test2").AsDoubles; // will not work

When R returns a boolean vector, "AsDoubles" no longer works. That's why I had to build the workaround above. I just want to know whether this is the recommended way to handle that case.

SurajGupta commented 9 years ago

@hebu - Got it. Yes, R stores c(NA,NA) as an array of bool. This is the problem with dynamically typed languages like R: they play fast and loose with types, and coders are happy to let the runtime choose types while remaining blissfully unaware. Personally, I tend to code R in a very controlled and mistrustful manner. For example, if I author a function that I expect to return an array of double and it's possible that someone, somewhere can construct a c(NA,NA,...), then I would explicitly convert any result into a double before returning, using as.double. Try this:

x = as.double(c(NA,NA))
str(x)

And I would unit test that function: throw different scenarios at it and check that I'm getting an array of double in each of them.
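A minimal sketch of such a check, using only base R. The function safe_ratio and its inputs are hypothetical stand-ins for whatever the real algorithm computes; the point is that as.double at the return site pins the type, and stopifnot asserts it for each scenario.

```r
# Hypothetical function: divides x by y, yielding NA where y is zero.
# Without as.double, an all-zero y would produce a logical vector.
safe_ratio <- function(x, y) {
  result <- ifelse(y == 0, NA, x / y)
  as.double(result)  # pin the return type regardless of the contents
}

# Scenario checks: ordinary values, mixed NAs, all NAs, empty input.
stopifnot(is.double(safe_ratio(c(1, 2), c(2, 4))))
stopifnot(is.double(safe_ratio(c(1, 2), c(1, 0))))
stopifnot(is.double(safe_ratio(c(1, 2), c(0, 0))))
stopifnot(is.double(safe_ratio(numeric(0), numeric(0))))
```

With the type fixed on the R side, the caller can rely on AsDoubles without special-casing the all-NA result.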

Remember, .NET is strongly typed. IMO, it's a better pattern to be exact in what you send to .NET instead of thinking about .NET as R, with the same flexibility to mix and match types. It's not that you can't do it in .NET; it just feels wrong to me. That said, we could also just implement AsDoubles in SexpArrayBool. In fact, there's already an open issue for this: https://github.com/SurajGupta/RserveCLI2/issues/9 and this particular conversion would be relatively easy. Do you want to take a crack at it? If you do, remember to add some unit tests!

I would NOT recommend doing it the way you are doing it. Using try/catch for flow of control is a code smell and bad design. See here: https://stackoverflow.com/questions/1336094/is-it-bad-to-use-try-catch-for-flow-control-in-net

hebu commented 9 years ago

@SurajGupta Thanks for your explanation! Very helpful! You are right, the solution by catching the exception is not very nifty. Meanwhile, I changed my code to this:

        var result = _rConnection.Eval(symbolOrExpression);
        if (result is SexpArrayDouble)
        {
            return result.AsDoubles;
        }
        else
        {
            return result.Values.Select(x =>
            {
                if (x.IsNa)
                {
                    return Double.NaN;
                }
                return x.AsDouble;
            }).ToList();
        }

But I will also try to get my hands on the R code that I am calling, so I can force the method to return a double vector instead!