How do we check if an obj inherits from one of the classes

jakob-r commented 10 years ago

assertClass checks if x inherits from all of the given classes. How do we check quickly if it inherits of just one of the given classes?

berndbischl commented 10 years ago

This should be doable with CM IMHO.

assertOneOf?

berndbischl commented 10 years ago

Of course we really cannot check for anything else in the signature.....

mllg commented 10 years ago

For checks like checkArg(desc, c("ResampleDesc", "character"))? What if desc is NA or has not exactly one element? I'd prefer the somewhat strangely named assert function for such cases.

assert(checkClass(x, "ResampleDesc"), checkString(x))

berndbischl commented 10 years ago

What if desc is NA or has not exactly one element?

Disregarding the fact that the alternative you posted might be a good way to check this, it does not make sense to ask this here, but not for eg assertClass....

Because the implied semantics would be to do the same as assertClass disjunctively.

berndbischl commented 10 years ago

Here is an example with output for reference:

> x = 5
> assert(checkString(x), checkClass(x, "data.frame"))
Error in assert(checkString(x), checkClass(x, "data.frame")) : 
  Assertion on 'x' failed. One of the following must apply:
 * checkString: Must be a string
 * checkClass: Must have class 'data.frame'

1) I like the error message. This should be used IMHO

2) The mechanism is MUCH too useful to not mention it prominently on the README somewhere!

mllg commented 10 years ago

Okay I'm confused. Can you do your check with assert or are you still missing some functionality? Maybe an example might help me to understand what you really want to do.

jakob-r commented 10 years ago

On another side note: What will be more elegant than the following code?

assert(checkClass(x, "a"), checkClass(x, "b"), checkClass(x, "c"))

mllg commented 10 years ago

I know this is not very elegant. On the other hand I'd like to mention that these checks are usually necessary because of bad programming habits. A function should have a well defined input to prevent such type juggling which is just error prone and unnecessarily reduces performance. I know that there are exceptions and this one might have its reason (people use the REPL and don't want to type some extra chars). This is why I hacked assert but don't promote it.

TL;DR: if you have more than 2 possible classes, you're probably doing it wrong.

berndbischl commented 10 years ago

"If you cant do it with my library you are doing it wrong".

Seriously, arguing like this in the abstract makes no sense at all. This stuff gets only resolved when one looks are realistic use cases.

We have a couple of them in mlr now. These things came out of a long discussion with users / and in the programming team.

berndbischl commented 10 years ago

And hand I'd like to mention that these checks are usually necessary because of bad programming habits. A function should have a well defined input to prevent such type juggling which is just error prone and unnecessarily reduces performance.

And to make this more precise: What you are saying is considered to completely wrong by many people regarding dynamically typed languages as R and python. I have read a couple of long discussions and posts by people - when I looked up arg check material in other languages. The usually say that function should be able to be "polymorph" in the way args are handled - hence type checks are not good at all.

Now obviously I don't agree with these guys that much. Because I wanted and pushed checkArg and checkmate. But we should at least acknowlegde that this is a somewhat "foreign" perspective. And that there are enough cases where multiple types enter a function at the same arg. And that this is not necessarily is a bad style. Like I said, some people would argue it is more the rule than the exception.

Sorry, wall of text.

berndbischl commented 10 years ago

Before this gets misunderstoood, here my tl;dr:

a) I completely agree with Michel that I have seen dozen of times where this "style" was simply bad and wrong - and Jakob's job was also to check this and think about this (in eg mlr)

b) I also know a couple of instances in mlr where this is really convenient and useful and necessary - and I dont consider the alternative with "just one type" better.

mllg commented 10 years ago

"If you cant do it with my library you are doing it wrong".

Why do you put this in quotes? I never said that. Have you only read the TLDR?

And to make this more precise: What you are saying is considered to completely wrong by many people regarding dynamically typed languages as R and python. I have read a couple of long discussions and posts by people - when I looked up arg check material in other languages. The usually say that function should be able to be "polymorph" in the way args are handled - hence type checks are not good at all.

And I say: If you want polymorphism, use OO. But R does not offer a modern OO system, therefore you should not use polymorphism. Python has the advantage that the core functions raise type errors while R just auto-converts and returns something (and they don't have to handle NAs). Thus you need less assertions in Python to get useful error messages.

berndbischl commented 10 years ago

Why do you put this in quotes? I never said that. Have you only read the TLDR?

It was more or less a funny way to make a point.....

And I say: If you want polymorphism, use OO. But R does not offer a modern OO system, therefore you should not use polymorphism.

This has nothing to do with OO? The languages where this was discussed - that I mentioned - are not that much OO. Actually in most OO languages (that I am fluent in) it is the other way around - that type system is static, so like you argued in your post.

(But I guess "polymorphism" is maybe not the perfect term here. I suspected that when I wrote my post. What actually is?)

berndbischl commented 10 years ago

I mean it is pretty demonstrable to proof your point:

Make a suggestion to remove every instance where we use this concept of "multiple types" in mlr. Then look at the resulting client code. Which is of course still writable, but becomes longer. Now think about what you lost (in terms of briefness) and what you gained (in terms of interface) simplicity.

NB: I am not sure of the answer, although I tend to one side. Which is that the "sugar" of multiple types is in some places better. Yes, my "inner formal self" tells me that this is wrong. My "practical side" disagrees, and over the years I have learned to let it win more than lose :)

mllg commented 10 years ago

Make a suggestion to remove every instance where we use this concept of "multiple types" in mlr.

Okay:

FeatSelWrapper.R
32:  assert(checkClass(resampling, "ResampleDesc"), checkClass(resampling, "ResampleInstance"))

TuneWrapper.R
41:  assert(checkClass(resampling, "ResampleDesc"), checkClass(resampling, "ResampleInstance"))

Let ResampleInstance inherit from ResampleDesc or introduce a virtual super class for both.

getFilterValues.R
45:  assert(checkClass(task, "ClassifTask"), checkClass(task, "RegrTask"))

Should be done in S3 / is a bug. Should just check for SupervisedTask and filters should be implemented as tagged S3 objects (exactly like everywhere else, e.g., measures or imputations).

ResampleInstance.R
44:  assert(checkClass(desc, "ResampleDesc"), checkCharacter(desc))

Do this in S3: default is for ResampleDesc, character-method calls NextMethod.

listLearners.R
30:    assert(checkCharacter(obj), checkClass(obj, "SupervisedTask"))

Unnecessary check, already checked using S3.

getBaggingModels.R
12:  assert(checkClass(model, "BaggingModel"), checkClass(model, "OverBaggingModel"))

Let OverBaggingModel inherit from BaggingModel or introduce a virtual super class for both.

listMeasures.R
20:    assert(checkCharacter(obj), checkClass(obj, "SupervisedTask"))

Use S3 / unnecessary check / see listLearners.

downsample.R
21:  assert(checkClass(obj, "SupervisedTask"), checkClass(obj, "ResampleInstance"))

Unnecessary check, already checked using S3.

CostSensTask.R
26:  assert(checkMatrix(task$env$costs, any.missing = FALSE), checkDataFrame(task$env$costs, any.missing = FALSE))

Not really required, data.frames are converted in a if(is.data.frame(...)) block anyway.

mllg commented 10 years ago

Oh, I forgot to mention that this is still the same client code.

Can I close now? I think this is not getting anywhere.

berndbischl commented 10 years ago

Please leave it open a bit so we can at least extract the mlr suggestions. I also would like to respond (which I cannot right now).

I also disagree that this does not help us to "move forward"....

berndbischl commented 10 years ago

Ok, 2 very brief reponses:

a) Regarding the "let A inherit from B": This would be bad design in some cases (Resampling), good in others (Bagging, already on my list. NB: the new class is like 2 days old). Virtual super class might be OK / actually good in some cases.

b) We have many instances of where you say: "checked by S3 already". We need to formally discuss / define how to do this! Now. Because it happens a lot.

Here is a pro of the approach to use CM there: We see in the error message what classes are actually supported and allowed.

Con: The check needs to be adopted when you add another method for another class. A client programmer could not even do that at all, as he has no access to the generic.

My conclusion from b): Omit the check and live with the bad error message?

mllg commented 10 years ago

Omit the check and live with the bad error message?

Yes. Other options are not viable w.r.t. maintainability.

IMHO you just need to introduce some virtual super classes and remove class checks in S3 methods to be fine.

I will document assert in README now.

mllg / checkmate

How do we check if an obj inherits from one of the classes #31