zmjones / edarf

exploratory data analysis using random forests
MIT License
68 stars 11 forks source link

foreign function call #30

Closed zmjones closed 9 years ago

zmjones commented 9 years ago

in var_est.RandomForest i call R_predictRF_weights which is a C function in party. R CMD check generates a note for this reason and will mean we can't submit to CRAN even if randomForestCI is there and randomForest gets updated as well. if we want to fix this we can either write some C++ that does this given ensemble (i.e. reimplement R_predictRF_weights) but for one tree at a time (might be the best performance) or modify the party code and hope Hothorn puts it in.

flinder commented 9 years ago

sorry

zmjones commented 9 years ago

haha

flinder commented 9 years ago

yeah solved it real quick but then decided that was too easy :)

flinder commented 9 years ago

can't we just use their C-code with a reference of course and compile it ourselfs?

flinder commented 9 years ago

I just looked at the source think I could also reimplement it in C++.

zmjones commented 9 years ago

yea i think either way would be fine. i might email torsten tomorrow, as it seems like providing this would be easy for him, but idk. if you want to start on it that would be fine.

zmjones commented 9 years ago

also we are going to have to pull out the code in party:::newinputs too, as referencing the function like that is not ok i think (according to R CMD check). maybe if we put party in depends it would be, but i'd like to avoid that. another difficulty is that i use the proximity function in those extractor methods, which means we have to import rather than suggest party. if this is to be avoided i think we either have to reimplement it or ditch the extractor and just make the user do it. obviously it is pretty easy.

lastly, i am not sure about this, but idk how we are supposed to handle S3 generic imports. for example in partial_dependence we use predict.randomForest etc. w/o importing them. i think as long as the generic predict is in the namespace (it is because that is in stats) it is automatically available and the CRAN maintainers won't be mad at us but am not sure.

flinder commented 9 years ago

what's the difference between suggestions 2 and 3 in the so answer you referred to above? Solution 3 sounds clean, I'm just not sure if I understand it right: wouldn't we then have to use a foreign function call in the wrapper function?

zmjones commented 9 years ago

well it wouldn't be foreign if we depend on it. i am not sure which is 2 and 3 though

zmjones commented 9 years ago

fyi there is a call to C_get_nodeID in R_predictRF_weights.

int C_get_nodeID(SEXP subtree, SEXP newinputs,
                  double mincriterion, int numobs, int varperm) {
     return(S3get_nodeID(C_get_node(subtree, newinputs, 
            mincriterion, numobs, varperm)));
}

and this S3 (method?)

void S3set_nodeID(SEXP node, int nodeID) {
    INTEGER(VECTOR_ELT(node, S3_NODEID))[0] = nodeID;
}

int S3get_nodeID(SEXP node) {
    return(INTEGER(VECTOR_ELT(node, S3_NODEID))[0]);
}
zmjones commented 9 years ago

and there is also C_get_node in there, which is a big function. i don't think it makes sense to rewrite this. we should either figure out a way to just include the relevant functions in our package or talk to hothorn to see if he will add an option that outputs a tree prediction matrix.

flinder commented 9 years ago

yeah that's why I was looking for other solutions. I think asking him would definitely be worth it, since it would be the cleanest solution. In case that doesn't work out, what negative consequences does depending on party have, besides making the installation bigger and more unwieldy?

zmjones commented 9 years ago

i think that is it really. writing the email now

zmjones commented 9 years ago

still waiting on hothorn to reply

zmjones commented 9 years ago

fixed (sort of) by 5583e860b0f0023d07f715164df318721bdbc8ac