mi2-warsaw / FSelectorRcpp

Rcpp (free of Java/Weka) implementation of FSelector entropy-based feature selection algorithms with a sparse matrix support
http://mi2-warsaw.github.io/FSelectorRcpp/
35 stars 15 forks source link

Filter request: Relief #78

Closed pat-s closed 4 years ago

pat-s commented 5 years ago

It would be great if the Relief filter from FSelector is available as a C++ implementation. We would like to use it in mlr and mlr3.

cross-ref https://github.com/mlr-org/mlr3featsel/issues/38

zzawadz commented 5 years ago

@krzyslom can you look at this? We can start by porting this as is, and then gradually replacing parts with the C++ code.

krzyslom commented 5 years ago

@zzawadz I'll investigate it next weekend.

pat-s commented 5 years ago

any update here? :)

krzyslom commented 5 years ago

Hey @pat-s

I took a look at the function a while ago. Here are my thoughts.

  1. Controversial silent behavior e.g.
if (sample.size < 1) {
 warning(paste("Assumed: sample.size = ", sample.size))
 sample.size = 1
 sample_instances_idx = sample(1:instances_count, 1)
}

The function prints passed value while calling it assumed. Next the value is silently changed. This behavior occurs in several steps. Moreover we can pass double or a negative number to sample.size which does not make sense. Perhaps it is better to throw an error is such cases.

  1. There are several not exported functions that also have to be implemented in the package e.g. get.data.frame.from.formula and normalize.min.max.

  2. relief function is composed from several inner functions. Results of their call are assigned to global variables with <<- operator. The question is should we just try to make a copy-paste for this functionality, or should we try to redesign the function with future C++ interface in mind. Unfortunately I am not familiar with C++ so I do not have a good judgement in that case. I'm not sure if @zzawadz would write Rcpp implementation in similar fashion.

@zzawadz please take a look at the topic.

pat-s commented 4 years ago

Sorry for the late follow-up.

I have no detailed knowledge about the algorithm but we would like to use a Rcpp version of it in both {mlr} and {mlr3filters}.

I just searched again and there is currently no other package implementing the RELIEF algorithm in R.

Compared to other filters, the java-based {FSelector} implementation takes really long. It would be a good enhancement if we could port this to {FSelectorRcpp}. This filter method is still able to achieve good results in benchmarks and I really think the effort would be worth it.

Sometimes a rewrite is even easier than porting existing code - but you are the experts here, I have no clue about the internals and know only very few about Rcpp.

@zzawadz Did you have time to look at all of this in the meantime?

MarcinKosinski commented 4 years ago

CC @zzawadz

pat-s commented 4 years ago

bump :)

zzawadz commented 4 years ago

I promise I'll loot at it. I think that I'll start from copying the whole function from FSelector and start replacing its parts with C++ code.

zzawadz commented 4 years ago

My plan is like this: 1) contact author of this function and get approval for adapting his code? (do i need this? it's open source so maybe just mentioning the author will be sufficient - I have spent too much time in BigCo to be sure). 2) Adapt the code to use more FSelectorRcpp's like interfaces. 3) Remove <<- assignment (It's hard for me to reason about this code where it's used). 4) Rewrite hottest parts in cpp?

@pat-s what do you think?

pat-s commented 4 years ago

Sure, whatever works :) You are th C++ expert here 👍

MarcinKosinski commented 4 years ago

I don't think you need to contact the author. Just provide a comment its taken crom FSelecote for now. On the other hand the author of FSelector vanished. The package just have a maintainer that looks after abandoned package and that's not the original author

On Sat, 18 Jul 2020, 15:55 Patrick Schratz, notifications@github.com wrote:

Sure, whatever works :) You are th C++ expert here 👍

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/mi2-warsaw/FSelectorRcpp/issues/78#issuecomment-660479257, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABTVVTVXDAAH2ON5ZZDEIM3R4GLVJANCNFSM4H2WISUA .

zzawadz commented 4 years ago

@MarcinKosinski @krzyslom can you look at the relief branch? There's the current version. I think the api will stay as is (same approach as we're using in information_gain), so we can release it and then work on the speeding up the internals of relief.

Any more tests will be welcomed ;)

MarcinKosinski commented 4 years ago

I might have some time this weekend!

wt., 21 lip 2020 o 08:53 Zygmunt Zawadzki notifications@github.com napisał(a):

@MarcinKosinski https://github.com/MarcinKosinski @krzyslom https://github.com/krzyslom can you look at the relief branch? There's the current version. I think the api will stay as is (same approach as we're using in information_gain), so we can release it and then work on the speeding up the internals of relief.

Any more tests will be welcomed ;)

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/mi2-warsaw/FSelectorRcpp/issues/78#issuecomment-661671534, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABTVVTXSBOH4TFX47UA6E2TR4U3PLANCNFSM4H2WISUA .

zzawadz commented 4 years ago

Ok then. I'll schedule the release to Monday. @MarcinKosinski any uncaught errors will be your fault xD

MarcinKosinski commented 4 years ago

Having trouble finding some spot to have a review. Maybe I'll look at that in the evening or tmrw early in the morning.

śr., 29 lip 2020 o 22:01 Zygmunt Zawadzki notifications@github.com napisał(a):

Ok then. I'll schedule the release to Monday. @MarcinKosinski https://github.com/MarcinKosinski any uncaught errors will be your fault xD

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/mi2-warsaw/FSelectorRcpp/issues/78#issuecomment-665883568, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABTVVTUCB5K4QNJ6QAPBRNLR6B5YPANCNFSM4H2WISUA .

MarcinKosinski commented 4 years ago

Looking now!

niedz., 2 sie 2020 o 12:04 Marcin Kosiński m.p.kosinski@gmail.com napisał(a):

Having trouble finding some spot to have a review. Maybe I'll look at that in the evening or tmrw early in the morning.

śr., 29 lip 2020 o 22:01 Zygmunt Zawadzki notifications@github.com napisał(a):

Ok then. I'll schedule the release to Monday. @MarcinKosinski https://github.com/MarcinKosinski any uncaught errors will be your fault xD

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/mi2-warsaw/FSelectorRcpp/issues/78#issuecomment-665883568, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABTVVTUCB5K4QNJ6QAPBRNLR6B5YPANCNFSM4H2WISUA .

MarcinKosinski commented 4 years ago

Hey, just created this PR https://github.com/mi2-warsaw/FSelectorRcpp/pull/86/files to check which files were changed. I extended NEWS and added parameters verification. I'm good to merge to have the relief algorithm in the pkg copied from FSelector however I imagine the request here is to get the algorithm in C++ which is beyond my skills

zzawadz commented 4 years ago

@MarcinKosinski I'll be porting this into cpp gradually;)

pat-s commented 4 years ago

Thanks guys!! 🎉 Finally being able to avoid {FSelector}.

pat-s commented 4 years ago

@zzawadz Do you have a rough ETA when this goes to CRAN?

pat-s commented 4 years ago

Maybe @MarcinKosinski knows?

MarcinKosinski commented 4 years ago

Looks like Zygmunt is doing some updates right now :)

On Tue, 6 Oct 2020, 17:28 Patrick Schratz, notifications@github.com wrote:

Maybe @MarcinKosinski https://github.com/MarcinKosinski knows?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/mi2-warsaw/FSelectorRcpp/issues/78#issuecomment-704351397, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABTVVTWZNMCNJV3M3VTS6FTSJMZRTANCNFSM4H2WISUA .

zzawadz commented 4 years ago

@pat-s it's on CRAN now.

MarcinKosinski commented 4 years ago

Kudos Zygmunt :)!

On Sat, 10 Oct 2020, 20:20 Zygmunt Zawadzki, notifications@github.com wrote:

@pat-s https://github.com/pat-s it's on CRAN now.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/mi2-warsaw/FSelectorRcpp/issues/78#issuecomment-706590889, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABTVVTRJSE5MZ5UIVA42L2LSKCQXVANCNFSM4H2WISUA .