imoscovitz / wittgenstein

Ruleset covering algorithms for transparent machine learning
MIT License

Training and Calibration sets for rules? #21

Open HusamAQ opened 2 years ago

HusamAQ commented 2 years ago

Hi @imoscovitz, thank you for this amazing package!

I want to use it for my thesis and was wondering if there is a way to get the sets that the rules were made from, so that for each rule I would have the training/calibration set used to build it?

imoscovitz commented 2 years ago

Thanks Husam -- glad you're finding it useful!

Sure, I think there should be a way to do it. By calibration set, do you mean pruning set, validation set, probability calibration, or something else?

HusamAQ commented 2 years ago

Thanks for your reply!

Yes, I am trying to get the training data (grow set & prune set) for each rule in the final model.

imoscovitz commented 2 years ago

Gotcha. To train a ruleset, each successive rule is trained and pruned on data determined by all the rules that have already been added to the ruleset. So, for example, the data used to create rule 4 depends on rules 1-3 having been trained. My understanding is that you want the train/prune data for rule 1, the train/prune data for rule 2 given rule 1, the train/prune data for rule 3 given rules 1-2, etc.?
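
Roughly, the covering procedure works like this (a conceptual sketch with illustrative names, not the package's actual code):

```python
def fit_ruleset(pos, neg, split, grow_rule, prune_rule):
    """Sequential covering sketch: each rule is grown and pruned only on
    the examples that no earlier rule already covers (illustrative)."""
    remaining_pos, remaining_neg = list(pos), list(neg)
    ruleset = []
    while remaining_pos:
        # split the *remaining* data into grow/prune sets for this rule
        grow_pos, prune_pos = split(remaining_pos)
        grow_neg, prune_neg = split(remaining_neg)
        rule = prune_rule(grow_rule(grow_pos, grow_neg), prune_pos, prune_neg)
        ruleset.append(rule)
        # drop covered examples, so rule 4's data depends on rules 1-3
        remaining_pos = [x for x in remaining_pos if not rule.covers(x)]
        remaining_neg = [x for x in remaining_neg if not rule.covers(x)]
    return ruleset
```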

HusamAQ commented 2 years ago

Yes! That is exactly what I am trying to get.

imoscovitz commented 2 years ago

There is a verbosity parameter you can set when you declare your IREP or RIPPER model. Setting verbosity=5 will print detailed training information that may also be useful to you, but not the actual examples. To get the examples, I'd suggest making a couple of small changes to the code. (You should be able to do this by cloning the repo and importing it locally, or by cloning and installing it in editable mode with pip install -e <directory of the package containing setup.py>.)
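
For example, a minimal sketch (the dataset path, column name, and positive class below are placeholders):

```python
import pandas as pd
import wittgenstein as lw

df = pd.read_csv("my_data.csv")  # placeholder dataset

# verbosity=5 prints detailed information while each rule is grown/pruned
clf = lw.RIPPER(verbosity=5)
clf.fit(df, class_feat="target", pos_class=1)  # placeholder column/class
print(clf.ruleset_)
```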

The changes you want to make are these: in base_functions.py, there are two functions, grow_rule_cn and prune_rule_cn, that are called each time a rule is added. They each take the parameters pos_idx and neg_idx, which hold the indices of the dataset examples being used to train/prune that rule (pos_idx for the positive-class examples, neg_idx for the negatives). At the beginning of each of these two functions, you can add a couple of lines of code to print the indices, write them to a file, or otherwise keep track of the datasets.
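
For example, a small logging helper like the following could be added to base_functions.py and called from the top of both functions (this helper is my own sketch, not part of the package, and it assumes pos_idx and neg_idx are iterables of integer indices):

```python
import json

def _log_rule_indices(stage, pos_idx, neg_idx, path="rule_datasets.jsonl"):
    """Append the example indices used to grow or prune the current rule."""
    with open(path, "a") as f:
        f.write(json.dumps({
            "stage": stage,  # "grow" or "prune"
            "pos_idx": [int(i) for i in sorted(pos_idx)],
            "neg_idx": [int(i) for i in sorted(neg_idx)],
        }) + "\n")

# e.g. as the first line of grow_rule_cn(...):
#     _log_rule_indices("grow", pos_idx, neg_idx)
# and as the first line of prune_rule_cn(...):
#     _log_rule_indices("prune", pos_idx, neg_idx)
```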

Let me know how that works for you or if you have any questions!

imoscovitz commented 1 year ago

Oh, and do you want the training and pruning datasets for each rule kept separate, or do you only need them combined as a single train+prune dataset?
