quesurifn / yake-rust

MIT License
6 stars 5 forks source link

Likely bug in score normalization #10

Open bunny-therapist opened 3 weeks ago

bunny-therapist commented 3 weeks ago

It looks like the normalization is wrong for the score.

If you look in the YAKE paper, eq. (1), the score should be the S-product divided by (TF + S-sum), but when we look in the code: https://github.com/quesurifn/yake-rust/blob/master/yake_rust/src/lib.rs#L364 we see

 let weight = prod_ / tf * (1.0 + sum_);

Shouldn't this be

 let weight = prod_ / (tf * (1.0 + sum_));

? The way it is now, we divide by tf but then multiply the whole thing by (1.0 + sum_). From the paper (and other implementations), it looks like all of that should be in the denominator?

quesurifn commented 3 weeks ago

I'll take a look. I used the Python source here:

https://github.com/LIAAD/yake/tree/master/yake.

It does look like Rust follows the order of operations: E.g.

let val =  5 + 3 * 2; // evaluates to 11
bunny-therapist commented 3 weeks ago

I am not saying that rust does not follow the order of operations? Of course multiplication comes before plus. But why would it come before division?

The question is if

let val =  10 / 2 * 5

is equal to

let val =  10 / (2 * 5)

I expect the first to become 25, and the latter to become 1. That is at least how it works in other languages. I am not aware of any special rust handling of multiplication and division.

bunny-therapist commented 3 weeks ago

The line in the references LIAAD implementation has a parenthesis: https://github.com/LIAAD/yake/blob/master/yake/datarepresentation.py#L334

self.H = prod_H / ( ( sum_H + 1 ) * tf_used )

quesurifn commented 3 weeks ago

I recall using parentheses and the linter telling me that they were not needed. I think it's because Rust doesn't require parentheses to follow the order of operations where most languages do. I'll double check this though.

bunny-therapist commented 3 weeks ago

I have no idea what you mean by "order of operations" in this case. What I am saying is consistent with everything I know about order of operations in mathematics as well as in other programming languages. I am not aware of any rule saying "carry out multiplication before any division".

Rust playground agrees with me.

Paste

fn main() {
  let val =  10 / 2 * 5;
  println!("{}", val);
  let other_val =  10 / (2 * 5);
  println!("{}", other_val);
}

into https://play.rust-lang.org/?version=stable&mode=debug&edition=2021

bunny-therapist commented 3 weeks ago

image

bunny-therapist commented 3 weeks ago

You can also see in the rust reference that multiplication and division has the same precedence, being then evaluated left to right: https://doc.rust-lang.org/reference/expressions.html#expression-precedence

* / %   left to right
bunny-therapist commented 3 weeks ago

In mathematics (no programming), this is the same. Multiplication and division have the same precedence and are carried out left to right.

All mathematics and all programming languages I know of are in agreement.

https://en.wikipedia.org/wiki/Order_of_operations

quesurifn commented 3 weeks ago

Got it. You weren't explaining yourself well so looking at an equation at 8 am with almost no context isn't going to get us to an agreement.

Are you sure that's the relevant line in the Python source?

self.H = prod_H / ( ( sum_H + 1 ) * tf_used )

If you are feel free to open a PR.

bunny-therapist commented 3 weeks ago

Got it. You weren't explaining yourself well so looking at an equation at 8 am with almost no context isn't going to get us to an agreement.

There is literally no reason for that jab and there is no action I can take to produce better explanations in the future based on it. If you somehow took offense to something, I assure you it was unintentional and I apologize.

If you are feel free to open a PR.

I will.