mischasan / aho-corasick

A-C implementation in "C". Tight-packed (interleaved) state-transition matrix -- as fast as it gets, as small as it gets.
GNU Lesser General Public License v3.0
147 stars 41 forks source link

the interleave function changed the back reference pointer #36

Closed zxxunkxzhx closed 3 years ago

zxxunkxzhx commented 3 years ago

in following statement, if (tp->back == troot) tp->back = NULL; // simplify tests. tp->back is changed for condition tests purpose, but I am afraid this is not a good idea

if we can assume any node's back reference pointer being always valid since initiated or set, we can benefit from using it without null-checks through out this project. so for here, I would suggest don't change back reference into NULL, instead, when we need checking if back reference is NULL inside this function, we use (tp->back == troot)

thanks

mischasan commented 3 years ago

Unrelated: I hope you don't mind my curiosity, but are you looking at acism for the fun of it, or applying it somewhere. It's currently being used in a spam detection engine (rspamd), and in a cancer research genome search at the U of Queensland ... and the person in charge of the latter turned out to be a friend of a guy who works out at my gym. Small world.

I'd be careful about such a change. I remember a bug with pointers to the root affecting tranv[]. (Yes, it's been a while; can't recall the specific).

On 12/02/2021, zxxunkxzhx notifications@github.com wrote:

in following statement, if (tp->back == troot) tp->back = NULL; // simplify tests. tp->back is changed for condition tests purpose, but I am afraid this is not a good idea

if we can assume any node's back reference pointer being always valid since initiated or set, we can benefit from using it without null-checks through out this project. so for here, I would suggest don't change back reference into NULL, instead, when we need checking if back reference is NULL inside this function, we use (tp->back == troot)

thanks

-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/mischasan/aho-corasick/issues/36

-- Engineers think equations approximate reality. Physicists think reality approximates the equations. Mathematicians never make the connection.

mischasan commented 3 years ago

Ok got it: I will have to disagree with you on "if we assume..." The null-checks and !!tp->back are purposeful: they are loop conditions and non-branching ops, and they are faster than ptr-comparisons. If you are concerned about clarity or such, could you suggest the wording of comments to impart some understanding to people who aren't following the details?

Cheers!

On 12/02/2021, zxxunkxzhx notifications@github.com wrote:

in following statement, if (tp->back == troot) tp->back = NULL; // simplify tests. tp->back is changed for condition tests purpose, but I am afraid this is not a good idea

if we can assume any node's back reference pointer being always valid since initiated or set, we can benefit from using it without null-checks through out this project. so for here, I would suggest don't change back reference into NULL, instead, when we need checking if back reference is NULL inside this function, we use (tp->back == troot)

thanks

-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/mischasan/aho-corasick/issues/36

-- Engineers think equations approximate reality. Physicists think reality approximates the equations. Mathematicians never make the connection.

zxxunkxzhx commented 3 years ago

just a suggestion, this is not a bug.

thanks^^

mischasan commented 3 years ago

Appreciated.

On 26/02/2021, zxxunkxzhx notifications@github.com wrote:

just a suggestion, this is not a bug.

thanks^^

-- You are receiving this because you commented. Reply to this email directly or view it on GitHub: https://github.com/mischasan/aho-corasick/issues/36#issuecomment-786897151

-- Engineers think equations approximate reality. Physicists think reality approximates the equations. Mathematicians never make the connection.

zxxunkxzhx commented 3 years ago

not a bug. no more than a few technique discussions. now close the ticket