mrabarnett / mrab-regex

bring 'switch(state->charsize)' out of loops. #99

Open mrabarnett opened 10 years ago

mrabarnett commented 10 years ago

Original report by Anonymous.


Why do you want this feature? What is your use-case?

state->charsize is constant during matching, yet every "switch" on it sits inside a loop (backtrack, advance), and these functions contain heavily duplicated code.

It should be brought out of the loops, not only for performance but also for maintainability.
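
One reading of this request, as a minimal sketch: test the character width once and then run a loop that is fixed to that width, instead of testing it again for every character. The `State` struct and both `count_*` functions are hypothetical stand-ins written for illustration, not code from _regex.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the matcher state; not the real _regex.c struct. */
typedef struct {
    int charsize;       /* bytes per character: 1, 2 or 4; fixed for a match */
    const void *text;   /* points at uint8_t, uint16_t or uint32_t data */
    size_t length;      /* length in characters */
} State;

/* Switch inside the loop: charsize is re-tested for every character. */
static size_t count_switch_inside(const State *state, uint32_t ch) {
    size_t n = 0;
    assert(state->charsize == 1 || state->charsize == 2 || state->charsize == 4);
    for (size_t i = 0; i < state->length; i++) {
        uint32_t c = 0;
        switch (state->charsize) {
        case 1: c = ((const uint8_t *)state->text)[i]; break;
        case 2: c = ((const uint16_t *)state->text)[i]; break;
        case 4: c = ((const uint32_t *)state->text)[i]; break;
        }
        n += (c == ch);
    }
    return n;
}

/* Switch hoisted: charsize is tested once, each loop has a fixed width. */
static size_t count_switch_outside(const State *state, uint32_t ch) {
    size_t n = 0;
    assert(state->charsize == 1 || state->charsize == 2 || state->charsize == 4);
    switch (state->charsize) {
    case 1: {
        const uint8_t *p = state->text;
        for (size_t i = 0; i < state->length; i++) n += (p[i] == ch);
        break;
    }
    case 2: {
        const uint16_t *p = state->text;
        for (size_t i = 0; i < state->length; i++) n += (p[i] == ch);
        break;
    }
    case 4: {
        const uint32_t *p = state->text;
        for (size_t i = 0; i < state->length; i++) n += (p[i] == ch);
        break;
    }
    }
    return n;
}
```

Note that hoisting by hand like this triples the loop body, so on its own it trades the per-character switch for more duplicated code.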

What should the syntax or call look like?

-

Do any other regex implementations have something like this?

-

Please provide any additional information below.

-

mrabarnett commented 10 years ago

Original comment by Anonymous.


I'm not sure what you mean. I try not to duplicate code unless there's a measurable benefit in terms of speed, which only really occurs in a tight loop, e.g. in 'match_many_ANY'.

mrabarnett commented 10 years ago

Original comment by Anonymous.


I am reading _regex.c.

The duplicated code for "switch (state->charsize)" and the "_REV/_IGN" variants is really ugly ... A modern compiler can optimize a function with a constant parameter correctly, so I think it's OK to merge them.
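
A rough sketch of what "merge them and let the compiler specialize" could look like, assuming the compiler inlines the generic worker into each wrapper (likely at normal optimization levels, but not guaranteed). All names here (`char_at`, `match_many_generic`, `match_many_1/2/4`, `match_many`) are hypothetical and only loosely modelled on the 'match_many_ANY' style; they are not the functions in _regex.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read character i from text of the given width (hypothetical helper). */
static inline uint32_t char_at(const void *text, size_t i, int charsize) {
    switch (charsize) {
    case 1:  return ((const uint8_t *)text)[i];
    case 2:  return ((const uint16_t *)text)[i];
    default: return ((const uint32_t *)text)[i];
    }
}

/* Single generic worker: advance pos while the character equals ch.
 * When charsize is a compile-time constant at the call site and this is
 * inlined, the switch in char_at can be folded away by the optimizer. */
static inline size_t match_many_generic(const void *text, size_t pos,
                                        size_t limit, uint32_t ch,
                                        int charsize) {
    while (pos < limit && char_at(text, pos, charsize) == ch)
        pos++;
    return pos;
}

/* Thin wrappers that pin charsize to a constant. */
static size_t match_many_1(const void *t, size_t p, size_t l, uint32_t ch) {
    return match_many_generic(t, p, l, ch, 1);
}
static size_t match_many_2(const void *t, size_t p, size_t l, uint32_t ch) {
    return match_many_generic(t, p, l, ch, 2);
}
static size_t match_many_4(const void *t, size_t p, size_t l, uint32_t ch) {
    return match_many_generic(t, p, l, ch, 4);
}

/* The only runtime switch: executed once per call, outside the loop. */
static size_t match_many(const void *text, size_t pos, size_t limit,
                         uint32_t ch, int charsize) {
    assert(charsize == 1 || charsize == 2 || charsize == 4);
    switch (charsize) {
    case 1:  return match_many_1(text, pos, limit, ch);
    case 2:  return match_many_2(text, pos, limit, ch);
    default: return match_many_4(text, pos, limit, ch);
    }
}
```

Whether this is actually as fast as hand-duplicated loops depends on the compiler and flags, so it would need measuring before replacing existing code.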

mrabarnett commented 7 years ago

Original comment by Serhiy Storchaka (Bitbucket: storchaka, GitHub: storchaka).


See how this is implemented in the stdlib re module. The repeated code is parametrized by macros and moved into a separate file, which is included multiple times with different definitions.

```c
/* generate 8-bit version */

#define SRE_CHAR Py_UCS1
#define SIZEOF_SRE_CHAR 1
#define SRE(F) sre_ucs1_##F
#include "sre_lib.h"

/* generate 16-bit unicode version */

#define SRE_CHAR Py_UCS2
#define SIZEOF_SRE_CHAR 2
#define SRE(F) sre_ucs2_##F
#include "sre_lib.h"

/* generate 32-bit unicode version */

#define SRE_CHAR Py_UCS4
#define SIZEOF_SRE_CHAR 4
#define SRE(F) sre_ucs4_##F
#include "sre_lib.h"
```

This made it possible to get rid of code duplication (except at the highest level) and of switches on the character size in tight loops.
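
For readers who want to try the idea in a single file, here is a simplified approximation. It swaps the re-included header for a function-generating macro, so it is not how the re module is laid out; `DEFINE_FIND`, the `find_ucs*` functions and `find_char` are hypothetical names used only for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the shared header: the body is written once and stamped out
 * per character type. In the re-inclusion scheme, NAME and CHAR_T would
 * come from SRE(F)- and SRE_CHAR-style macros defined before each #include. */
#define DEFINE_FIND(NAME, CHAR_T)                                         \
    static ptrdiff_t NAME(const CHAR_T *text, size_t length, uint32_t ch) \
    {                                                                     \
        for (size_t i = 0; i < length; i++)                               \
            if (text[i] == ch)                                            \
                return (ptrdiff_t)i;                                      \
        return -1; /* not found */                                        \
    }

DEFINE_FIND(find_ucs1, uint8_t)   /* 8-bit version */
DEFINE_FIND(find_ucs2, uint16_t)  /* 16-bit version */
DEFINE_FIND(find_ucs4, uint32_t)  /* 32-bit version */

/* Highest level: the one place that still switches on the character size. */
static ptrdiff_t find_char(const void *text, size_t length, int charsize,
                           uint32_t ch) {
    assert(charsize == 1 || charsize == 2 || charsize == 4);
    switch (charsize) {
    case 1:  return find_ucs1(text, length, ch);
    case 2:  return find_ucs2(text, length, ch);
    default: return find_ucs4(text, length, ch);
    }
}
```

A separate header tends to be easier to debug than a long multi-line macro, which may be one reason to prefer the re-inclusion layout for anything bigger than a toy function.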