Open mrabarnett opened 10 years ago
Original comment by Anonymous.
I'm not sure what you mean. I try not to duplicate code unless there's a measurable benefit in terms of speed, which only really occurs in a tight loop, e.g. in 'match_many_ANY'.
Original comment by Anonymous.
I am reading _regex.c.
The duplicated code for "switch (state->charsize)" and for the "_REV/_IGN" variants is really ugly. A modern compiler can correctly optimize a function called with a constant parameter, so I think it's OK to merge them.
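A minimal sketch of the idea above, not taken from _regex.c: one generic body takes the character size as an ordinary parameter, and thin wrappers pass it as a literal. The names `char_at`, `count_char` and `count_char_1/2/4` are hypothetical. Whether the switch is actually folded away depends on the compiler and the optimization level; the C standard does not guarantee it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read the character at index i for a given charsize. */
static inline uint32_t char_at(const void *text, size_t i, int charsize)
{
    switch (charsize) {
    case 1: return ((const uint8_t *)text)[i];
    case 2: return ((const uint16_t *)text)[i];
    case 4: return ((const uint32_t *)text)[i];
    default:
        assert(0 && "unsupported charsize");
        return 0;
    }
}

/* Single generic body; the switch is written only once, inside char_at. */
static inline size_t count_char(const void *text, size_t len,
                                int charsize, uint32_t ch)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (char_at(text, i, charsize) == ch)
            n++;
    return n;
}

/* Wrappers pass a literal charsize. After inlining, an optimizing compiler
 * can usually propagate the constant and drop the dead switch cases. */
static size_t count_char_1(const uint8_t *t, size_t len, uint32_t ch)
{
    return count_char(t, len, 1, ch);
}

static size_t count_char_2(const uint16_t *t, size_t len, uint32_t ch)
{
    return count_char(t, len, 2, ch);
}

static size_t count_char_4(const uint32_t *t, size_t len, uint32_t ch)
{
    return count_char(t, len, 4, ch);
}
```

Checking the generated assembly (or benchmarking) would be needed to confirm that the per-character switch really disappears in a given build.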
Original comment by Serhiy Storchaka (Bitbucket: storchaka, GitHub: storchaka).
See how this is implemented in the stdlib re module. The repeated code is parametrized by macros and moved into a separate file, which is included multiple times with different definitions.
#!c
/* generate 8-bit version */
#define SRE_CHAR Py_UCS1
#define SIZEOF_SRE_CHAR 1
#define SRE(F) sre_ucs1_##F
#include "sre_lib.h"
/* generate 16-bit unicode version */
#define SRE_CHAR Py_UCS2
#define SIZEOF_SRE_CHAR 2
#define SRE(F) sre_ucs2_##F
#include "sre_lib.h"
/* generate 32-bit unicode version */
#define SRE_CHAR Py_UCS4
#define SIZEOF_SRE_CHAR 4
#define SRE(F) sre_ucs4_##F
#include "sre_lib.h"
This made it possible to get rid of the code duplication (except at the highest level) and of the switches on character size in tight loops.
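The same pattern can be sketched in a single file by letting a function-defining macro stand in for the re-included header. This is only an illustration under that assumption: `DEFINE_FIND_CHAR`, `find_char_ucs1/2/4` and `find_char` are hypothetical names, not part of sre or _regex.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the templated header: each expansion defines
 * one specialised function whose loop contains no switch on charsize. */
#define DEFINE_FIND_CHAR(NAME, CHAR_T)                                  \
    static ptrdiff_t NAME(const CHAR_T *text, size_t len, uint32_t ch)  \
    {                                                                   \
        for (size_t i = 0; i < len; i++)                                \
            if ((uint32_t)text[i] == ch)                                \
                return (ptrdiff_t)i;                                    \
        return -1; /* not found */                                      \
    }

DEFINE_FIND_CHAR(find_char_ucs1, uint8_t)
DEFINE_FIND_CHAR(find_char_ucs2, uint16_t)
DEFINE_FIND_CHAR(find_char_ucs4, uint32_t)

/* The only remaining switch is at the highest level, executed once per call. */
static ptrdiff_t find_char(const void *text, size_t len,
                           int charsize, uint32_t ch)
{
    switch (charsize) {
    case 1: return find_char_ucs1(text, len, ch);
    case 2: return find_char_ucs2(text, len, ch);
    case 4: return find_char_ucs4(text, len, ch);
    default:
        assert(0 && "unsupported charsize");
        return -1;
    }
}
```

A separate included file scales better than a macro body when the shared code is long, since it can be edited and debugged as ordinary C; the macro form is used here only to keep the sketch self-contained.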
Original report by Anonymous.
Why do you want this feature? What is your use-case?
state->charsize is constant during matching, yet every "switch" on it sits inside a loop (backtrack, advance), and these functions contain heavily duplicated code.
The switch should be moved out of the loops, not only for performance but also for maintainability.
What should the syntax or call look like?
-
Do any other regex implementations have something like this?
-
Please provide any additional information below.
-