This should split the token '4°' into two tokens: '4' and '°'.
After some debugging, it looks like a problem with the regex. We generate "\B°\b", which works correctly for all other splits, but '\B' seems not to match between '\d' and '°'. This is expected regex behavior: '°' is not a word character, so the position between a digit and '°' is a word boundary, and '\B' (not-a-word-boundary) can never match there.
I tried to work around this by changing Tokenizer.cpp:50 to:
boost::u32regex re = boost::make_u32regex( std::string( "\\<(\\d+)([[:alpha:]\\p{L}°])\\>" ) );
or
boost::u32regex re = boost::make_u32regex( std::string( "^(\\d+)([[:alpha:]\\p{L}°])$" ) );
but neither variant worked. More investigation is needed.
In lex-spain.txt we have the same case: the token '4°', which should be split into '4' and '°'.