westes / flex

The Fast Lexical Analyzer - scanner generator for lexing in C and C++

character } falls under {characters} rule on IBM z/OS #586

Open alexgubanow opened 1 year ago

alexgubanow commented 1 year ago

I'm working on a port of Flex v2.6.4 to IBM z/OS. During testing I found that the } character is matched by the {characters} rule. My current workaround is rule code like: if (*yytext == '}') { return '}'; } else { /* main logic */ }

I also get a warning: scanner.l:320: warning, rule cannot be matched

The application is compiled with the EBCDIC charset, which differs from ASCII. But the problem is only observed with }, while the { character works fine. Does anyone have an idea what to check, and where?

Part of scanner.l:

{characters} {
    if (*((const char *)yytext) != '}')
    {
        /* characters logic */
    }
    else
    {
        return '}';
    }
}

"[" {
    return '[';
}

"]" {
    return ']';
}

"{" {
    return '{';
}

"}" {
    return '}';
}

Mightyjo commented 1 year ago

I've been puzzling over this for a couple of hours.

I have three questions for you. I beg your pardon if they are very silly.

  1. Do you already have some lex that works on z/OS? Doesn't need to be Flex, just a lex that works in EBCDIC.
  2. Do you have a yacc or bison that works on z/OS?
  3. Did you define the {characters} pattern in the definitions section of your scanner? (If so, what is the definition?)

I have a guess: Your {characters} pattern isn't getting defined the way you expect. I see why you may need it. I'd try having Flex dump your scanner tables and see if your character classes look right. I suspect whichever class includes uppercase alphabetic characters also includes '}' and a bunch of undefined points between I and J.

I think the fix will be in src/parse.y. Near the beginning you'll find the CCL_EXPR macro that assumes isascii() returns true, which it won't. Near the end you'll find the ccl_expr rule that determines what the [:alpha:], etc. classes match. They depend on the CCL_EXPR macro, so they might not be working correctly. The range class definition is just above those and it's almost certainly wrong for EBCDIC, too.

alexgubanow commented 1 year ago

Wow, I did not expect any reaction to this ticket, and you may already have found the problem :) This is great.

  1. It depends on the company. Everyone has their own setup, as z/OS comes only with a shell and a few other Unix utilities. RocketSoftware has flex v2.5.4, ported by someone some years ago; nobody knows the history of where it came from :)
  2. Yes, we have bison v3.0.4, which came from the same person who did flex. I have also ported bison v3.3.2.
  3. Yes, it is defined as characters [a-zA-Z0-9]+[a-zA-Z0-9]*. I have tested with \w+, which does not work at all, and with {[a-zA-Z0-9]+[a-zA-Z0-9]*} instead of {characters}, which shows the same behaviour.

To get flex v2.6.4 compiled, I used flex v2.5.4 and bison v3.0.4. When flex v2.5.4 is used to compile the same scanner.l, everything works fine.

I do have access to the sources from which flex v2.5.4 was built, but I have not found any suspicious changes or anything worth attention. I can try to compare the official src/parse.y from flex v2.5.4 with what we have.

Mightyjo commented 1 year ago

Neat! I've seen mailing list chatter about a patch for EBCDIC during the 2.3 era but I couldn't find the sources.

The ranges like [a-z] are what's causing the problem. They are contiguous in ASCII but split into blocks of at most 9 characters in EBCDIC, with other code points in the gaps between the blocks. If you break the ranges up into those contiguous sequences they should work.

Escape sequences like '\w' are defined similarly, but I forget which file they're in.
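
To make the gap concrete, here is a minimal C sketch. It assumes the IBM-1047 code page and that a range such as [A-Z] is expanded by numeric code point; the code point values are hard-coded from that code page (they are not taken from this thread), and the helper name in_ebcdic_range is hypothetical.

```c
#include <stdbool.h>

/* Assumed IBM-1047 (EBCDIC) code points, hard-coded so the sketch
 * also builds and runs on an ASCII host. */
enum {
    EBCDIC_UPPER_A     = 0xC1,  /* 'A'..'I' occupy 0xC1..0xC9 */
    EBCDIC_UPPER_I     = 0xC9,
    EBCDIC_UPPER_J     = 0xD1,  /* 'J'..'R' occupy 0xD1..0xD9 */
    EBCDIC_UPPER_Z     = 0xE9,  /* 'S'..'Z' occupy 0xE2..0xE9 */
    EBCDIC_LEFT_BRACE  = 0xC0,  /* '{' sits just below 'A' */
    EBCDIC_RIGHT_BRACE = 0xD0   /* '}' sits between 'I' and 'J' */
};

/* Hypothetical helper: true if code point c lies in lo..hi inclusive,
 * which is how a range behaves when it is expanded numerically. */
bool in_ebcdic_range(int c, int lo, int hi)
{
    return c >= lo && c <= hi;
}
```

Under these assumptions, in_ebcdic_range(EBCDIC_RIGHT_BRACE, EBCDIC_UPPER_A, EBCDIC_UPPER_Z) is true, while the same call with EBCDIC_LEFT_BRACE is false. That would be consistent with } being swallowed by the range while { is not.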

alexgubanow commented 1 year ago

Okay, we are getting closer. Replacing [a-zA-Z0-9]+[a-zA-Z0-9]* with:
[abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789]+[abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789]*
solves the issue.
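
A shorter way to express the same character set is to use only contiguous sub-ranges, for example something like [a-ij-rs-zA-IJ-RS-Z0-9]+ (untested on z/OS). The C sketch below shows the membership test this corresponds to, assuming IBM-1047 code points; the helper names between and is_ebcdic_alnum are hypothetical.

```c
#include <stdbool.h>

/* True if c is within lo..hi inclusive. */
static bool between(int c, int lo, int hi)
{
    return c >= lo && c <= hi;
}

/* Hypothetical helper: alphanumeric test over assumed IBM-1047 code
 * points, built only from runs that are contiguous in EBCDIC. */
bool is_ebcdic_alnum(int c)
{
    return between(c, 0x81, 0x89)   /* a-i */
        || between(c, 0x91, 0x99)   /* j-r */
        || between(c, 0xA2, 0xA9)   /* s-z */
        || between(c, 0xC1, 0xC9)   /* A-I */
        || between(c, 0xD1, 0xD9)   /* J-R */
        || between(c, 0xE2, 0xE9)   /* S-Z */
        || between(c, 0xF0, 0xF9);  /* 0-9 */
}
```

With these runs, the code points assumed for { (0xC0) and } (0xD0) both fall outside the set, so neither would be captured by the alphanumeric rule.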

I have reviewed the CCL_EXPR macro in the 2.5.4 version. It is different, but copy-pasting the macro from our v2.5.4 into 2.6.4 changed nothing.

Mightyjo commented 1 year ago

Did the warning about rules that can't be matched also go away?

Reading the z/OS 2.5 docs tonight. You probably don't need to worry about the CCL_EXPR macro or the use of functions like isalpha(). It looks like the z/OS XL C/C++ library defines them in terms of the current locale (e.g. IBM-1047, ISO8859-1). Even isascii() is available with BSD semantics if you define the _XOPEN_SOURCE macro before including ctype.h.

I don't see isascii() in the z/OS Metal C library reference, but it would just test whether the argument fits in 7 bits. Something like:

int isascii(int c) {
  return ((c & 0xFFFFFF80) == 0);
}

Adjust for sizeof(int), inline, etc.
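
One possible adjustment, as a sketch only: building the mask with ~0x7F avoids hard-coding a 32-bit constant. The name is_7bit is hypothetical, chosen here so it does not clash with a library isascii().

```c
/* Hypothetical helper: nonzero if c fits in the low seven bits.
 * ~0x7F sets every bit above bit 6, whatever sizeof(int) is. */
int is_7bit(int c)
{
    return (c & ~0x7F) == 0;
}
```

For example, is_7bit(0x41) is nonzero, while is_7bit(0xC1) is zero. 0xC1 is where 'A' sits in IBM-1047 (an assumption, not stated in this thread), so a 7-bit test like this rejects EBCDIC letters.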

alexgubanow commented 1 year ago

Yes, the warning went away. The docs for z/OS are here: https://www-40.ibm.com/servers/resourcelink/svc00100.nsf/pages/zOSV2R5Library?OpenDocument

In particular, you will be interested in the z/OS XL C/C++ Runtime Library Reference: https://www-40.ibm.com/servers/resourcelink/svc00100.nsf/pages/zOSV2R5sc147314?OpenDocument

I compile with: -Wc,xplink -D_XPLATFROM_SOURCE=1 -DI370 -D_UNIX03_SOURCE -D_UNIX03_THREADS -D_POSIX_THREADS

Also, config.h has:

#define _ALL_SOURCE 1
#define _XOPEN_SOURCE 600

This means isascii() should behave as you would normally expect.

Metal C is only C. There is no library from IBM, so you have to create your own functions, even malloc, etc. There is something called Callable Services, but that is out of scope for this issue.