mouse07410 / asn1c

The ASN.1 Compiler
http://lionet.info/asn1c/
BSD 2-Clause "Simplified" License

Double quote is treated as an invalid character in strings #112

Closed ljbade closed 8 months ago

ljbade commented 1 year ago

I discovered this while parsing the OMA LPPe specification which has this definition: OMA-LPPe-Uri ::= VisibleString (FROM ( "a".."z" | "A".."Z" | "0".."9" | ":" | "/" | "?" | "#" | "[" | "]" | "@" | "!" | "$" | "&" | "'" | "(" | ")" | "*" | "+" | "," | ";" | "=" | "-" | "." | "_" | "~" | "%" ))

Which resulted in this error:

ERROR: Symbol '"' at line 487 is prohibited by ASN.1:1994 and ASN.1:1997
ASN.1 grammar parse error near OMA-TS-LPPe-V1_0-20200630-D.asn1:487 (token ""'""): syntax error, unexpected $end
Cannot parse "OMA-TS-LPPe-V1_0-20200630-D.asn1"

I traced the first error to https://github.com/mouse07410/asn1c/blob/vlm_master/libasn1parser/asn1p_l.l#L542

The regex on that line is attempting to match any character that is not valid in an ASN.1 source file. However, in a regex \ is a special character, even inside a character class, as explained by https://github.com/westes/flex/blob/master/doc/flex.texi#L888-L894

Properly escaping the \ with \\ fixes the issue and I no longer get the incorrect error message.
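To illustrate the escaping point above, here is a minimal sketch using Python's `re` module as a stand-in for flex. This is an assumption for illustration only: flex has its own rules (for example around quoted strings), and the two patterns below are hypothetical, shortened classes, not the real rule from asn1p_l.l. The sketch only shows that a single backslash inside a class escapes the next character, while a doubled backslash adds a literal backslash to the class.

```python
import re

# Hypothetical, shortened stand-ins for the lexer rule (not the real asn1c pattern).
# Here \" is an escaped double quote, so no backslash ends up in the class.
quote_only = re.compile(r'[^a-z\"]')
# Here \\ is a literal backslash, followed by a plain double quote.
quote_and_backslash = re.compile(r'[^a-z\\"]')


def flagged(pattern, ch):
    """Return True if the negated class matches ch, i.e. ch would be reported."""
    return pattern.fullmatch(ch) is not None


for ch in ('"', '\\', '#'):
    print(repr(ch), flagged(quote_only, ch), flagged(quote_and_backslash, ch))
```

In this sketch both classes accept the double quote, but only the second one also accepts a backslash.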

This PR also adds some more .gitignore files that git was finding after compiling asn1c.

mouse07410 commented 1 year ago

Why are the tests failing?

ljbade commented 1 year ago

Interesting there must be some difference between lex versions. I will have to figure out why.

mouse07410 commented 1 year ago

Interesting there must be some difference between lex versions. I will have to figure out why.

Lex on MacOS (flex v2.6.4) fails the same way as lex on Ubuntu 20.04.5. Which is why I suspect that your fix is incorrect.

Also, I didn't fully understand the problem. Your quote

OMA-LPPe-Uri ::= VisibleString (FROM ( "a".."z" | "A".."Z" | "0".."9" | ":" | "/" |
                "?" | "#" | "[" | "]" | "@" | "!" | "$" | "&" | "'" | "(" | ")" | "*" | "+" | "," | 
                ";" | "=" | "-" | "." | "_" | "~" | "%" ))

seems to include ' symbol, but not " - so it should be prohibited, shouldn't it?

mouse07410 commented 1 year ago

@ljbade are you interested in getting this PR working and passing the CI?

ljbade commented 1 year ago

Sorry, yes I am. However due to work time commitments it will be a week or two before I get a chance to figure this out. If you want to close it for the moment to keep the PR list tidy then feel free to do so. I can reopen it once I figure out why it failed on the CI but not my local machine.

Alternatively I could make this a draft PR.

mouse07410 commented 1 year ago

. . . due to work time commitments it will be a week or two before I get a chance to figure this out . . .

That's perfectly fine. I'll let this PR wait for you to come back and proceed - hopefully soon. ;-)

mouse07410 commented 1 year ago

@ljbade are you still planning to finish this PR?

attina commented 8 months ago

@ljbade Your change does not work on 64-bit Linux with flex 2.6.4:

make[3]: Entering directory '.../asn1c/libasn1parser'
  LEX      asn1p_l.c
.../asn1c/libasn1parser/asn1p_l.l:543: missing quote
.../asn1c/libasn1parser/asn1p_l.l:544: unrecognized rule

attina commented 8 months ago

@mouse07410 I hit the same problem when compiling the following code.

OMA-LPPe-Uri ::= VisibleString (FROM ( "a".."z" | "A".."Z" | "0".."9" | ":" | "/" | "?" | "#" | "[" | "]" | "@" | "!" | "$" | "&" | "'" | "(" | ")" | "*" | "+" | "," | ";" | "=" | "-" | "." | "_" | "~" | "%"  ))

I worked around this by changing "'" to "\'". I'm not sure whether that is the right way to do it.

mouse07410 commented 8 months ago

@attina I don't quite understand what the replacement is. Your post seems to say "replace "'" with "'"" - aka, the same thing. Would you mind explaining what the actual symbols are?

Also, for some reason CI did not start - I want all the changes to pass CI first.

attina commented 8 months ago

@mouse07410 Sorry for the typo, I replaced the "'" with "\'" in the asn code.

attina commented 8 months ago

Alternatively, the following change also lets the original ASN.1 code compile. But I'm not sure I did it right.

--- a/libasn1parser/asn1p_l.l
+++ b/libasn1parser/asn1p_l.l
@@ -540,7 +540,7 @@ DESCENDANTS         return TOK_DESCENDANTS;

 [(){},;:|!.&@\[\]^]    return yytext[0];

-[^A-Za-z0-9:=,{}<.@()[]'\"|&^*;!-] {
+[^A-Za-z0-9:=,{}<.@()[]'\\\"|&^*;!-] {
                if(TYPE_LIFETIME(1994, 0))
                        fprintf(stderr, "ERROR: ");
                fprintf(stderr,
attina commented 8 months ago

@mouse07410 Here is my change, which I believe fixes this problem: 5563983. Both ] and \ need to be escaped in the regex pattern.

mouse07410 commented 8 months ago

@attina any idea why ] might need to be escaped?

And why did your commit 5563983 stop escaping "?

mouse07410 commented 8 months ago

I think this problem is resolved by 8ba0fb7, in favor of which I'm closing this one.

Please feel free to comment if the 8ba0fb7 fix does not address your problem(s).

attina commented 8 months ago

@mouse07410 According to a discussion with a regex expert: " doesn't need to be escaped inside a character class; ] needs to be escaped, otherwise it is treated as the end of the class; \ needs to be escaped.
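The three rules above can be tried out with a small sketch. It uses Python's `re` module as an assumed stand-in, since flex patterns cannot be run directly here, and flex may differ in details. Both patterns are hypothetical, shortened examples and not the actual rule from asn1p_l.l or from either commit.

```python
import re

# Hypothetical shortened patterns, not the real asn1c rule.
# Unescaped: the class ends at the first ']', so it is [^()[] and the
# trailing "x]" is matched as two literal characters after the class.
unescaped = re.compile(r'[^()[]x]')
# Escaped: one class that excludes ( ) [ ] x, a double quote and a backslash.
# The double quote is written without an escape; ']' and '\' are escaped.
escaped = re.compile(r'[^()\[\]x"\\]')

print(unescaped.fullmatch('ax]') is not None)
print(unescaped.fullmatch('a') is not None)
for ch in ('a', ']', '"', '\\'):
    print(repr(ch), escaped.fullmatch(ch) is not None)
```

In the unescaped form, the characters listed after the first ] are no longer part of the class at all, which would be consistent with the "unrecognized rule" style of failure, although the exact flex diagnostics depend on what follows.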

mouse07410 commented 8 months ago

@attina can you please check the current version to see if it works for you?