google / streamhtmlparser

Automatically exported from code.google.com/p/streamhtmlparser
BSD 3-Clause "New" or "Revised" License

parser javascript quote state problem #1

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. source code:
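#include <stdio.h>
#include "htmlparser.h"  /* streamhtmlparser header; adjust the path for your install */

int main(void) {
    int getchar_ret;  /* getchar() returns int; keep it as int for the EOF check */
    htmlparser_ctx *parser = htmlparser_new();
    int js_stat = 0;

    while ((getchar_ret = getchar()) != EOF) {
        char c = (char)getchar_ret;
        htmlparser_parse_chr(parser, c);
        /* Echo each character seen inside a script block, with the JS parser state. */
        if (parser->in_js == 1) {
            putchar(c);
            js_stat = htmlparser_js_state(parser);
            printf("js stat is %d\n", js_stat);
        }
    }
    htmlparser_delete(parser);
    return 0;
}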
2. Input:
<script type="text/javascript"> 
  document.write("<img src='www.example.com' border=0 width=0 height=0>");
</script>
3. The parser state reported for the string "<img src='www.example.com' border=0 width=0 height=0>" is 2 (JSPARSER_STATE_DQ).

What is the expected output? What do you see instead?
The parser state for the string 'www.example.com' is expected to be 1 (JSPARSER_STATE_Q).

What version of the product are you using? On what operating system?

Version: 0.1
OS: Linux FC5

Please provide any additional information below.

Original issue reported on code.google.com by zbo...@gmail.com on 10 Jan 2011 at 10:15

GoogleCodeExporter commented 9 years ago
If I understand correctly, you are inside a double-quoted JavaScript string literal.

We don't really know about document.write or its meaning; we just treat its arguments as regular strings.

Original comment by filipe.a...@gmail.com on 2 Apr 2011 at 5:25

GoogleCodeExporter commented 9 years ago
What I mean is: why did the state not change from JSPARSER_STATE_DQ to JSPARSER_STATE_Q when parsing the string 'www.example.com'? I thought I was inside a single-quoted JavaScript string literal at that point.

Original comment by zbo...@gmail.com on 2 Apr 2011 at 11:24

GoogleCodeExporter commented 9 years ago
Actually, in your example you are still inside a double-quoted JavaScript string literal. The single quotes are just part of the string.

Original comment by filipe.a...@gmail.com on 29 Feb 2012 at 1:58
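
The behaviour described in the last comment can be illustrated with a minimal, self-contained quote-state tracker in C. This is only a sketch and not streamhtmlparser's implementation: the names sketch_state, sketch_step and sketch_state_after are hypothetical, and only the numeric values 1 and 2 are taken from the report. It shows that once a double quote has opened a literal, a single quote is ordinary string content and cannot change the state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states for this sketch. The values 1 and 2 mirror the numbers
 * quoted in the report for JSPARSER_STATE_Q and JSPARSER_STATE_DQ. */
enum sketch_state {
    SKETCH_TEXT = 0,  /* outside any string literal */
    SKETCH_Q    = 1,  /* inside a '...' literal */
    SKETCH_DQ   = 2   /* inside a "..." literal */
};

/* Advance the state by one character. *escaped is set when the previous
 * character inside a literal was a backslash that is not itself escaped. */
static enum sketch_state sketch_step(enum sketch_state st, char c, int *escaped)
{
    if (st == SKETCH_TEXT) {
        if (c == '\'') return SKETCH_Q;
        if (c == '"')  return SKETCH_DQ;
        return SKETCH_TEXT;
    }
    /* Inside a literal: only the quote that opened it can close it. */
    if (*escaped) { *escaped = 0; return st; }
    if (c == '\\') { *escaped = 1; return st; }
    if (st == SKETCH_Q && c == '\'') return SKETCH_TEXT;
    if (st == SKETCH_DQ && c == '"') return SKETCH_TEXT;
    return st;  /* the other quote character is plain content */
}

/* State after consuming the first n characters of s (or all of s if shorter). */
static enum sketch_state sketch_state_after(const char *s, size_t n)
{
    enum sketch_state st = SKETCH_TEXT;
    int escaped = 0;
    size_t i;

    assert(s != NULL);
    for (i = 0; i < n && s[i] != '\0'; i++)
        st = sketch_step(st, s[i], &escaped);
    return st;
}
```

Feeding this sketch the document.write line from the report, the state becomes SKETCH_DQ at the opening double quote and stays there through 'www.example.com'; it returns to SKETCH_TEXT only at the closing double quote. A caller that needs to know about the inner single quotes would have to interpret the string contents separately.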