saghm / unescape-rs

"Unescapes" strings with escape sequences written with literal characters and converts it into a properly escaped one.
MIT License

failed unescaping #4

Open FreeMasen opened 5 years ago

FreeMasen commented 5 years ago

I am currently trying to use this crate to unescape some JavaScript strings; however, the following examples fail to unescape.

| js literal | escaped | node's value |
|---|---|---|
| '\1\00\400\000' | "'\\1\\00\\400\\000'" | '\u0001\u0000 0\u0000' |
| '\'\\"\\\b\f\n\r\t\v\0' | "'\\'\\\"\\\\\\b\\f\\n\\r\\t\\v\\0'" | '\\"\\\b\f\n\r\t\u000b\u0000 |

In the above table, the first column contains a valid JS string literal with escape sequences. The second column shows how that literal would be written as a &str in Rust. The third column is the expected output; however, the unescape function this crate exports returns None for both examples.
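To make the relationship between the columns concrete, here is a small sketch for the first row only. It does not call the crate; the constant names are made up for illustration. It just checks that the middle column, read as an ordinary Rust string literal, is the same text as the JS literal in the first column, and spells out the expected value from the third column using Rust's own escapes.

```rust
// Row 1 of the table, one constant per column (names are illustrative only).

// Column 1: the JS source text, written as a Rust raw string so that
// backslashes are taken literally.
const JS_LITERAL: &str = r"'\1\00\400\000'";

// Column 2: the same text as an ordinary Rust string literal, where every
// backslash has to be doubled.
const ESCAPED: &str = "'\\1\\00\\400\\000'";

// Column 3: the characters the literal denotes, kept inside its quotes:
// U+0001, U+0000, a space, the digit 0, U+0000.
const EXPECTED_VALUE: &str = "'\u{1}\u{0} 0\u{0}'";

fn main() {
    // Columns 1 and 2 are the same 15 characters of source text.
    assert_eq!(JS_LITERAL, ESCAPED);
    assert_eq!(JS_LITERAL.chars().count(), 15);

    // The unescaped value is much shorter: 5 characters plus the 2 quotes.
    assert_eq!(EXPECTED_VALUE.chars().count(), 7);

    println!("input:    {}", ESCAPED);
    println!("expected: {:?}", EXPECTED_VALUE);
}
```

So the string handed to unescape still contains literal backslashes; it is the JS source text, not an already-unescaped value.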

saghm commented 5 years ago

Can you clarify which column contains the inputs you're passing in?

FreeMasen commented 5 years ago

The middle column is the input to unescape

saghm commented 5 years ago

Thanks for the info! I think you might have misunderstood the purpose of this crate; its goal is to take a JavaScript string literal (e.g. read in from a source file) and convert it to the equivalent Rust string that it's supposed to represent. I totally understand that this is not at all obvious given the poor naming of the crate and the lack of any documentation; I'm honestly fairly embarrassed about this crate, as I didn't even realize until today that there were other crates published that depended on it. I apologize for the confusion, and I'm going to push up a new version of the crate to add documentation and update the README to explain the purpose more clearly.

FreeMasen commented 5 years ago

I do understand what this crate is attempting to perform. I am in the process of parsing this line and the line below it.

The test I am working on can be found here

and the use of unescape can be found here

The string literal provided above in the leftmost column returns None when passed into unescape, while in a JS REPL it evaluates to the value in the rightmost column.

FreeMasen commented 5 years ago

To further explain, I put together this minimal example of the failure

saghm commented 5 years ago

Ah, I see. Apologies, I had thought you meant you were passing in the middle column values as raw strings to unescape.

Right now, unescape only processes Unicode numeric escapes of the format \uXXXX. I'm not sure if/when I'd get to adding other types of Unicode escapes, but I'd be happy to accept a pull request if this is a blocker for you. I also totally don't mind if you'd prefer to just copy the code into your project or fork the repo.

DarkDust commented 4 years ago

This might be fixed by #7. The failed lines all include legacy octal sequences.
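For anyone landing here, a rough sketch of how legacy octal escapes decode, based on my reading of the ECMAScript Annex B grammar (a leading digit 0-3 may be followed by up to two more octal digits, a leading 4-7 by at most one, so the value never exceeds \377). This is not the crate's code and not necessarily what #7 does; `decode_legacy_octal` and `unescape_octal_only` are hypothetical names, and every other escape kind is deliberately left untouched.

```rust
/// Decode one legacy octal escape. `rest` is the text right after the
/// backslash. Returns the decoded char and the number of digits consumed,
/// or None if `rest` does not start with an octal digit.
fn decode_legacy_octal(rest: &str) -> Option<(char, usize)> {
    let bytes = rest.as_bytes();
    let is_octal = |b: u8| (b'0'..=b'7').contains(&b);
    let first = *bytes.first()?;
    if !is_octal(first) {
        return None;
    }
    // A leading 0-3 allows up to three digits (max \377); 4-7 allows two.
    let max_len = if first <= b'3' { 3 } else { 2 };
    let len = bytes
        .iter()
        .take(max_len)
        .take_while(|&&b| is_octal(b))
        .count();
    let value = bytes[..len]
        .iter()
        .fold(0u32, |acc, &b| acc * 8 + u32::from(b - b'0'));
    char::from_u32(value).map(|c| (c, len))
}

/// Replace legacy octal escapes in `body`; everything else is copied as-is.
fn unescape_octal_only(body: &str) -> String {
    let mut out = String::new();
    let mut i = 0;
    while i < body.len() {
        let rest = &body[i..];
        if let Some(after) = rest.strip_prefix('\\') {
            if let Some((c, len)) = decode_legacy_octal(after) {
                out.push(c);
                i += 1 + len; // the backslash plus the digits consumed
                continue;
            }
        }
        // Not an octal escape: copy one character through unchanged.
        let c = rest.chars().next().expect("i is inside the string");
        out.push(c);
        i += c.len_utf8();
    }
    out
}

fn main() {
    // First failing example from the issue: \400 is \40 (a space) then '0'.
    let got = unescape_octal_only(r"'\1\00\400\000'");
    assert_eq!(got, "'\u{1}\u{0} 0\u{0}'");
    println!("{:?}", got);
}
```

The \400 case is the interesting one: because the first digit is 4, only two digits are consumed, which is why the expected value in the first row has a space followed by a literal 0.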