Closed GoogleCodeExporter closed 8 years ago
[deleted comment]
Haha - nice ticket, I lol'd :) But I see your point - still, I am not planning to
change the format of the source file at the moment. What would make sense in my
opinion are small interface files - enabling interop with PHP, Ruby or whatever
you need, written in the language the interop is necessary for. Same could go
for a REST API we could host on html5sec.org (thinking html5sec.org/api/php,
html5sec.org/api/ruby etc.).
Please let me know if you are interested in setting up what you specifically
need - I'll most probably be glad to host it here or give you the necessary commit
privileges.
Cheers,
.mario
Original comment by Mario.He...@googlemail.com
on 21 Jul 2011 at 10:00
Yeah, but the problem is that parsing "loosely valid JSON" into JSON is a
*FUCKING* nightmare (well, to be fair, parsing JSON is a nightmare for
the aforementioned anal reasons). And even more so when the *things* inside
single-quoted strings are invalid HTML that is supposed to break parsers.
Well, I suppose you write your files directly into
'not-JSON-but-the-thing-we-all-suppose-to-be-JSON', and not from a more
strictly structured source where the change would be easy.
Then yes, if my work could be of some use for the community, I'd be glad to
write those regexps from hell to make an adapter from your files to strict JSON
format.
We'll keep in touch, I hope soon.
Rob'
Original comment by rdelauge...@gmail.com
on 21 Jul 2011 at 10:29
I'd just like to second this issue, the file is completely useless in its
current format. It needs to be rewritten before anything can be done with it.
The good news is that I've documented all of these issues so
they can easily be fixed.
1.) Remove the /* */ comments
2.) Remove the "var items = " at the beginning
3.) Swap the " and ', JSON uses double quotes
4.) Remove the control characters. JSON treats everything from 0x00 through 0x1f
as a control character that may not appear unescaped inside a string. This
includes things like 0x09 (tab characters)
5.) \xBC notation is not valid, it should be \u00BC. Same for all other "\x.."
patterns.
6.) \' is not valid in JSON. These can safely be replaced with just a single
quote.
7.) There are multiple places where the dictionaries have rogue commas at the
end. It's always the browser section and the IDs of these are 89, 99, 100, and
102.
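As a quick illustration of the points above, here is a minimal sketch that feeds one small fragment per issue to Python's built-in parser. The sample strings are made up for the demonstration and are not taken from the html5security file; `is_strict_json` is a hypothetical helper name.

```python
import json

def is_strict_json(text):
    """Return True if Python's built-in parser accepts the text as JSON."""
    try:
        json.loads(text)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False

# Hypothetical fragments, one per issue from the list above.
samples = {
    "single-quoted string": "['a']",
    "\\x escape": '["\\xBC"]',
    "escaped single quote": '["it\\\'s"]',
    "raw tab inside a string": '["a\tb"]',
    "trailing comma": '{"browsers": ["firefox"],}',
    "strict equivalent": '["\\u00BC", "it\'s", "a\\tb"]',
}

results = {name: is_strict_json(text) for name, text in samples.items()}
for name, ok in results.items():
    print(name, "->", "accepted" if ok else "rejected")
```

Only the last fragment, which uses double quotes, `\u00BC`, a bare single quote, and an escaped tab, is accepted.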
I'm including a small Python script which addresses all of these issues except
the rogue commas. After manually fixing the rogue commas, I was able to read in the
file with the built-in JSON parser. I'd like to stress that my script is not
the best solution, but since I am not a committer, this is the best I can do.
Hopefully the maintainers can use the script below to fix up the .json file and
maintain the fixed version. Or, if the current version is valuable to someone,
rename the current file to be a .js file and then use this script to create a
.json file in the build process. That would let people who use Ruby, Python,
Java, C++, Perl, or any other language use the *real* JSON file while anyone
who wants to use JS can use either one.
#
# This script will simply fix and load the json file
#
import json, re
# remove comments, this is JSON, not javascript
# (re.DOTALL lets a comment span several lines)
data = open('html5security.json').read()
data = re.sub(r'/\*.*?\*/', r'', data, flags=re.DOTALL)
# remove the newlines so the regex will work properly
data = re.sub(r'\r?\n', '', data)
# strip everything outside the actual JSON data
get_array_only = re.compile(r'.*?(\[.*\]).*', re.MULTILINE)
data = get_array_only.sub(r'\1', data)
# swap ' for " and " for ' (str.maketrans replaces Python 2's string.maketrans)
data = data.translate(str.maketrans("'\"", "\"'"))
# convert \xFF to \u00FF
data = re.sub(r'\\x([0-9a-fA-F]{2})', r'\\u00\1', data)
# remove the control characters (0x00 through 0x1f)
data = re.sub(r'[\x00-\x1f]+', r'', data)
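# JSON doesn't allow \' (only \"); the lookbehind keeps the preceding
# character and leaves an escaped backslash followed by a quote alone
data = re.sub(r"(?<!\\)\\'", r"'", data)
# Assuming the commas were fixed, we can now load the file (the default
# strict mode is fine, since control characters were stripped above)
j = json.loads(data)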
Original comment by JoseLemm...@mail.com
on 28 Sep 2011 at 11:13
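The script in the comment above leaves the rogue commas (item 7) to manual editing. Below is a minimal sketch of how that step might be automated, assuming the text has already been through the quote swap so that every string is double-quoted. `strip_trailing_commas` is a hypothetical helper, not something that exists in the project, and the sample input is invented.

```python
import json

def strip_trailing_commas(text):
    """Drop commas that are followed only by whitespace and a closing } or ].

    Characters inside double-quoted strings are copied through unchanged.
    """
    out = []
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False      # this character was escaped, skip checks
            elif ch == "\\":
                escaped = True       # next character is escaped
            elif ch == '"':
                in_string = False    # closing quote
            continue
        if ch == '"':
            in_string = True
        elif ch in "}]":
            # Walk back over whitespace; a comma found there is a rogue one.
            i = len(out) - 1
            while i >= 0 and out[i].isspace():
                i -= 1
            if i >= 0 and out[i] == ",":
                del out[i]
        out.append(ch)
    return "".join(out)

# Invented sample: two rogue commas plus a string that must stay untouched.
sample = '{"browsers": ["firefox", ], "note": "a,}",}'
fixed = strip_trailing_commas(sample)
parsed = json.loads(fixed)
print(parsed)
```

In the script this could be applied to `data` just before the final `json.loads` call. A plain regular expression would be shorter, but it could also rewrite a `,}` or `,]` sequence that appears inside a payload string, which is why the sketch tracks string boundaries.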
Well, I for one admire your courage for trying to regexp your way out of this
problem. I tried to in Ruby, but my (nonexistent) skills failed me. For the
record, here is how I finally did it (when I noticed JSON, unlike XML, can have
Unicode chars in strings).
Since the js files are valid JS but not valid JSON, and since they actually
assign variables, I just built an HTML file that loads the js, used the
browser's built-in JSON serializer (JSON.stringify) to convert it, and then
copy-pasted the output into files.
It lacks automation, but works fine for me. Here is the barebones HTML file
(works in all browsers but IE; one could replace textContent with innerText to
make it work there).
*********************BEGIN HTML FILE******************************
<html>
<head>
<title>Converter</title>
<style>
textarea{
width:800px;
height:200px;
}
</style>
<script type="text/javascript" src="http://html5security.googlecode.com/svn/trunk/items.json"></script>
<script type="text/javascript" src="http://html5security.googlecode.com/svn/trunk/categories.json"></script>
<script type="text/javascript" src="http://html5security.googlecode.com/svn/trunk/payload.json"></script>
<script type="text/javascript">
function convert(){
var i = JSON.stringify(items);
var c = JSON.stringify(categories);
var p = JSON.stringify(payloads);
var divItems = document.getElementById("items");
var divCategories = document.getElementById("categories");
var divPayloads = document.getElementById("payloads");
var d1=document.createElement("textarea");
d1.textContent=i;
divItems.appendChild(d1);
var d2=document.createElement("textarea");
d2.textContent=c;
divCategories.appendChild(d2);
var d3=document.createElement("textarea");
d3.textContent=p;
divPayloads.appendChild(d3);
}
</script>
</head>
<body>
<div onclick="convert();">Click me!!1!one!</div>
<div id="items"><h1>Items</h1></div>
<div id="categories"><h1>Categories</h1></div>
<div id="payloads"><h1>Payloads</h1></div>
</body>
</html>
***********************END HTML FILE*********************************
PS: Notice what I did? Safely injected a string into HTML... One wonders..
Rob'
Original comment by rdelauge...@gmail.com
on 29 Sep 2011 at 5:12
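The browser-based conversion above works because a conforming serializer takes care of the escaping on its own. As a rough Python analogue (not the method used in the comment), here is a minimal sketch showing how `json.dumps` handles the kinds of characters discussed in this thread; the `payload` string is invented for the demonstration.

```python
import json

# Invented sample containing a tab, a non-ASCII character (0xBC),
# a single quote and a double quote.
payload = "tab:\t quarter:\xbc quote:' dquote:\""

# Default output is pure ASCII: the tab and the non-ASCII character are escaped.
ascii_json = json.dumps(payload)

# ensure_ascii=False keeps non-ASCII characters as they are;
# control characters such as the tab are still escaped.
unicode_json = json.dumps(payload, ensure_ascii=False)

print(ascii_json)
print(unicode_json)

# Both forms are strict JSON and decode back to the original string.
round_trip_ok = (json.loads(ascii_json) == payload
                 and json.loads(unicode_json) == payload)
print(round_trip_ok)
```

Either output can be parsed by a strict JSON parser in any language, which is the property the original source file lacks.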
Format stays as it is. No further requests over the last n>6 months.
Original comment by Mario.He...@googlemail.com
on 26 Jun 2012 at 7:08
Original issue reported on code.google.com by
rdelauge...@gmail.com
on 21 Jul 2011 at 8:58