ultrajson / ultrajson

Ultra fast JSON decoder and encoder written in C with Python bindings
https://pypi.org/project/ujson/

Unable to decode or load Google Finance JSON #44

Status: Closed (closed by thongly 12 years ago)

thongly commented 12 years ago

URL: 'http://www.google.com/finance/info?infotype=infoquoteall&q=CI:NYSE,CINF:Nasdaq,CTAS:Nasdaq,CSCO:Nasdaq,C:NYSE,CTXS:Nasdaq,CLF:NYSE,CLX:NYSE,KO:NYSE,CCE:NYSE,CTSH:Nasdaq,CL:NYSE,CMCSA:Nasdaq,CMA:NYSE,CSC:NYSE,CAG:NYSE'

I'm relatively new to Python, but I believe this may be due to ujson not being able to escape certain parts of the Google Finance feed.

cjson is able to decode this using cjson.decode.

Given the benchmarks, I would love to be able to use ujson for this.

jskorpan commented 12 years ago

What kind of error do you get?


mthurlin commented 12 years ago

The data at that URL starts with "\n//", which is not valid JSON. Remove that and it will work.

>>> import ujson, urllib2
>>> json = urllib2.urlopen("http://www.google.com/finance/info?infotype=infoquoteall&q=CI:NYSE,CINF:Nasdaq,CTAS:Nasdaq,CSCO:Nasdaq,C:NYSE,CTXS:Nasdaq,CLF:NYSE,CLX:NYSE,KO:NYSE,CCE:NYSE,CTSH:Nasdaq,CL:NYSE,CMCSA:Nasdaq,CMA:NYSE,CSC:NYSE,CAG:NYSE").read()
>>> json[:5]
'\n// ['
>>> ujson.decode(json)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Expected object or value
>>> ujson.decode(json[3:])
[{u'el': u'43.68', u'eo': u'', u'eccol': u'chg', u'ec': u'+0.25', u'vo': u'0.00', u'eps': u'4.57', u'inst_own': u'87%', u'cp': u'
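
The slice above relies on the prefix being exactly three characters long. As a minimal sketch, assuming the feed always begins with optional whitespace followed by "//", the prefix can be stripped without hard-coding an offset. The helper name strip_comment_prefix and the sample payload are hypothetical, and the standard-library json module stands in for ujson here (ujson.loads can be swapped in for json.loads):

```python
import json  # stdlib stand-in; ujson.loads can be used in its place


def strip_comment_prefix(payload: str) -> str:
    """Drop leading whitespace and a leading '//' marker, if present."""
    text = payload.lstrip()
    if text.startswith("//"):
        text = text[2:]
    return text.lstrip()


# Hypothetical sample shaped like the feed in this thread (not real data).
raw = '\n// [ { "t" : "AAPL" ,"e" : "NASDAQ" } ]'
quotes = json.loads(strip_comment_prefix(raw))
print(quotes[0]["t"])
```

Payloads that do not carry the marker pass through unchanged apart from the whitespace trim, so the helper should be safe to apply unconditionally.
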
thongly commented 12 years ago

Even when I locate the "[" with .find and decode only from that position, I still get an error.

Traceback (most recent call last):
  File "<pyshell#32>", line 1, in <module>
    jsonobj = json.decode(info[z:y])
LookupError: unknown encoding: {

mthurlin commented 12 years ago

And what does repr(info[z:y]) give you?

thongly commented 12 years ago

Specifically, when I try what @mthurlin suggested:

jsonob = json.decode(info[5:])

Traceback (most recent call last):
  File "<pyshell#33>", line 1, in <module>
    jsonob = json.decode(info[5:])
LookupError: unknown encoding: { "id": "22144" ,"t" : "AAPL" ,"e" : "NASDAQ" ,"l" : "567.84" ,"l_cur" : "567.84" ,"s": "0" ,"ltt":"11:26AM EDT" ,"lt" : "May 22, 11:26AM EDT" ,"c" : "+6.56" ,"cp" : "1.17" ,"ccol" : "chg" ,"eo" : "" ,"delay": "" ,"op" : "569.55" ,"hi" : "573.88" ,"lo" : "565.50" ,"vo" : "11.53M" ,"avvo" : "23.76M" ,"hi52" : "644.00" ,"lo52" : "310.50" ,"mc" : "530.97B" ,"pe" : "13.84" ,"fwpe" : "" ,"beta" : "1.26" ,"eps" : "41.02" ,"shares" : "935.06M" ,"inst_own" : "69%" ,"name" : "Apple Inc." ,"type" : "Company" } ,{ "id": "284784" ,"t" : "INTC" ,"e" : "NASDAQ" ,"l" : "26.06" ,"l_cur" : "26.06" ,"s": "0" ,"ltt":"11:26AM EDT" ,"lt" : "May 22, 11:26AM EDT" ,"c" : "-0.09" ,"cp" : "-0.35" ,"ccol" : "chr" ,"eo" : "" ,"delay": "" ,"op" : "26.27" ,"hi" : "26.28" ,"lo" : "25.77" ,"vo" : "20.45M" ,"avvo" : "40.23M" ,"hi52" : "29.27" ,"lo52" : "19.16" ,"mc" : "131.10B" ,"pe" : "11.04" ,"fwpe" : "" ,"beta" : "1.07" ,"eps" : "2.36" ,"shares" : "5.03B" ,"inst_own" : "62%" ,"name" : "Intel Corporation" ,"type" : "Company" } ]

mthurlin commented 12 years ago

You are removing the first "[". If you look at my example, I removed only the first three characters.
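
To illustrate the difference between the two offsets, here is a small sketch built on the five-character prefix '\n// [' shown by json[:5] earlier in the thread. The rest of the sample string is a hypothetical stand-in for the feed, not real data:

```python
prefix = "\n// ["             # the first five characters shown by json[:5]
sample = prefix + " 1, 2 ]"   # hypothetical stand-in for the rest of the feed

after_three = sample[3:]      # drops only the newline and the two slashes
after_five = sample[5:]       # also drops the space and the opening bracket

print(repr(after_three))
print(repr(after_five))
```

Only the three-character slice keeps the opening bracket, so only that one is still a complete JSON array.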

thongly commented 12 years ago

I get the same thing when I run it with [3:].

jsonobj = json.decode(info[3:])

Traceback (most recent call last):
  File "<pyshell#34>", line 1, in <module>
    jsonobj = json.decode(info[3:])
LookupError: unknown encoding: [ { "id": "22144" ,"t" : "AAPL" ,"e" : "NASDAQ" ,"l" : "567.84" ,"l_cur" : "567.84" ,"s": "0" ,"ltt":"11:26AM EDT" ,"lt" : "May 22, 11:26AM EDT" ,"c" : "+6.56" ,"cp" : "1.17" ,"ccol" : "chg" ,"eo" : "" ,"delay": "" ,"op" : "569.55" ,"hi" : "573.88" ,"lo" : "565.50" ,"vo" : "11.53M" ,"avvo" : "23.76M" ,"hi52" : "644.00" ,"lo52" : "310.50" ,"mc" : "530.97B" ,"pe" : "13.84" ,"fwpe" : "" ,"beta" : "1.26" ,"eps" : "41.02" ,"shares" : "935.06M" ,"inst_own" : "69%" ,"name" : "Apple Inc." ,"type" : "Company" } ,{ "id": "284784" ,"t" : "INTC" ,"e" : "NASDAQ" ,"l" : "26.06" ,"l_cur" : "26.06" ,"s": "0" ,"ltt":"11:26AM EDT" ,"lt" : "May 22, 11:26AM EDT" ,"c" : "-0.09" ,"cp" : "-0.35" ,"ccol" : "chr" ,"eo" : "" ,"delay": "" ,"op" : "26.27" ,"hi" : "26.28" ,"lo" : "25.77" ,"vo" : "20.45M" ,"avvo" : "40.23M" ,"hi52" : "29.27" ,"lo52" : "19.16" ,"mc" : "131.10B" ,"pe" : "11.04" ,"fwpe" : "" ,"beta" : "1.07" ,"eps" : "2.36" ,"shares" : "5.03B" ,"inst_own" : "62%" ,"name" : "Intel Corporation" ,"type" : "Company" } ]
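
The thread does not confirm the cause, but one plausible reading of this traceback is that the name json refers to a byte string rather than to the ujson module. In the session earlier in the thread, json is the variable holding the HTTP response body. Under that assumption the call is the string's own decode method, which treats its argument as a codec name, and that would explain a LookupError instead of the ValueError that ujson raised earlier. A minimal Python 3 sketch of the same effect, with hypothetical variable contents:

```python
# Assumption: `json` is a plain byte string here, not the ujson module.
json = b"response body"
payload = '[ { "t" : "AAPL" } ]'

try:
    # bytes.decode(encoding): the payload is interpreted as a codec name.
    json.decode(payload)
except LookupError as exc:
    message = str(exc)

print(message)
```

If that reading is right, calling ujson.decode(info[3:]) with the module imported under its own name should avoid the codec lookup altogether.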