Open andrew-stebbing opened 10 years ago
Are you saying that if you use the `save_json` and `load_json` functions as defined in Example 9-6 together as a pair, the results come back with escaped backslashes and such things? e.g. from running the example as-is, you get this behavior? Or does mixing and matching `json.dumps` and `json.loads` with these functions produce it? I'm not sure how this would be happening, and just want to clarify the question. Here's an example IPython interpreter session that shows sample usage:
```
In [1]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:import io, json
:
:def save_json(filename, data):
:    with io.open('resources/ch09-twittercookbook/{0}.json'.format(filename),
:                 'w', encoding='utf-8') as f:
:        f.write(unicode(json.dumps(data, ensure_ascii=False)))
:
:def load_json(filename):
:    with io.open('resources/ch09-twittercookbook/{0}.json'.format(filename),
:                 encoding='utf-8') as f:
:        return f.read()
:--

In [2]: foo = {"a" : "b"}

In [3]: save_json("foo.json", foo)

In [4]: bar = load_json("foo.json")

In [5]: print bar
{"a": "b"}

In [6]: print json.dumps(bar)
"{\"a\": \"b\"}"
```
I have the `save_json` and `load_json` functions here for precisely the reason you touch on -- you get the dreaded Unicode errors with Python 2.7 when you're trying to serialize non-ASCII text out to a file -- hence, the wrappers. It's a messy situation, and you could probably spend the rest of the day reading up on Python 2.7 and the Unicode situation that was addressed in Python 3.x.
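To make that concrete, here's a minimal sketch of the failure mode the wrappers avoid (the filename and sample text are placeholders, not from the book):

```python
# -*- coding: utf-8 -*-
import io
import json

# Placeholder data containing non-ASCII characters
tweet = {u'text': u'caf\u00e9 \u2013 CrossFit'}

# With ensure_ascii=False and unicode input, dumps returns a unicode
# object in Python 2.7...
out = json.dumps(tweet, ensure_ascii=False)

# ...so writing it with a file from the built-in open() raises
# UnicodeEncodeError (the ascii codec can't encode u'\xe9'):
# open('tweet.json', 'w').write(out)

# io.open with an explicit encoding accepts unicode and encodes it:
with io.open('tweet.json', 'w', encoding='utf-8') as f:
    f.write(unicode(out))
```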
Also, bear in mind that there's a difference between a Python object and its JSONified representation on disk. A serialized JSON object is a bona fide "string", so certain characters in that serialized representation, like the quotes around key names, then have to be escaped with backslashes if the string itself is serialized again.
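Here's a quick sketch of that escaping in action, using the same toy dict as above:

```python
import json

data = {"a": "b"}

s = json.dumps(data)  # dict -> JSON text (a plain string)
print s               # {"a": "b"}

# Serializing the *string* again treats it as ordinary text, so its
# quotes get backslash-escaped -- the same output as In [6] above:
print json.dumps(s)   # "{\"a\": \"b\"}"
```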
Does this help at all?
Matthew,
Thank you very much for your reply, and may I take this opportunity to congratulate you on a fantastic book. Very enjoyable and informative.
To the matter at hand: firstly, I was using the `save_json` and `load_json` functions as a pair. Interestingly, I ran everything again this morning inside the IPython Notebook and I'm still getting the dreaded Unicode error. (Ah! So that's what it is. I've read a lot about it.)
I ran Examples 1, 3, 4, and 6 from Chapter 9 in sequence, with absolutely no changes to any of the code except the inclusion of my unique credentials in Example 1.
Here's the beginning of the output from Example 4, Searching for Tweets:
```
{
 "contributors": null,
 "truncated": false,
 "text": "Want to have 1 membership (no contracts) to 10 Nashville fitness studios? Pilates, Boot Camp, CrossFit, Cycle, Yoga, & more @fitmixnashville",
 "in_reply_to_status_id": null,
 "id": 465044875579523072,
 "favorite_count": 0,
 "source": "<a href=\"http://www.socialoomph.com\" rel=\"nofollow\">SocialOomph</a>",
 "retweeted": false,
 "coordinates": null,
 "entities": {
  "symbols": [],
  "user_mentions": [
```
...just as we'd expect. When it gets run through Example 6, the dreaded Unicode error occurs:
"[{\"contributors\": null, \"truncated\": false, \"text\": \"\\\"Crossfit is like reverse fight club because the first rule of Crossfit is you never...\\\" \u2013 via @getsecret https://t.co/gHoLLNjme4\", \"in_reply_to_status_id\": null, \"id\": 465045090885328896, \"favorite_count\": 0, \"source\": \"<a href=\\\"http://www.apple.com\\\" rel=\\\"nofollow\\\">iOS</a>\", \"retweeted\": false, \"coordinates\": null, \"entities\": {\"symbols\": [], \"user_mentions\":
I created the virtual environment and imported all the code on 26th April this year, so I'm assuming I have the latest versions of everything.
Regards, Andrew
This is interesting, and I do want to work with you to figure out what is going on. I am unable to reproduce this at the moment, but I don't doubt that you are getting the results that you say you are.
One point I should make is that, as written, Example 9-6 is standalone in terms of the data that it actually runs through `save_json` and `load_json`. The references to `oauth_login()` and `twitter_search` are just function references, so data input/output from previous examples such as Examples 9-3 or 9-4 shouldn't matter.
This code hasn't been updated in a while, so I think you do have the latest version of the code.
Since the code in Example 9-6 is only pulling back 10 results, I wonder if you could use the code block below to try to reproduce the error and share with me the full output when the error occurs? GitHub will probably truncate it, so we may need to use something like a pastebin to get it all across.
```python
import twitter
import io
import json

def oauth_login():
    # XXX: Go to http://twitter.com/apps/new to create an app and get values
    # for these credentials that you'll need to provide in place of these
    # empty string values that are defined as placeholders.
    # See https://dev.twitter.com/docs/auth/oauth for more information
    # on Twitter's OAuth implementation.

    CONSUMER_KEY = ''
    CONSUMER_SECRET = ''
    OAUTH_TOKEN = ''
    OAUTH_TOKEN_SECRET = ''

    auth = twitter.oauth.OAuth(OAUTH_TOKEN, OAUTH_TOKEN_SECRET,
                               CONSUMER_KEY, CONSUMER_SECRET)

    twitter_api = twitter.Twitter(auth=auth)
    return twitter_api

def twitter_search(twitter_api, q, max_results=200, **kw):

    # See https://dev.twitter.com/docs/api/1.1/get/search/tweets and
    # https://dev.twitter.com/docs/using-search for details on advanced
    # search criteria that may be useful for keyword arguments

    search_results = twitter_api.search.tweets(q=q, count=100, **kw)
    statuses = search_results['statuses']

    # Iterate through batches of results by following the cursor until we
    # reach the desired number of results, keeping in mind that OAuth users
    # can "only" make 180 search queries per 15-minute interval. See
    # https://dev.twitter.com/docs/rate-limiting/1.1/limits
    # for details. A reasonable number of results is ~1000, although
    # that number of results may not exist for all queries.

    # Enforce a reasonable limit
    max_results = min(1000, max_results)

    for _ in range(10):  # 10*100 = 1000
        try:
            next_results = search_results['search_metadata']['next_results']
        except KeyError, e:  # No more results when next_results doesn't exist
            break

        # Create a dictionary from next_results, which has the following form:
        # ?max_id=313519052523986943&q=NCAA&include_entities=1
        kwargs = dict([ kv.split('=')
                        for kv in next_results[1:].split("&") ])

        search_results = twitter_api.search.tweets(**kwargs)
        statuses += search_results['statuses']

        if len(statuses) > max_results:
            break

    return statuses

def save_json(filename, data):
    with io.open('resources/ch09-twittercookbook/{0}.json'.format(filename),
                 'w', encoding='utf-8') as f:
        f.write(unicode(json.dumps(data, ensure_ascii=False)))

def load_json(filename):
    with io.open('resources/ch09-twittercookbook/{0}.json'.format(filename),
                 encoding='utf-8') as f:
        return f.read()

# Sample usage
q = 'CrossFit'

twitter_api = oauth_login()
results = twitter_search(twitter_api, q, max_results=10)

print results  # I'll be guaranteed to see the full text of the tweets with this

# But in theory, one of these calls is causing the error?
save_json(q, results)
results = load_json(q)

# Or is it this statement that you are saying is causing the error?
print json.dumps(results, indent=1)
```
What I'm curious to see is whether you are ultimately finding that it's the `save_json` and `load_json` calls that are causing the error, or if it's the `print json.dumps(results, indent=1)` statement that is the trigger. If it's the latter, I think I may already know what's going on. If it's the former, it'll be a bit more of a mystery.
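For reference, here's a minimal sketch of why the `print json.dumps(results, indent=1)` case would produce the escaped output, given that `load_json` as defined above returns the file contents as text rather than a parsed object (this is a hypothesis to test, not a confirmed diagnosis):

```python
import json

results = load_json(q)   # with load_json as defined above, this is
print type(results)      # unicode text, not a list of dicts

# Dumping a string re-serializes it and escapes its inner quotes,
# producing backslash-laden output like the sample you posted:
print json.dumps(results, indent=1)[:60]

# Parsing the text first recovers the list of dicts, so dumping
# that looks like the original pretty-printed JSON again:
parsed = json.loads(results)
print json.dumps(parsed, indent=1)[:60]
```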
GitHub wouldn't let me post all the results here, so I've sent them to you via the email address listed on the 'Mining the Social Web' website.
Any updates on this issue? I ran into the same situation as mentioned. Thank you!
@LeiG - What specifically are you running into? UnicodeDecodeError? Can you provide more specifics and/or sample data that causes it?
Hello,
I am having trouble getting the `save_json` function from Chapter 9, Example 6 to work correctly. I should state for the record that I'm running this code in a virtual environment using Python 2.7.1, but following the format from the IPython notebooks, as I don't want to be tied into the notebooks forever. If I save, and then load, the results of a search, I end up with output that's full of backslashes.
In the book Learning Python, 5th ed., by Mark Lutz, I found some sample code for writing JSON to a file. So, if our `data` results are to be written to `results.json`, the code would be:

```python
json.dump(data, fp=open("./{0}.json".format(filename), 'w'), indent=1)
```

If I use this to save the `data` but use the `load_json` function to retrieve and print it, I get what I'd expect. Thus, it appears that it's the `save_json` function that's not working correctly. The code above appears to just be an amalgam of `json.dump` and `open('filename')`.
I've tried creating a hybrid function similar to `save_json`, but it doesn't work: using `json.dump` yields `TypeError: must be unicode, not str`, whilst using `json.dumps` doesn't write anything. I'm fairly new to Python, so I'm rather struggling here.
For the time being I'm using a different version of `save_json`, which seems to work fine for both `trend` and `query` searches, but as no one else seems to have raised this as an issue, I'm curious as to why it's not working correctly for me.
Regards, Andrew
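For what it's worth, the `TypeError: must be unicode, not str` is what you'd expect from combining `json.dump` with an `io.open` file handle in Python 2.7: with the default `ensure_ascii=True`, `json.dump` writes chunks of type `str`, while a file opened with `io.open(..., encoding='utf-8')` accepts only `unicode`. A minimal sketch of the failure and one workaround (an illustration under that assumption, not the book's code):

```python
# -*- coding: utf-8 -*-
import io
import json

data = {u'text': u'caf\u00e9'}  # placeholder non-ASCII data

# Fails in Python 2.7 because json.dump() writes str chunks, but the
# io.open file object accepts only unicode:
#   TypeError: must be unicode, not str
# with io.open('results.json', 'w', encoding='utf-8') as f:
#     json.dump(data, f, indent=1)

# Workaround: build a single unicode string first, then write it.
with io.open('results.json', 'w', encoding='utf-8') as f:
    f.write(unicode(json.dumps(data, ensure_ascii=False, indent=1)))
```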