Closed WnP closed 9 years ago
This was a big thing for us––hopefully will improve speed as well.
I'll take a look at this later; there was some discussion in an issue on the Py-StackExchange repo about getting just answer bodies using the API. There's a good chance you've implemented that, and it sounds like a great way to eliminate unnecessary data transfer (the rest of the page's HTML, as with requests/bs4).
Just dropping this here for my own reference when reviewing––cheers!
I hadn't seen that issue before, but I read the StackExchange API documentation and Py-StackExchange's source code before implementing this feature, so yes, it's implemented ;-)
and yes, I think it's more efficient to deal only with the JSON API rather than full HTML requests
@WnP This looks suuuuper clean. Starting testing, hopefully will merge by EOD.
@WnP I dig it, merging.
I will make some small modifications to the way the output is printed myself, just because I think it is easier to implement those changes than communicate them. Very minor, just adding some newlines here and there. Will do new release with those changes.
Also, I notice an anecdotal speed difference... Do you?
Thanks!
@lukasschwab yes, the speed difference is anecdotal from the client side; let's compare the two with this simple script:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from timeit import timeit
import stackexchange
from stackexchange import Sort
import bs4
import requests
import html2text
h = html2text.HTML2Text()
term = 'python flask'
API_KEY = "3GBT2vbKxgh*ati7EBzxGA(("
so = stackexchange.Site(stackexchange.StackOverflow, app_key=API_KEY, impose_throttling=True)
questions = so.search_advanced(
    q=term,
    sort=Sort.Votes)

# Pick the first search result that has an accepted answer
question = None
for q in questions:
    if 'accepted_answer_id' in q.json:
        question = q
        break
else:
    raise Exception('No question found')


def old_way_query(question):
    questionurl = question.json['link']
    answerid = question.json['accepted_answer_id']
    response = requests.get(questionurl)
    soup = bs4.BeautifulSoup(response.text)
    # Focuses on the single div with the matching answerid--necessary b/c bs4 is quirky
    for answerdiv in soup.find_all('div', attrs={'id': 'answer-' + str(answerid)}):
        answertext = h.handle(answerdiv.find('div', attrs={'class': 'post-text'}).prettify())


def new_way_query(question):
    answerid = question.json['accepted_answer_id']
    questiontext = h.handle(so.question(question.id, body=True).body)
    answer = h.handle(so.answer(answerid, body=True).body)


print('old way: %s' % timeit("old_way_query(question)", "from __main__ import question, old_way_query", number=20))
print('new way: %s' % timeit("new_way_query(question)", "from __main__ import question, new_way_query", number=20))
on my laptop, using Python 2.7.9, it outputs:
old way: 12.9633069038
new way: 0.572069883347
so in this case (20 executions) it's more than 22 times faster; the more executions you run, the bigger the gap
for a single execution the difference really is anecdotal:
old way: 0.849025964737
new way: 0.543494939804
about 1.56 times faster ^^
however, these tests are highly dependent on the network connection
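For reference, the ratios can be recomputed straight from the timings quoted above. This is only a small arithmetic sketch; the `speedup` helper and the `timings` dict are names made up for illustration, and the numbers are the ones reported in this comment, not new measurements.

```python
# Timings (in seconds) copied from the benchmark output quoted above.
# Keys are the number of timeit executions; nothing here is re-measured.
timings = {
    20: {"old": 12.9633069038, "new": 0.572069883347},
    1: {"old": 0.849025964737, "new": 0.543494939804},
}


def speedup(old_seconds, new_seconds):
    """Return how many times faster the new approach is (hypothetical helper)."""
    return old_seconds / new_seconds


for runs, t in sorted(timings.items()):
    print("%d execution(s): %.2fx faster" % (runs, speedup(t["old"], t["new"])))
```

With these figures the ratio is roughly 1.56 for a single execution and roughly 22.66 for 20 executions, and both will shift with network conditions.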
answers are now truncated to 140 characters in the listing, followed by `...` if they are longer
let me know if you think it's a good idea or not
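To make the proposal concrete, here is a minimal sketch of the truncation behaviour described above. It is not the code from this PR: `truncate`, `limit`, and `marker` are hypothetical names, and it assumes the cut is a plain character slice with no attempt to break on word boundaries.

```python
def truncate(text, limit=140, marker="..."):
    """Return text unchanged if it fits in `limit` characters,
    otherwise its first `limit` characters followed by `marker`.

    Hypothetical helper for illustration only.
    """
    if len(text) <= limit:
        return text
    return text[:limit] + marker


short_answer = "Use app.run(debug=True)."
long_answer = "x" * 200

print(truncate(short_answer))      # short answers are printed as-is
print(len(truncate(long_answer)))  # 140 characters plus the 3-character marker
```

Note that in this sketch the marker is appended after the 140 characters, so a truncated line is 143 characters long; whether the marker should count toward the limit is a design choice left open here.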