Closed singhj closed 10 years ago
Is it okay to use PyCharm instead of Eclipse? I already have PyCharm set up to use my Python 2.7 virtualenv, and I have installed Google App Engine and can create and run an app from the IDE. I am having trouble installing the PyDev plugin into Eclipse: the install/download process reports an error saying that it can't verify the installation, and there is also a NullPointerException.
Whatever works for you as far as the IDE is concerned, Teresa.
Congratulations on getting the app to run from IDE.
Best,
J Singh
President Early Stage IT (617) 475-0120 (O) (978) 760-2055 (M) http://www.datathinks.org http://www.earlystageit.com
Join us at the next Boston Cloud Services Meetup: http://www.meetup.com/Boston-cloud-services/
Hi
I'm going to work on getting our sample Twitter API code to run in Google App Engine. I'm not sure if anyone gave it another try, but I have been doing a little reading and figured these tips may be helpful:
https://developers.google.com/appengine/docs/python/#Python_Pure_Python
https://github.com/muanis/foursquare-oauth-bootstrap
I'll post some notes once I get this working.
I did give it another try and concluded (perhaps erroneously?) that I didn't have any code to handle the callback from Twitter (callback_URL). So I'm reading up on that and seeing if that's the culprit.
Somewhere I have working code that uses the Facebook oAuth and runs on App Engine. I'll take a look at that if the callback_URL thing yields nothing.
Ahh okay.
I got the script working in a GAE project locally (running via Pycharm). I kept the printing to stdout and it writes to the console in Pycharm. There are a few things I noticed while testing locally.
Note that I can only see the tweets in the console because I'm writing them to stdout... Maybe this is similar to the behavior you were seeing? I haven't tried pushing and running the code in the cloud yet.
I will push my PyCharm GAE project as another example. I will remove the hardcoded API keys, of course :)
After doing a bit more digging on using Twitter's streaming API on Google App Engine, it looks like it isn't supported: GAE's implementation of urllib on top of the urlfetch API does not support sockets, and hence doesn't support the persistent connections that Twitter's streaming API needs.
I'm thinking that the reason my GAE project still worked locally was perhaps that it was using my local Python 2.7 virtualenv's version of urllib?
Since there is no way to poll the public Twitter sample stream, if we still want to use it as our data source we would need to wrap our calls to the Twitter streaming API in a process on another (non-GAE) box that dumps the tweets somewhere (a database, a small one-node Solr instance, an Elasticsearch cluster, etc.). Then we just need to set up some sort of endpoint that our GAE app can hit to get the tweets. We can definitely do polling on GAE. They also offer the Channel API, but that only allows a persistent connection from a client to the GAE server, so I'm not sure it will work with third-party APIs that require persistent connections.
If you guys like this idea...I volunteer to set up a micro AWS box to host the end point for our twitter streaming api calls.
Related links...
https://groups.google.com/forum/#!topic/google-appengine/l0FotoLPRso https://groups.google.com/forum/#!topic/google-appengine/CMg6BkhT0_c https://dev.twitter.com/discussions/18339
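To make the wrapping idea concrete, here is a minimal sketch of the two halves: a collector that dumps whatever the stream yields into a store, and a lookup that an endpoint could call when the GAE app polls. Everything here is an assumption for illustration: the names `collect` and `tweets_since`, the JSON-lines file standing in for a database or search index, and the fake stream standing in for the real streaming connection.

```python
import json
import os
import tempfile


def collect(stream, path):
    """Append each tweet from `stream` to a JSON-lines file; return the count written."""
    written = 0
    with open(path, "a") as out:
        for tweet in stream:
            out.write(json.dumps(tweet) + "\n")
            written += 1
    return written


def tweets_since(path, last_id=0, limit=100):
    """Return up to `limit` stored tweets whose id is greater than `last_id`."""
    batch = []
    with open(path) as src:
        for line in src:
            tweet = json.loads(line)
            if tweet["id"] > last_id:
                batch.append(tweet)
                if len(batch) == limit:
                    break
    return batch


# Hypothetical stand-in for the streaming connection on the non-GAE box.
fake_stream = [{"id": i, "text": "tweet %d" % i} for i in range(1, 6)]
store = os.path.join(tempfile.mkdtemp(), "tweets.jsonl")
collect(fake_stream, store)

# What the endpoint would hand back when the GAE app last saw tweet 3.
print([t["id"] for t in tweets_since(store, last_id=3)])
```

The GAE side would then only need ordinary short-lived requests, passing along the last id it has seen.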
Sounds like getting real time data in finance. In theory simple, in practice a permanent hassle.
I like the idea of wrapping.
I'm in NH skiing this week. Will be more active next week. G finished setting up my laptop and deployed the fake GAE test app.
--Wolfgang
Indeed. Have fun skiing...I am not very jealous of you :)
If everyone else thinks the wrapping is a reasonable idea, I can get started on that this week, then move on to stubbing out the pipeline we discussed in our last meeting.
I think if we have to set up a separate box, then GAE is just not right for this framework. We may as well set up django on AWS and use it and totally give up on GAE.
But maybe not.
That SO response from Nick Johnson is old. GAE has changed a lot since the time he was involved with it. I found this post which seems to suggest that people are having some success.
I may get a chunk of time this weekend to try it. And if it doesn't work, then we just bail and don't look back, I think.
Okay, sounds good to me. I have looked at that Stack Overflow post a few times and there is no indication that the person asking the question was able to get streaming working. Though the post asks about the streaming API (using Tweepy), if you look at the comments, no one has tried the streaming API with GAE; even the example given in the link to the git repo was not an example of using the streaming API. There are other features in both Tweepy and Twython, such as polling for user account tweets, that will work well with GAE. The last post I saw that said there was no support for persistent HTTP connections was from 2013. Also, after looking over the GAE documentation, I didn't see anything, especially in the urlfetch API docs, that mentioned support for sockets.
Like I said, I had some success with my local account, but I think it's because I was using my own environment's version of urllib, which supports sockets. It would be nice if we could get this working with GAE since they give us some free stuff.
In that case, let's dump it.
I'll put up a Django instance in AWS -- probably tomorrow -- unless you want to jump on it today.
I can do something on my box or on a new instance. I doubt I'll be able to get to it today but likely tomorrow. I should be able to stub out some stuff for our app as well.
A few questions before getting started:
Gosh, this is becoming an expedition into uncharted technical territory for the simple-minded statistician. But let's go, the more I learn the better.
@ 1: I think regular cronjob to build-up a continuously expanding repository is best.
@2: Elasticsearch (never heard of it) looks interesting to me.
@3: no opinion. What is Django?
Sorry for being so unhelpful. My high time will come when we get to the actual algorithm...
:-)
On 02/18/2014 at 5:51 PM (GMT-05:00), VaderGirl13 wrote:
Few questions before getting started:
1. What do you want to trigger calling the streaming API? A scheduled cron job? A RESTful API? I was thinking we could just set up a script that wakes up and gets tweets every so often and dumps them somewhere for later use.
2. Should we store/dump the tweets we get? If so, would you be into using Elasticsearch? If we had this set up we could easily test and re-test our application using tweets previously mined from the stream. We could store any other data we want there as well. Just a simple single-node cluster for now.
3. If we only need a simple web client, is it okay to try a lighter-weight Python web framework? I'm fine with Django as well.
LOL! We need everyone's know-how...statistics isn't a simple subject by any means :) I don't think you are being unhelpful. Just want to make sure everyone is okay with what we are proposing.
Thanks for the feedback...as for your questions:
Were you thinking about using Flask as a lighter-weight python web framework alternative to Django?
Thanks, -Scott
On Wed, Feb 19, 2014 at 10:02 AM, VaderGirl13 wrote:
Thanks for the feedback...as for your questions:
1. Elasticsearch is a distributed search engine that allows for near real-time indexing. It is very easy to configure, set up, and query (though the DSL has a bit of a learning curve). https://github.com/elasticsearch/elasticsearch
2. Django is a web framework for Python.
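For a feel of the query DSL mentioned above, here is a small sketch of the JSON body for a basic full-text `match` search, built as a Python dict. The helper name `match_query` and the field name `text` are assumptions for illustration, not anything we have set up; the body would be sent to an index's `_search` endpoint.

```python
import json


def match_query(field, phrase, size=10):
    """Build the request body for a basic full-text `match` search."""
    return {"query": {"match": {field: phrase}}, "size": size}


# Hypothetical field name: assumes stored tweets keep their body under "text".
body = match_query("text", "app engine", size=5)
print(json.dumps(body, sort_keys=True))
```

The learning curve mostly comes from nesting: filters, boolean combinations, and aggregations all go into this same dictionary structure.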
Wasn't thinking of Flask specifically but that could be one option. I have only used Django (with limited use) but was thinking that it might be overkill for our purposes but nothing wrong with Django.
Have you used Flask or any other python web frameworks?
I haven't personally used Flask, it's been on my radar recently because a coworker showed me it for a project he was working on and I attended a Meetup a few months back where people presented use cases that leveraged Flask. I was also thinking Django might be overkill for what we're doing.
Sweet! If J is okay with trying out Flask, I'm down. In the meantime I'll set up the EC2 instance to host the Twitter cron job. I'll also start stubbing out some stuff for the library...I'll push to the repo for feedback. Should be able to work on this tonight.
I'm totally OK with Flask.
Yay! Okay cool. I'll send some info tonight on where I am on setting up the box.
Got somewhere!
I was able to fetch tweets from within the Google App Engine environment using Tweepy. The code is a little convoluted at the moment, and has the remnants of the App Engine guestbook application and a whole bunch of stuff we don't need, but it has been checked in.
The instructions are available in the README.
Yay! That's awesome!
Quick question: after looking over read_tweepy.py, it looks like you are using the Tweepy public_timeline(), which gives 20 new tweets every 60 seconds according to the API? Do we not want to use the sample firehose anymore? Tweepy also has a streaming API. If we don't want to use the streaming API, I'll hold off on setting up the scripts on my AWS box and will focus on making sure I can run your GAE project.
http://pythonhosted.org/tweepy/html/api.html#API.public_timeline http://answers.oreilly.com/topic/2605-how-to-capture-tweets-in-real-time-with-twitters-streaming-api/
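If we did stay with polling, consecutive calls can return overlapping pages, so we would want to drop tweets we have already seen. A minimal sketch, assuming each tweet carries an `id` field; `new_tweets`, `poll`, and the fake `pages` iterator are hypothetical names standing in for the real timeline call.

```python
def new_tweets(batch, seen_ids):
    """Return tweets from `batch` not seen before, recording their ids."""
    fresh = []
    for tweet in batch:
        if tweet["id"] not in seen_ids:
            seen_ids.add(tweet["id"])
            fresh.append(tweet)
    return fresh


def poll(fetch, rounds, wait=lambda: None):
    """Call `fetch` `rounds` times, pausing via `wait`, and collect unseen tweets."""
    seen, collected = set(), []
    for _ in range(rounds):
        collected.extend(new_tweets(fetch(), seen))
        wait()
    return collected


# Hypothetical stand-in for a timeline call: the second page repeats tweet 2.
pages = iter([[{"id": 1}, {"id": 2}], [{"id": 2}, {"id": 3}]])
result = poll(lambda: next(pages), rounds=2)
print([t["id"] for t in result])
```

In real use `wait` would be something like `lambda: time.sleep(60)` to match the refresh interval.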
@tbrooks007, I think we do want to use the streaming API. I just didn't get that far yesterday. One of the imports in Tweepy was broken and I ended up fetching an older version of streaming.py. Not sure what impact that has. I did end up raising an issue on tweepy and, late last night, the author came back with a suggestion on how to get around it. In other words, what we have in streaming.py is not consistent with the rest, so getting it to work might not be a slam dunk (sigh).
That O'Reilly article is a great find.
Our idea of having pluggable modules might also extend to the data collection part of the equation and support a GAE version and another one that runs on AWS. But there is another side: we can distract ourselves with all these frameworks and things and never get to the meat of what we're trying to accomplish. What do you think?
It feels like we have some momentum and we have learned a lot in the last few weeks, so why don't I write something up about our vision and how we might be able to accomplish it? And meet next week? Are you going to the meetup? Perhaps we can meet after it ends?
Cool, thanks for checking with the Tweepy author. Yep, I totally agree about getting distracted with frameworks and their nuances. It is easy to get down in the weeds and never get the real project done. We do have momentum and I think writing up something regarding our vision would be awesome. I think that would help a lot. I can't meet next week because I'll be traveling to NYC for work. I will be back Sunday March 3rd. I am available this Sunday though, even if it's just for a Google Hangout or Skype chat.
I may be going to the meetup tonight, but it really depends on how work goes today. If I can get out of the office on time I'll be there. I'll email you later in the day to let you know if I can make it.
Josh (the author of tweepy) reminded me that App Engine now supports sockets. So turning sockets on will help us with streaming anyway.
This Sunday is too soon — I won't have written up my stuff. Let's plan on talking next Sunday (3/1) by Skype or Google Hangout. @tbrooks007, will you be back in town by 3:30 that day?
Scott, we sometimes use email for communication and I don't have yours. LMK please.
I may be back by 3:30, but I also have to pick up my dog from his daycare/boarding place. I'd say more like 5PM.
I have commitments starting at 5:00. Let's do Tuesday or Wednesday evenings, 3/4 or 3/5.
J Singh
Both Monday (3/4) and Tuesday (3/5) work for me.
Did we ever settle on a day and time for the meeting next week?
We didn't settle on a date and time for meeting. Any preference between Tuesday or Wednesday of this week?
Wednesday would be much better for me because I have interviews all day on Tuesday and Wednesday. Patricia
Patricia Voll Larkoski Ph.D. Applied Physics Stanford University
phone: 503-860-3244 patricialarkoski@gmail.com pvoll@alumni.stanford.edu
I can do either day...Wednesday would be good for me though.
Hi everyone, This week I'm getting killed with a bunch of deadlines so would prefer to meet on Monday 3/10 at 7:00 in Davis Square. Does Diesel work for everyone?
Best,
Suits me very well. I am alone with the kids this week and could not make it in the evenings anyway.
--Wolfgang
Hi,
I can't do this upcoming Monday due to a scheduling conflict, but I'd like to know what comes out of the meeting.
Thanks, -Scott
Hi all, I can't make it tonight because it is my second wedding anniversary and I'll be having dinner with my husband. Like Scott, I'd like to know the outcome. Thanks, Patty