[Open] syntonym opened this issue 9 years ago
hey @syntonym i'll update the documentation tomorrow with some ideas for the direction. I've just been very busy as of late, but should free up soon. One thing that needs to be done is a scraper that takes a link and extracts the icon, title and maybe a few lines of text that can be used for the summary for each post. I was going to try to build this on the server side with BeautifulSoup and then store the results in the datastore.
I put together the scraper last night and ended up using lxml. Feel free to poke around the code and let me know if you have any questions.
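For reference, the kind of extraction described above (icon, title, a few lines of summary text) can be sketched with lxml roughly like this. The `scrape()` helper and field names are illustrative, not the actual code in the repo:

```python
# Minimal sketch of title/icon/summary extraction with lxml.
# The scrape() helper and the returned field names are assumptions,
# not the project's real API.
from lxml import html


def scrape(page_html):
    """Extract title, icon link, and a short text summary from raw HTML."""
    doc = html.fromstring(page_html)
    title = doc.findtext(".//title", default="").strip()
    # matches <link rel="icon"> as well as rel="shortcut icon"
    icons = doc.xpath('//link[contains(@rel, "icon")]/@href')
    icon = icons[0] if icons else None
    # first visible paragraph text, truncated, as the summary
    paragraphs = [p.text_content().strip() for p in doc.xpath("//p")]
    summary = " ".join(paragraphs)[:200]
    return {"title": title, "icon": icon, "summary": summary}


page = """<html><head><title>Example</title>
<link rel="icon" href="/favicon.ico"></head>
<body><p>First paragraph.</p><p>Second paragraph.</p></body></html>"""
print(scrape(page))
```

In a real run the HTML would come from fetching the submitted URL; parsing a literal string here just keeps the sketch self-contained.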
Any specific reason to choose lxml?
Speed on the server side. After googling around: lxml is a binding to C libraries (libxml2/libxslt) and is generally considered considerably faster than BeautifulSoup's default pure-Python parsing. http://stackoverflow.com/questions/4967103/beautifulsoup-and-lxml-html-what-to-prefer
We can also consider doing the parser on the client side with javascript.
My thinking for the process is:
user submits link > posts to server > server scrapes key fields > server stores the post in db > server returns the JSON representation of the link to the view > view renders it as a new post.
Since it has to go through those steps i wanted to make sure that the server side portion was as fast as possible.
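The pipeline above can be sketched end to end in a few lines. Flask is assumed here for illustration, and `scrape_fields()` plus the in-memory `db` dict stand in for the real scraper and datastore:

```python
# Sketch of the submit -> scrape -> store -> respond pipeline.
# Flask, scrape_fields(), and the in-memory "db" are illustrative assumptions.
from flask import Flask, jsonify, request

app = Flask(__name__)
db = {}


def scrape_fields(url):
    # placeholder for the real lxml scraper
    return {"url": url, "title": "(scraped title)", "summary": "(scraped text)"}


@app.route("/api/links", methods=["POST"])
def submit_link():
    url = request.get_json()["url"]
    post = scrape_fields(url)            # server scrapes key fields
    post_id = len(db) + 1
    db[post_id] = post                   # server stores the post
    return jsonify(id=post_id, **post)   # JSON representation back to the view


client = app.test_client()
resp = client.post("/api/links", json={"url": "http://example.com"})
print(resp.get_json())
```

The view would then take that JSON response and render it as the new post.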
It might be more efficient on the client side. I'm not sure what libraries are available in javascript.
I don't think scraping the links client-side is a good idea, because you either have to a) validate the input, which probably takes about as much time as simply scraping it yourself, or b) trust that the user input is right, which opens up security flaws.
What do you think of something like this?
I would suggest websockets for pushing new content to clients. This would turn the process into the following:
I'll definitely look into using websockets. I've been using REST/AJAX for a while, so that's more comfortable for me, at least for prototyping. I'll finish building the basic functionality of the site first (so that it at least starts to work), and then we can switch over.
I'm looking at coding up a websockets module as a handler for these sorts of events. Always good to learn new stuff, thanks for the link.
cool article: http://lostechies.com/chrismissal/2013/08/06/browser-wars-websockets-vs-ajax/
Interesting read! I think it really depends on the application. Personally I find REST the way to go for most "basic" web stuff, but some problems are hard to solve with it, or the solutions end up as hacks. For example, I've always found AJAX techniques beyond simple PUT/GET requests (like long polling or polling continuously) hackish. Websockets seem to be a clean solution.
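The difference being discussed (push instead of poll) can be sketched transport-agnostically with the standard library. Here asyncio queues stand in for real websocket connections; an actual implementation would use a websocket library, but the broadcast pattern is the same:

```python
# Transport-agnostic sketch of the push model websockets enable: instead of
# every client polling /content, the server pushes new posts to each connected
# client the moment they arrive. asyncio.Queue stands in for a real
# websocket connection.
import asyncio


class Hub:
    def __init__(self):
        self.clients = set()

    def connect(self):
        q = asyncio.Queue()
        self.clients.add(q)
        return q

    async def publish(self, post):
        for q in self.clients:
            await q.put(post)  # push to every connected client


async def main():
    hub = Hub()
    a, b = hub.connect(), hub.connect()
    await hub.publish({"title": "new link"})
    # both clients receive the post without ever polling
    return await a.get(), await b.get()


received = asyncio.run(main())
print(received)
```

The polling version below does the same job, but trades the push for a fixed delay and repeated requests.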
I was reading your diagram (GitHub should build actual diagrams into the service; maybe that's the next app!).
In the step in bold:
user submits new link -> posts to server -> poll every x seconds to see if the link is already up -> after the link is scraped successfully, show it in the content
Can you elaborate on what you mean by that?
I was thinking of something like this:
Server:

    from flask import Flask, request, jsonify

    app = Flask(__name__)
    content = {}

    # I don't know off the top of my head what the semantics for PUT are,
    # so I use POST here for the purpose of showing the idea.
    @app.route("/api/put_content", methods=["POST"])
    def put():
        id = new_id()  # new_id() left undefined: any unique-id generator
        content[id] = request.get_json()
        return jsonify({"id": id})

    @app.route("/content")
    def give_content():
        return jsonify(content)
Clientside (pseudocode):

    <form>
      <input></input>
      <button onclick="push_new_content()">Push new content</button>
    </form>
    <script type="python">
    # I know that Python does not work client-side in the browser :(
    def push_new_content():
        ajax.post("/api/put_content", form.content)

    def update_content_on_page():
        jquery.get("#content_container").delete_all_children()
        for c in content:
            jquery.get("#content_container").add_children(c)

    content = None
    while True:
        sleep(5)
        new_content = ajax.get("/content")
        if new_content != content:
            content = new_content
            update_content_on_page()
    </script>
I would love to contribute, but I'm not quite sure what to code. Could you give a description of what to code, or should I just do something and hope it's something you can use?