icy / google-group-crawler

[Deprecated] Get (almost) original messages from google group archives. Your data is yours.

ajax-crawling #2

Closed: tinku99 closed this issue 9 years ago

tinku99 commented 9 years ago

Thanks for making this. But isn't this using the new technology described here? https://developers.google.com/webmasters/ajax-crawling/docs/specification

icy commented 9 years ago

Hi @tinku99,

I think the answer is yes. In my script, I had to use _escaped_fragment_ to download data from Google. Basically, my script benefits from the fact that Google follows the specification :) You can see how a group is organized here [1].

My script is written in #bash, and it uses some well-known tools (lynx, wget,...) to download data. I believe someone could improve it by rewriting it in Python, Ruby, and so on. For me, #bash is just enough.

[1] https://github.com/icy/google-group-crawler/blob/master/craw.sh#L28
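For readers unfamiliar with the AJAX crawling scheme, the _escaped_fragment_ mapping mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code from craw.sh: the function name to_escaped_fragment_url and the group name example-group are hypothetical, and the percent-escaping of special characters that the specification requires is left out for brevity.

```bash
#!/usr/bin/env bash
# Sketch only: rewrite a "#!" (hash-bang) URL into its _escaped_fragment_
# form, as described by the AJAX crawling scheme.
# Assumption: the fragment contains no characters that need percent-escaping.
# The function name is hypothetical and does not come from craw.sh.
to_escaped_fragment_url() {
  local url=$1

  # URLs without a "#!" marker are returned unchanged.
  case "$url" in
    *'#!'*) ;;
    *) printf '%s\n' "$url"; return 0 ;;
  esac

  local base=${url%%'#!'*}      # everything before the first "#!"
  local fragment=${url#*'#!'}   # everything after the first "#!"

  # Use "&" if the base URL already carries a query string, "?" otherwise.
  local sep='?'
  case "$base" in
    *\?*) sep='&' ;;
  esac

  printf '%s%s_escaped_fragment_=%s\n' "$base" "$sep" "$fragment"
}

# Example with a hypothetical group name:
to_escaped_fragment_url 'https://groups.google.com/forum/#!forum/example-group'
```

The rewritten URL could then be handed to a downloader such as wget or lynx, which is presumably why a plain shell script is enough for this job.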

cmpitg commented 9 years ago

[OT] (j/k) As @icy really loves bashing!

icy commented 9 years ago

@cmpitg ;) I like the idea of using pipes (|) to glue small things together. Let's see how https://github.com/matz/streem would help ^^

cmpitg commented 9 years ago

:+1: