Closed gagarine closed 4 years ago
Hi @gagarine,
Do you know the page range? Google Groups pagination works on numbers (e.g., 30 posts per page) and has nothing to do with dates. If you browse the group contents you can find an approximate range, which would help, e.g.,
https://groups.google.com/forum/?_escaped_fragment_=forum/archlinuxvn%5B21-40%5D
However, this doesn't always work. Users can post to very old threads, and what the link above shows is the date of the last post in each thread, not the date of the first post. If you'd really like to work this way, I can write a simple patch.
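A minimal sketch of how such range links could be generated, assuming the URL shape of the link above stays as shown. The function name `page_urls` and the default `page_size` of 20 are assumptions made for illustration and are not part of the crawler.

```python
from urllib.parse import quote

# Base URL copied from the example link in this thread.
BASE = "https://groups.google.com/forum/?_escaped_fragment_=forum/"


def page_urls(group, first, last, page_size=20):
    """Return one URL per page of topics covering first..last (inclusive).

    page_size is an assumption; the real page size may differ.
    """
    urls = []
    start = first
    while start <= last:
        end = min(start + page_size - 1, last)
        # quote() turns "[21-40]" into "%5B21-40%5D", as in the link above.
        urls.append(BASE + group + quote(f"[{start}-{end}]"))
        start = end + 1
    return urls


if __name__ == "__main__":
    for url in page_urls("archlinuxvn", 21, 40):
        print(url)
```

As noted, the topic numbers follow last-post order, so a numeric range only approximates a date range.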
The group soc.culture.soviet that you mentioned has about 28k topics (the number of messages is bigger, of course), and fetching all 28k topics would take a few hours (assuming that Google doesn't throttle requests). I think that's reasonable...
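As a rough back-of-envelope check of the "few hours" figure, here is a small sketch. The 0.5 seconds per topic is purely an assumed value for illustration, not a measurement; real speed depends on the network and on how many messages each topic holds.

```python
def estimated_hours(topics, seconds_per_topic):
    """Rough total fetch time in hours, ignoring throttling and retries."""
    return topics * seconds_per_topic / 3600


if __name__ == "__main__":
    # 28k topics at an assumed 0.5 s each.
    print(round(estimated_hours(28_000, 0.5), 1))
```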
Mmmmh, I understand. Using the Google Groups web interface, I filtered on "first post" and looked between January 1991 and December 1991. My primary interest is around August, so that would be
https://groups.google.com/forum/?_escaped_fragment_=forum/soc.culture.soviet[27630-27720]
Yeah, I saw the import was faster than I thought. I tried a full import, but it seems that Google killed my connection.
I will play with it a bit more to see if I can do a full import. Perhaps it's easier. Mainly, I don't want to miss a message because someone posted on the thread later on.
It's interesting to hear that Google killed your connection :) I am not sure whether adding some sleep to the hook would help (https://github.com/icy/google-group-crawler#the-hook)
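To illustrate the "add some sleep" idea in general terms, here is a minimal Python sketch that pauses between requests. It is not the crawler's hook (that is described at the link above); the name `fetch_politely` and the 2-second default delay are assumptions, and there is no guarantee that any particular delay avoids dropped connections.

```python
import time
import urllib.request


def fetch_politely(urls, delay=2.0, fetch=None, sleep=time.sleep):
    """Fetch each URL in order, sleeping `delay` seconds between requests.

    `fetch` and `sleep` can be swapped out, which makes dry runs easy.
    """
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url, timeout=30) as response:
                return response.read()

    results = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay)  # pause before every request except the first
        results[url] = fetch(url)
    return results
```

A dry run with a fake fetcher, e.g. `fetch_politely(["a", "b"], delay=0, fetch=str.upper)`, exercises the loop without touching the network.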
I will try to add range support to the script, so that you can specify 27630-27720 as input.
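For illustration, input such as 27630-27720 could be validated along these lines. This is a hedged sketch only: `parse_range` is a hypothetical helper, not something the script provides.

```python
import re


def parse_range(text):
    """Parse a string like "27630-27720" into (first, last) integers."""
    match = re.fullmatch(r"\s*(\d+)\s*-\s*(\d+)\s*", text)
    if match is None:
        raise ValueError(f"expected FIRST-LAST, got {text!r}")
    first, last = int(match.group(1)), int(match.group(2))
    if first < 1 or first > last:
        raise ValueError(f"invalid range: {first}-{last}")
    return first, last


if __name__ == "__main__":
    print(parse_range("27630-27720"))
```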
Someone was able to download 350k messages from a group (https://github.com/icy/google-group-crawler/issues/32), so maybe this isn't an issue so far. As I didn't intend to have pagination support, I will close this ticket. Feel free to reopen it if there is a better idea for supporting the feature.
Thanks a lot.
I want to export soc.culture.soviet but it's big... In fact I'm only interested in a five-month period. I didn't see a way to export only a specific period.