icy / google-group-crawler

[Deprecated] Get (almost) original messages from google group archives. Your data is yours.

Raw access denied #29

Closed alexivkin closed 4 years ago

alexivkin commented 5 years ago

This is a private group, and the crawler itself works fine. However, when I run the resulting bash script, no messages are downloaded. When I paste the .../forum/message/raw?msg=... URL into the same browser that the cookies were exported from, I get "access to groups.google.com was denied". Browsing works fine and messages show up correctly in the normal UI, but I noticed that there is no "show original" option in the drop-down next to the message.

I've checked all the permissions and could not find anything that references the "show original" option.

  1. Is there something I need to configure to get the original message?
  2. If not, how could I change the URL to download the message text? It's OK if it does not come in RFC 822 format; HTML or plain text would do.

icy commented 5 years ago

Hi @alexivkin, what kind of group is it? The script can't download from a group classified as adult content; that kind of group causes a problem similar to yours.

alexivkin commented 5 years ago

It's a normal private GSuite group

icy commented 5 years ago

> It's a normal private GSuite group

Please make sure that you have set the environment variable _ORG=Your_Gsuite_Domain before you start the script. If you have already done that, please check whether the group allows archive access.

My GSuite account has expired, so I don't have any better ideas right now. As long as the "Show original" option is not available for a message in your browser, the script can't download anything.
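As a minimal sketch of the `_ORG` suggestion above: the snippet below exports the variable and fails early if it is missing. Only `_ORG` comes from this thread; `example.com` is a placeholder domain and `require_org` is a hypothetical helper, not part of the crawler.

```shell
#!/usr/bin/env bash
# _ORG is the variable named in this thread; "example.com" is a placeholder.
export _ORG="example.com"

# Hypothetical guard (not part of the crawler): report an error if _ORG
# is unset or empty, otherwise print the value that will be used.
require_org() {
  if [ -z "${_ORG:-}" ]; then
    echo "_ORG is not set" >&2
    return 1
  fi
  echo "_ORG=${_ORG}"
}

require_org
```

Running the guard in the same shell session that later starts the crawler makes it easier to rule out a missing `_ORG` before looking at group permissions.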

alexivkin commented 5 years ago

For some reason I can't find the content classification in the group settings, even though I am the owner. I figured out a different solution: changing the wget URL from raw to https://groups.google.com/a/....com/forum/print/msg/....

Although I am missing the message headers and other metadata, it's good enough for my needs. Thank you very much for the excellent script that you wrote!
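The URL change described above could be sketched roughly as follows. This assumes the part after `raw?msg=` (group, topic, and message identifiers) is reused unchanged after `print/msg/`, which the thread does not spell out. `raw_to_print` is a hypothetical helper, and `example.com`, `mygroup`, `TOPIC_ID`, and `MSG_ID` are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite raw-message URLs on stdin to print-view URLs.
# Assumption: the path after "raw?msg=" is valid as-is after "print/msg/".
# In sed's basic regular expressions, "?" is matched literally.
raw_to_print() {
  sed -e 's#/forum/message/raw?msg=#/forum/print/msg/#'
}

# Placeholder URL for illustration only.
echo 'https://groups.google.com/a/example.com/forum/message/raw?msg=mygroup/TOPIC_ID/MSG_ID' \
  | raw_to_print
```

The same filter could be applied to the whole generated download script, for example `raw_to_print < wget.sh > wget-print.sh`, where both file names are made up for this example. As noted above, the print view does not include the message headers.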

icy commented 5 years ago

Thanks a lot, @alexivkin. I'm happy that you could work around the problem. The printing function is really interesting, and so is how you found it. Let's leave this issue open in case someone can figure out more details.

Have a nice weekend.

icy commented 4 years ago

I have updated the README to mention the trick: https://github.com/icy/google-group-crawler#contributions . Thanks again.