icy / google-group-crawler

[Deprecated] Get (almost) original messages from google group archives. Your data is yours.

Loop Detected with Seemingly Valid Cookie #36

Closed RelativePrime closed 4 years ago

RelativePrime commented 4 years ago

Hello,

I'm having an issue similar to issue #24: I have had no luck getting the exported and edited cookie to authenticate to a private group (no _ORG), and when running tests/test.sh I receive a "Loop detected" warning.

The cookie was exported using the Chrome cookies.txt extension, has been modified using fix_cookies.sh, and appears correct.

The environment variables are exported as:

export _WGET_OPTIONS="--verbose --load-cookies $HOME/google-group-crawler/tests/fixed_cookies.txt --keep-session-cookies"
export _GROUP='my-group'

The output of tests/test.sh ends with:

:: Creating './my-group//threads/t.1' with 'categories/my-group'
:: Fetching data from 'https://groups.google.com/forum/?_escaped_fragment_=categories/my-group'...
--2020-03-13 18:04:49--  https://groups.google.com/forum/?_escaped_fragment_=categories/my-group
Resolving groups.google.com (groups.google.com)... 2607:f8b0:4001:c02::8a, 74.125.70.101, 74.125.70.138, ...
Connecting to groups.google.com (groups.google.com)|2607:f8b0:4001:c02::8a|:443... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: https://accounts.google.com/ServiceLogin?service=groups2&passive=1209600&osid=1&continue=https://groups.google.com/forum/?_escaped_fragment_%3Dcategories/my-group&followup=https://groups.google.com/forum/?_escaped_fragment_%3Dcategories/my-group&authuser=0 [following]
--2020-03-13 18:04:50--  https://accounts.google.com/ServiceLogin?service=groups2&passive=1209600&osid=1&continue=https://groups.google.com/forum/?_escaped_fragment_%3Dcategories/my-group&followup=https://groups.google.com/forum/?_escaped_fragment_%3Dcategories/my-group&authuser=0
Resolving accounts.google.com (accounts.google.com)... 2607:f8b0:4009:816::200d, 172.217.9.77
Connecting to accounts.google.com (accounts.google.com)|2607:f8b0:4009:816::200d|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

-                              [ <=>                                   ]  71.80K  --.-KB/s    in 0.09s   

2020-03-13 18:28:28 (760 KB/s) - written to stdout [73522]

:: ==================================================
:: Loop detected. Your cookie may not work correctly.
:: You may want to generate new cookie file
:: and/or remove all '#HttpOnly_' strings from it.
:: ==================================================

Any suggestions what I might try?

Cheers,

icy commented 4 years ago

Hi @RelativePrime, I have some sample cookie files here: https://github.com/icy/google-group-crawler/tree/master/tests. Could you please check whether your file has the same format? Thanks.
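As a rough way to compare formats, the sketch below strips the '#HttpOnly_' prefix that the warning mentions and then reports any cookie line that does not have the seven tab-separated fields of the Netscape cookies.txt layout. It assumes that layout; the function names `strip_httponly` and `check_cookie_format` are hypothetical and are not part of this project or of fix_cookies.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of google-group-crawler.

# Print the cookie file with every leading '#HttpOnly_' prefix removed,
# so those entries are no longer read as comment lines.
strip_httponly() {
  sed -e 's/^#HttpOnly_//' "$1"
}

# Report each non-comment, non-blank line that does not have exactly
# 7 tab-separated fields (assumed Netscape cookies.txt layout).
# Exits non-zero if at least one such line is found.
check_cookie_format() {
  awk -F '\t' '
    /^#/ || /^[[:space:]]*$/ { next }
    NF != 7 { printf "line %d: expected 7 fields, got %d\n", NR, NF; bad = 1 }
    END { exit bad + 0 }
  ' "$1"
}
```

For example, `strip_httponly cookies.txt > fixed_cookies.txt && check_cookie_format fixed_cookies.txt` prints nothing when every cookie line has the expected shape.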

icy commented 4 years ago

The problem should be fixed in the latest version, 2.0.0 (which uses curl). Please check whether it works better for you. Thanks.
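To check a cookie file with curl independently of the crawler, one option is to follow redirects and see whether the request ends up on the accounts.google.com/ServiceLogin page that appears in the log above. This is only a sketch under that assumption: `is_login_url` and `final_url` are hypothetical names, and this is not necessarily how version 2.0.0 detects the loop.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of google-group-crawler.

# Succeed if the URL is the Google sign-in page seen in the log above.
is_login_url() {
  case "$1" in
    https://accounts.google.com/ServiceLogin*) return 0 ;;
    *) return 1 ;;
  esac
}

# Follow redirects using the cookie file in $1 and print the final URL of $2.
final_url() {
  curl -sS -L -b "$1" -o /dev/null -w '%{url_effective}\n' "$2"
}

# Usage (requires network access):
#   url=$(final_url "$HOME/google-group-crawler/tests/fixed_cookies.txt" \
#     "https://groups.google.com/forum/?_escaped_fragment_=categories/${_GROUP}")
#   if is_login_url "$url"; then echo "cookie was not accepted"; fi
```

If the final URL is the sign-in page, the cookie was probably not accepted, which matches the 302 redirect shown in the report.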