icy / google-group-crawler

[Deprecated] Get (almost) original messages from Google Groups archives. Your data is yours.

HTTP Error 413 #34

Closed: want-to-export-group closed this issue 4 years ago

want-to-export-group commented 4 years ago

I am trying to archive messages from a large private group. The script seems to run fine until the "Fetching data" step. Here is the output (the group name has been changed to "group"):

    :: Downloading all topics (thread) pages...
    :: Creating './group//threads/t.0' with 'categories/group'
    :: Fetching data from 'https://groups.google.com/forum/?_escaped_fragment_=categories/group'...
    --2019-12-20 13:16:16--  https://groups.google.com/forum/?_escaped_fragment_=categories/group
    Resolving groups.google.com (groups.google.com)... 2607:f8b0:400d:c0f::8a, 172.217.197.102, 172.217.197.113, ...
    Connecting to groups.google.com (groups.google.com)|2607:f8b0:400d:c0f::8a|:443... connected.
    HTTP request sent, awaiting response... 413 Request Entity Too Large
    2019-12-20 13:16:16 ERROR 413: Request Entity Too Large.

As you can see, the request fails with HTTP error 413. What is causing this, and how can it be fixed?
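To check whether the 413 comes from the request itself rather than from the crawler's wget invocation, one can reproduce the first request outside the script. Below is a minimal Python sketch, assuming the first page is always `categories/<group>` as in the log above. `categories_url`, `fetch_status`, and the browser-like `User-Agent` value are illustrative choices, not part of the crawler. The sketch sends no cookies, so for a private group it only shows what an anonymous request receives.

```python
import urllib.error
import urllib.parse
import urllib.request

BASE = "https://groups.google.com/forum/"


def categories_url(group: str) -> str:
    """Build the first-page URL seen in the crawler log for a given group name."""
    # safe="/" keeps the slash in "categories/<group>", matching the logged URL
    fragment = urllib.parse.quote(f"categories/{group}", safe="/")
    return f"{BASE}?_escaped_fragment_={fragment}"


def fetch_status(url: str, timeout: float = 30.0) -> int:
    """Return the HTTP status code of a GET request, including 4xx/5xx codes."""
    # The User-Agent value is an assumption; any browser-like string would do
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for error responses; the code is what we want
        return err.code
```

For example, `fetch_status(categories_url("group"))` returns the status code (such as 413) instead of raising, because `urllib` reports error responses as `HTTPError`. Network-level failures (DNS, connection refused) still raise `URLError`.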

want-to-export-group commented 4 years ago

The test script works fine for "google-group-crawler-public" but fails for "google-group-crawler-public2" due to HTTP error 500. Could something be going wrong with the cookies?
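Private groups require the browser's Google login cookies, so one way to rule out a cookie problem is to load the exported cookie file and inspect the `Cookie` header that would actually be sent to `groups.google.com`. The sketch below assumes the cookies were exported in the Netscape `cookies.txt` format (the format wget's `--load-cookies` reads); `load_cookie_jar`, `cookie_header`, and `cookie_opener` are hypothetical helpers, not part of the crawler.

```python
import http.cookiejar
import urllib.request


def load_cookie_jar(cookie_file: str) -> http.cookiejar.MozillaCookieJar:
    """Load cookies exported in Netscape/Mozilla 'cookies.txt' format."""
    jar = http.cookiejar.MozillaCookieJar(cookie_file)
    # Keep session cookies (empty expiry field) and do not drop expired
    # cookies at load time; expired ones are still not sent with requests.
    jar.load(ignore_discard=True, ignore_expires=True)
    return jar


def cookie_header(jar: http.cookiejar.CookieJar, url: str) -> str:
    """Return the Cookie header the jar would attach to a GET of `url` ('' if none)."""
    req = urllib.request.Request(url)
    jar.add_cookie_header(req)
    return req.get_header("Cookie", "")


def cookie_opener(jar: http.cookiejar.CookieJar) -> urllib.request.OpenerDirector:
    """Build an opener that sends (and updates) cookies from `jar` on every request."""
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
```

If `cookie_header(jar, url)` returns an empty string for the group URL, no cookie in the file matches that host, path, and scheme (or all matching ones have expired), which would leave a private group inaccessible.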

icy commented 4 years ago

@want-to-export-group Were you able to resolve the issue?

icy commented 4 years ago

I haven't seen that issue. Maybe it's a temporary network issue; you can look at the wget command and retry it to see if that helps.
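A retry only helps with transient failures; it would not have fixed the later HTTP 500, which turned out to be caused by the test group being set to private. The Python sketch below retries a GET with exponential backoff. Which status codes count as transient (`RETRYABLE`) is an assumption, and `backoff_delays` and `fetch_with_retry` are hypothetical names, not the crawler's actual behavior.

```python
import time
import urllib.error
import urllib.request

# Statuses treated as transient here; this set is an assumption, not the crawler's.
RETRYABLE = {429, 500, 502, 503, 504}


def backoff_delays(retries: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Exponential backoff: base, 2*base, 4*base, ..., each capped at `cap` seconds."""
    return [min(cap, base * 2 ** i) for i in range(retries)]


def fetch_with_retry(url: str, retries: int = 3, timeout: float = 30.0) -> bytes:
    """GET `url`, retrying up to `retries` times on transient errors."""
    delays = backoff_delays(retries)
    for attempt in range(retries + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            # 4xx errors such as 413 usually mean the request itself is rejected,
            # so resending the identical request is unlikely to help.
            if err.code not in RETRYABLE or attempt == retries:
                raise
        except urllib.error.URLError:
            # Network-level failure (DNS, refused connection, ...): retry as well.
            if attempt == retries:
                raise
        time.sleep(delays[attempt])
    raise AssertionError("unreachable")
```

`HTTPError` is caught before `URLError` because it is a subclass of it; with the defaults, the waits between attempts are 1, 2, and 4 seconds.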

icy commented 4 years ago

> The test script works fine for "google-group-crawler-public" but fails for "google-group-crawler-public2" due to HTTP error 500. Could something be going wrong with the cookies?

Yes, I can confirm this issue. Google has changed something that prevents our script from working :(

want-to-export-group commented 4 years ago

No, I was not able to resolve it.

> On Thursday, April 9, 2020 at 1:03 PM, Ky-Anh Huynh wrote:
> @want-to-export-group Were you able to resolve the issue?

icy commented 4 years ago

:( It used to work. Now, accessing it from a web browser also returns an error: https://groups.google.com/forum/?_escaped_fragment_=categories/google-group-crawler-public2

icy commented 4 years ago

By mistake, google-group-crawler-public2 had been set to private mode. Now it's fine. By the way, I have rewritten the script using curl; hopefully that helps resolve a few strange issues. Stay tuned.

icy commented 4 years ago

The problem should be fixed in the latest version, 2.0.0 (which uses curl). Please check whether it works better for you. Thanks.