Open RuinCakeLie opened 8 years ago
Interesting. I will take a look. Thanks for reporting.
Google returns empty content when _escaped_fragment_ is specified, e.g.
https://groups.google.com/forum/?_escaped_fragment_=forum/3dprintertipstricksreviews
This is against (?) the standard. We need a different way to receive data from Google. This is a real challenge!
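A minimal sketch for reproducing the empty response, using only the Python standard library. The `looks_empty` heuristic (no topic links in the page) and the `User-Agent` value are assumptions for illustration, not something Google documents.

```python
from typing import Optional
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE = "https://groups.google.com/forum/"


def escaped_fragment_url(group: str) -> str:
    """Build the _escaped_fragment_ URL for a group's topic list."""
    return f"{BASE}?_escaped_fragment_=forum/{quote(group, safe='')}"


def fetch(url: str, cookie: Optional[str] = None) -> str:
    """Fetch a page, optionally sending a Cookie header."""
    headers = {"User-Agent": "Mozilla/5.0"}  # assumed value
    if cookie:
        headers["Cookie"] = cookie
    with urlopen(Request(url, headers=headers), timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def looks_empty(html: str) -> bool:
    """Assumed heuristic: a page without any topic link counts as empty."""
    return "/topic/" not in html


if __name__ == "__main__":
    url = escaped_fragment_url("3dprintertipstricksreviews")
    print(url, "empty" if looks_empty(fetch(url)) else "has topics")
```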
Google hides most email headers from the raw message. A raw message isn't actually raw ;)
See also https://groups.google.com/forum/message/raw?msg=3dprintertipstricksreviews/LDFZVHeC8Uk/2D1YhGqGDQAJ
Date: Sun, 20 Mar 2016 06:28:20 -0700 (PDT)
From: Rich Webb <ml...@rawebb.net>
To: "3D Printer Tips, Tricks and Reviews" <3dprintertips...@googlegroups.com>
Message-Id: <d7e58e48-c160-436e-8bdf-10d86a0dc170@googlegroups.com>
Subject: Direction-dependent extrusion volume / track width?
MIME-Version: 1.0
Content-Type: multipart/mixed;
boundary="----=_Part_4198_351838098.1458480500604"
------=_Part_4198_351838098.1458480500604
Content-Type: multipart/alternative;
boundary="----=_Part_4199_1172380407.1458480500604"
------=_Part_4199_1172380407.1458480500604
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
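To see which headers survive in such a "raw" message, the header block can be parsed with Python's standard `email` package. This is only a sketch: the `EXPECTED` list of headers one would normally find in a delivered email is my assumption, and the sample reuses the headers quoted above.

```python
from email.parser import HeaderParser

# Headers as returned by the raw-message URL (folded lines re-indented).
RAW = """\
Date: Sun, 20 Mar 2016 06:28:20 -0700 (PDT)
From: Rich Webb <ml...@rawebb.net>
To: "3D Printer Tips, Tricks and Reviews" <3dprintertips...@googlegroups.com>
Message-Id: <d7e58e48-c160-436e-8bdf-10d86a0dc170@googlegroups.com>
Subject: Direction-dependent extrusion volume / track width?
MIME-Version: 1.0
Content-Type: multipart/mixed;
 boundary="----=_Part_4198_351838098.1458480500604"

"""

# Assumed list of headers a fully raw message would usually carry.
EXPECTED = ("Received", "Return-Path", "References")


def missing_headers(raw: str, expected=EXPECTED) -> list:
    """Return the expected header names that are absent from the message."""
    msg = HeaderParser().parsestr(raw)
    return [name for name in expected if msg[name] is None]


if __name__ == "__main__":
    print("present:", HeaderParser().parsestr(RAW).keys())
    print("missing:", missing_headers(RAW))
```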
It's impossible to fetch data from this group with traditional methods. We need a higher-level tool such as PhantomJS.
Well, after days of trying the scrolling method, I've finally found a way to automate the process. There are two other challenges, but they're definitely solvable.
Stay tuned!
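One way the scrolling method could be sketched: keep scrolling until the number of loaded items stops growing for a few rounds. This is not the actual implementation from this thread; the function name, the `patience` parameter, and the selector in the comment are all hypothetical, and the browser calls are passed in as plain callables so the loop can be tried without a browser.

```python
import time
from typing import Callable


def scroll_until_stable(
    scroll: Callable[[], None],
    count_items: Callable[[], int],
    pause: float = 1.0,
    patience: int = 3,
    max_rounds: int = 500,
) -> int:
    """Scroll repeatedly; stop once the item count has not grown
    for `patience` consecutive rounds. Returns the final count."""
    last = count_items()
    stalled = 0
    for _ in range(max_rounds):
        if stalled >= patience:
            break
        scroll()
        time.sleep(pause)  # give the page time to load more items
        current = count_items()
        if current > last:
            last, stalled = current, 0
        else:
            stalled += 1
    return last


# With Selenium, the callables might look like this (selector is a guess):
#   from selenium.webdriver.common.by import By
#   scroll = lambda: driver.execute_script(
#       "window.scrollTo(0, document.body.scrollHeight)")
#   count = lambda: len(driver.find_elements(
#       By.CSS_SELECTOR, "a[href*='/topic/']"))
```

A larger `patience` makes the loop slower but less likely to stop early when a page load is delayed.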
I have some initial work on this issue, but (1) it's slow and (2) it's non-deterministic. Maybe I'm just not good at Selenium.
I'm hoping someone can help. I can raise a small fund to support you.
Thanks a lot
I'm trying to scrape a group (https://groups.google.com/forum/#!topic/3dprintertipstricksreviews/) with the adult content flag turned on. Unfortunately, even with cookies, all the _escaped_fragment_ requests only return: