thinkle / gourmet

Gourmet Recipe Manager
GNU General Public License v2.0

blocked by cloudflare #885

Status: Open. imesg opened this issue 7 years ago

imesg commented 7 years ago

I started Gourmet in a terminal and tried to import a webpage. This is what was returned. The page is: http://blog.paleohacks.com/bone-broth-recipe/#

I have looked around for clues but found none that I could implement.

gene@go3:~# gourmet
** Message: pygobject_register_sinkfunc is deprecated (GstObject)
CONTENT TYPE = text/html; charset=UTF-8
emit ('completed',)
emit ('done',)
Doing import of http://blog.paleohacks.com/bone-broth-recipe/#
<web_import_plugin.generic_web_importer_plugin.GenericWebImporter instance at 0x7f668e9b6320>
HERE's the data we got: <!DOCTYPE html>

Access denied | blog.paleohacks.com used Cloudflare to restrict access

Error 1010 Ray ID: 36ef701f24c342e8 • 2017-06-14 18:43:55 UTC

Access denied

What happened?

The owner of this website (blog.paleohacks.com) has banned your access based on your browser's signature (36ef701f24c342e8-ua48).

END DATA
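The data returned is Cloudflare's block page, not the recipe, so the importer ends up parsing an error message. A minimal sketch of how an importer could recognize such a page before parsing it and report a clearer error. The marker phrases are copied from the block page above; the function names are hypothetical and this is not Gourmet's code or any official Cloudflare API.

```python
import re

# Phrases taken from the block page quoted in this thread. This is a
# hypothetical heuristic, not an official Cloudflare API or error format.
BLOCK_MARKERS = ("used Cloudflare to restrict access", "Error 1010")


def _visible_text(html):
    """Crudely strip tags and collapse whitespace so markers match across markup."""
    text = re.sub(r"<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", text)


def looks_like_cloudflare_block(html):
    """Return True if fetched HTML appears to be a Cloudflare 'Access denied' page."""
    text = _visible_text(html)
    return any(marker in text for marker in BLOCK_MARKERS)


def cloudflare_ray_id(html):
    """Return the Ray ID shown on the block page, or None if there is none."""
    match = re.search(r"Ray ID:\s*([0-9a-f]+)", _visible_text(html))
    return match.group(1) if match else None


if __name__ == "__main__":
    sample = ("Access denied | blog.paleohacks.com used Cloudflare to restrict access "
              "Error 1010 Ray ID: 36ef701f24c342e8 • 2017-06-14 18:43:55 UTC")
    print(looks_like_cloudflare_block(sample), cloudflare_ray_id(sample))  # True 36ef701f24c342e8
```

Checking for a block page first lets an importer suggest saving the page locally instead of failing silently on unrelated HTML.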

thinkle commented 7 years ago

The website notices that a bot, not a browser, is fetching the page and denies access on that basis (Gourmet doesn't look like a human visitor when it requests webpages).
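One visible part of a client's "browser signature" is the User-Agent header it sends with each request, though Cloudflare may check more than that. Assuming an importer fetches pages with Python's standard `urllib` (the thread does not say which library Gourmet uses), this sketch shows the header such a client announces by default. `default_user_agent` is a hypothetical helper.

```python
import urllib.request


def default_user_agent():
    """Return the User-Agent header a plain urllib opener sends with each request."""
    opener = urllib.request.build_opener()
    # OpenerDirector keeps its default headers as a list of (name, value) pairs.
    return dict(opener.addheaders).get("User-agent", "")


if __name__ == "__main__":
    print(default_user_agent())
```

In Python 3 this prints `Python-urllib/` followed by the interpreter version, which is easy for a site to tell apart from the `Mozilla/5.0 (...)` strings that browsers send.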

I have some experimental code to work around this for certain sites (I use it to import Cook's Illustrated recipes, to which I have a subscription), but the code isn't public yet or really ready for general use.

A quick workaround is to save the webpage as HTML and import the file, which I believe has worked for me in the past.
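The workaround helps because the browser, which the site accepts, does the fetching, and the importer only has to read a local file. A minimal sketch of that second step, assuming the page was saved from the browser as UTF-8 HTML. `load_saved_page` and `extract_page_text` are hypothetical names, not Gourmet's import code, which must also pick out ingredients and instructions from the text.

```python
from html.parser import HTMLParser
from pathlib import Path


class PageTextExtractor(HTMLParser):
    """Collects a page's <title> and visible text, skipping <script>/<style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title = ""
        self.chunks = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_page_text(html):
    """Return (title, list of visible text chunks) from an HTML string."""
    parser = PageTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.chunks


def load_saved_page(path):
    """Read a webpage saved from the browser and extract its text; no network access."""
    # errors="replace" substitutes undecodable bytes instead of raising.
    html = Path(path).read_text(encoding="utf-8", errors="replace")
    return extract_page_text(html)
```

Because nothing is requested from the site, its bot check never comes into play; the trade-off is one manual save per recipe.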

Tom

imesg commented 7 years ago


Eugene Imes


Thanks Tom. I will try your suggestion. In case I haven't said so already, Gourmet is a great program.

Gene


imesg commented 7 years ago

Sorry about the double post; I got confused while editing it.

imesg commented 7 years ago

I tried the workaround you suggested: I saved the page, then imported the saved HTML page as a file (i.e. chose to import from a file and selected the saved webpage). It worked like a charm.

Thanks Tom.