openclimatedata / national-inventory-submisions

Downloader for UNFCCC National Inventory Submissions

Error with Python 3.5.2 #7

Open JGuetschow opened 4 years ago

JGuetschow commented 4 years ago

When using the current version of the downloader with Python 3.5.2 I get an AttributeError:

./venv/bin/python scripts/process.py 2019 
Fetching submissions for 2019
https://unfccc.int/process-and-meetings/transparency-and-reporting/reporting-and-review-under-the-convention/greenhouse-gas-inventories-annex-i-parties/national-inventory-submissions-2019
Traceback (most recent call last):
  File "scripts/process.py", line 46, in <module>
    links = table.findAll('a')
AttributeError: 'NoneType' object has no attribute 'findAll'
Makefile:11: recipe for target 'data-2019' failed
make: *** [data-2019] Error 1

Python version

Python 3.5.2 (default, Oct  8 2019, 13:06:37) 
[GCC 5.4.0 20160609] on linux

rgieseke commented 4 years ago

It's not related to 3.5, can reproduce on 3.7.

rgieseke commented 4 years ago

I get a captcha when opening https://unfccc.int/process-and-meetings/transparency-and-reporting/reporting-and-review-under-the-convention/greenhouse-gas-inventories-annex-i-parties/national-inventory-submissions-2019 in the browser:

Why am I seeing this page?

The website you are visiting is protected and accelerated by Incapsula. Your computer may have been infected by malware and therefore flagged by the Incapsula network. Incapsula displays this page for you to verify that an actual human is the source of the traffic to this site, and not malicious software.
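
If the script receives this page instead of the submissions overview, there is no table to find, which would explain the 'NoneType' error. A heuristic check along these lines could turn that into a clearer message. This is only a sketch: the marker strings are copied from the page text quoted above, the function names are hypothetical and not part of the downloader, and a real overview page could in principle contain the same phrases.

```python
# Marker phrases taken from the verification page quoted in this thread.
BLOCK_PAGE_MARKERS = (
    "why am i seeing this page",
    "protected and accelerated by incapsula",
)


def looks_like_block_page(html):
    """Return True if the HTML resembles the Incapsula verification page."""
    lowered = html.lower()
    return any(marker in lowered for marker in BLOCK_PAGE_MARKERS)


def require_overview_page(html):
    """Return the HTML unchanged, or raise if it looks like the block page."""
    if looks_like_block_page(html):
        raise RuntimeError(
            "Received a verification page instead of the submissions overview"
        )
    return html
```

Calling require_overview_page on the response text before parsing would stop the run early with a readable error.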

JGuetschow commented 4 years ago

I don't get a captcha. Maybe a cookie that's set can avoid the captcha?

But it looks like they want to prevent mass downloads.

rgieseke commented 4 years ago

The nightly GitHub Action also seems to have worked fine.

rgieseke commented 4 years ago

https://github.com/openclimatedata/national-inventory-submisions/runs/422408149?check_suite_focus=true

JGuetschow commented 4 years ago

Tried to add cookies (my browser cookies where I get no captcha) to the requests call, but it didn't help. Might have done something wrong though.

rgieseke commented 4 years ago

As a workaround you could try saving the HTML file from the browser and parsing that in the script.
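
A minimal sketch of that workaround, assuming the overview page was saved from the browser as a local HTML file. It uses only the standard library so it runs without extra packages, whereas the script itself calls findAll on a parsed table. The class and function names are hypothetical, and unlike the script this version collects links from every table on the page.

```python
from html.parser import HTMLParser
from pathlib import Path


class TableLinkCollector(HTMLParser):
    """Collect the href of every <a> tag that appears inside a <table>."""

    def __init__(self):
        super().__init__()
        self.table_depth = 0  # greater than 0 while inside a table
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_depth += 1
        elif tag == "a" and self.table_depth > 0:
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth > 0:
            self.table_depth -= 1


def links_from_html(html):
    """Return all link targets found inside tables in the given HTML."""
    collector = TableLinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


def links_from_saved_page(path):
    """Read a page saved from the browser and return its table links."""
    return links_from_html(Path(path).read_text(encoding="utf-8"))
```

This only replaces the download of the overview page. The linked document pages would still have to be fetched separately.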

JGuetschow commented 4 years ago

I've now tried

So, I'm currently out of ideas

rgieseke commented 4 years ago

I also tried adding a browser user agent ...
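
For reference, the two attempts mentioned in this thread (sending browser cookies and a browser user agent) can be sketched as below. Neither helped here, so this shows the approach and is not a working fix. The sketch uses urllib from the standard library to stay self-contained, while the script uses requests; the function name, cookie names and user agent string are placeholders.

```python
import urllib.request


def build_browser_like_request(url, cookies, user_agent):
    """Build a request that carries a user agent and, optionally, cookies.

    cookies is a plain dict of name -> value, e.g. copied from the browser.
    """
    headers = {"User-Agent": user_agent}
    cookie_header = "; ".join(
        "{}={}".format(name, value) for name, value in cookies.items()
    )
    if cookie_header:
        headers["Cookie"] = cookie_header
    return urllib.request.Request(url, headers=headers)


# Placeholder values; real ones would come from a browser session.
request = build_browser_like_request(
    "https://unfccc.int/documents/195779",
    {"example_cookie": "value"},
    "Mozilla/5.0 (placeholder user agent)",
)
# urllib.request.urlopen(request) would send it; not done here.
```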

rgieseke commented 4 years ago

Which files are you trying to get?

JGuetschow commented 4 years ago

CRF 2019 currently.

The overview page is now loaded from a file, but then I get the following message:

Error fetching https://unfccc.int/documents/195779

The script keeps retrying until I interrupt it manually.
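
Capping the number of retries would at least make the run stop on its own instead of looping. The following is a sketch under assumptions: fetch stands for whatever function the script uses to download one document page, and the name fetch_with_retries and its parameters are hypothetical.

```python
import time


def fetch_with_retries(fetch, url, max_attempts=3, delay_seconds=1.0):
    """Call fetch(url), retrying on errors up to max_attempts times."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except Exception as error:  # broad on purpose for this sketch
            last_error = error
            print("Error fetching {} (attempt {}/{})".format(
                url, attempt, max_attempts))
            if attempt < max_attempts:
                time.sleep(delay_seconds)
    # All attempts failed: give up instead of retrying forever.
    raise RuntimeError(
        "Giving up on {} after {} attempts".format(url, max_attempts)
    ) from last_error
```

A bounded retry does not get past the blocking itself, but it surfaces the failure for each document instead of hanging.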