olivierhagolle / LANDSAT-Download

Automated download of LANDSAT data from USGS website
http://olivierhagolle.github.io/LANDSAT-Download
GNU General Public License v3.0

403 Forbidden #23

Closed benboughton1 closed 8 years ago

benboughton1 commented 8 years ago

Is anyone else experiencing 'USGS not currently responding to requests'?

When I catch the error, it turns out to be a 403 Forbidden.

My user name and password work in the Earth Explorer web interface when I try to download the exact file using the link LANDSAT-Download generates.

I have logged out of Earth Explorer before using this script as well.

dswanepoel commented 8 years ago

I'm also getting a 403 Forbidden since yesterday when trying to log in via Python and wget.

mkmitchell commented 8 years ago

I'm also experiencing this error.

greenspin commented 8 years ago

Same error here.

dswanepoel commented 8 years ago

This may be caused by the addition of a CSRF token in the ERS login form. I don't recall that being present before.

olivierhagolle commented 8 years ago

Dear all, thanks for flagging the change in USGS policy. I am on holiday with a poor connection. I can try to fix it next week, but I am not sure how to handle such a token in a Python login. Does any of you know how? Best regards, Olivier

mkmitchell commented 8 years ago

Can do. I have a working example for the no_proxy case. I'll apply the same change to the proxy version, but I can't test it.

olivierhagolle commented 8 years ago

Great! Thanks a lot! Please open a pull request when you are ready. I'll try it as soon as possible and will try to implement the proxy part, at least with CNES's proxy (which is a hard one). Best regards, Olivier

timburgess commented 8 years ago

Looking at the form HTML, there appear to be two hidden fields, csrf_token and __ncforminfo. I imagine both would have to be supplied in the form POST.


greenspin commented 8 years ago

Although I'm using Java with Apache HttpClient to download the Landsat files, I faced the same problem there. Here is how I got it running again; it might be helpful for you as well. The __ncforminfo token is not important: the login works even without posting it. The csrf_token must be read out and submitted again. The important change for me was to resend the whole header information when posting the username and password together with the CSRF token. Here is the Java code, for completeness:

HttpClientContext context = HttpClientContext.create();
CookieStore cookieStore = new BasicCookieStore();
context.setCookieStore(cookieStore);
CloseableHttpClient client = HttpClientBuilder.create().build();
HttpGet get = new HttpGet("https://ers.cr.usgs.gov/login/");
get.setHeader("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
HttpResponse response = client.execute(get, context);

Read the csrf_token value from the response to the GET request.

List<NameValuePair> paramList = new ArrayList<NameValuePair>();
paramList.add(new BasicNameValuePair("username", user));
paramList.add(new BasicNameValuePair("password", pwd));
paramList.add(new BasicNameValuePair("csrf_token", csrf_token));
HttpPost post = new HttpPost("https://ers.cr.usgs.gov/login/");
post.setHeaders(get.getAllHeaders());
UrlEncodedFormEntity urlEncodedFormEntity = new UrlEncodedFormEntity(paramList, "UTF-8");
post.setEntity(urlEncodedFormEntity);
HttpResponse response2 = client.execute(post, context);

This gives me a 302, ready to download the files. Hope this helps. Good luck, Gunther
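For readers working in Python rather than Java, the same GET-then-POST flow might be sketched as follows. This is a minimal Python 3 standard-library sketch, not the code that ended up in LANDSAT-Download: the helper names (make_session, build_login_post, login) are hypothetical, the URL and Accept header are copied from the Java snippet above, and extract_token is assumed to be supplied by the caller.

```python
import http.cookiejar
import urllib.parse
import urllib.request

LOGIN_URL = "https://ers.cr.usgs.gov/login/"  # taken from the Java snippet above
BROWSER_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
}


def make_session():
    """Return an opener that stores cookies, so the GET and POST share one session."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def build_login_post(username, password, csrf_token):
    """Build the login POST, resending the same headers that went with the GET."""
    body = urllib.parse.urlencode(
        {"username": username, "password": password, "csrf_token": csrf_token}
    ).encode("utf-8")
    return urllib.request.Request(LOGIN_URL, data=body, headers=BROWSER_HEADERS)


def login(username, password, extract_token):
    """GET the form, read the token with the caller-supplied function, then POST."""
    opener = make_session()
    get = urllib.request.Request(LOGIN_URL, headers=BROWSER_HEADERS)
    with opener.open(get) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    token = extract_token(html)
    with opener.open(build_login_post(username, password, token)) as resp:
        return resp.getcode()
```

Note that urllib follows redirects by default, so login() would return the status of the final page rather than the 302 itself.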

mkmitchell commented 8 years ago

I did a quick fix in case anyone needs this. I'm sure Olivier will make this much cleaner. I had to pip install BeautifulSoup to parse the html.

If you need this going asap this works for me.

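import sys
import urllib
import urllib2
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3; with bs4 use "from bs4 import BeautifulSoup"

def connect_earthexplorer_no_proxy(usgs):
    # Keep cookies between the GET and the POST so the session matches the token.
    cookies = urllib2.HTTPCookieProcessor()
    opener = urllib2.build_opener(cookies)
    urllib2.install_opener(opener)

    # Fetch the login page and read the hidden csrf_token field.
    soup = BeautifulSoup(urllib2.urlopen("https://ers.cr.usgs.gov/login").read())
    token = soup.find('input', {'name': 'csrf_token'})
    params = urllib.urlencode(dict(username=usgs['account'], password=usgs['passwd'], csrf_token=token['value']))
    request = urllib2.Request("https://ers.cr.usgs.gov/login", params, headers={})
    f = urllib2.urlopen(request)
    data = f.read()
    f.close()
    # The sign-in prompt appears in the response only when the login was rejected.
    if data.find('You must sign in as a registered user to download data or place orders for USGS EROS products') >= 0:
        print "Authentication failed"
        sys.exit(-1)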
    return

olivierhagolle commented 8 years ago

Thanks a lot Mike. It looks much simpler now, and it works! I did not know this BeautifulSoup library. The only drawback is that we need to install it. Olivier

dswanepoel commented 8 years ago

Here is an alternative using regex (not as robust, but with no external dependency):

import re
...
data = urllib2.urlopen("https://ers.cr.usgs.gov/login").read()
m = re.search(r'<input .*?name="csrf_token".*?value="(.*?)"', data)
if m:
    token = m.group(1)

Another possible alternative that doesn't require external dependencies is https://docs.python.org/2/library/htmlparser.html
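A minimal sketch of that parser-based approach, written for Python 3, where the module is html.parser (the link above points to the Python 2 HTMLParser module). The class name HiddenFieldParser, the helper hidden_fields, and the SAMPLE form are all made up for illustration; the real ERS login page may be laid out differently.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of every <input type="hidden"> in a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        if attributes.get("type") == "hidden" and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


def hidden_fields(html):
    """Return a dict of hidden form fields found in the given HTML text."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


# Invented sample form, only to show the shape of the result.
SAMPLE = """<form method="post">
<input type="hidden" name="csrf_token" value="abc123">
<input type="hidden" name="__ncforminfo" value="xyz">
<input type="text" name="username">
</form>"""

fields = hidden_fields(SAMPLE)
token = fields.get("csrf_token")
```

Unlike the regex, this does not depend on the order of the name and value attributes or on the quoting style.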

mkmitchell commented 8 years ago

Nice using regex! I was going to look into that today.

Opadera commented 8 years ago

@mkmitchell I get TypeError: 'module' object is not callable using your code posted above. Any idea how to solve this?
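That message generally means a module, rather than a class or function inside it, is being called. One possible cause here, assuming the import was written as "import BeautifulSoup": in BeautifulSoup 3 the module and the class share the same name, so the class has to be imported with "from BeautifulSoup import BeautifulSoup" (or "from bs4 import BeautifulSoup" with the newer package). The Python 3 snippet below reproduces the same error with a stand-in standard-library module; it only illustrates the mechanism and is not taken from the script.

```python
import json  # stand-in module; any module behaves the same way

try:
    # Calling the module itself instead of a function or class inside it.
    json("{}")
except TypeError as exc:
    message = str(exc)

print(message)
```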

olivierhagolle commented 8 years ago

I am testing dswanepoel's suggestion, which seems to work well. That will let us avoid the BeautifulSoup dependency (no Soup in summer ;) )

I will push the new version soon. I still need to test the proxy version. Olivier

olivierhagolle commented 8 years ago

Done.

christophe-06 commented 5 years ago

Dear Mr. Hagolle, I used the Landsat-8 download script (with a list of products as input) a few months ago without problems. Today, after one or two downloads, I get this error: "CSRF_Token not found". Is it a limitation on the USGS side? Thanks for your help. Christophe
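The thread does not establish the cause of this error. One way to narrow it down would be to look at which page came back when the token was missing. The sketch below is a hypothetical diagnostic helper in Python 3, not part of LANDSAT-Download: it reuses the regex posted earlier in this thread and, when no token is found, reports the page title so it is easier to tell a changed login form from some other page. The names TokenNotFound and require_csrf_token are invented for this example.

```python
import re


class TokenNotFound(Exception):
    """Raised when the fetched page has no csrf_token field."""


def require_csrf_token(html):
    """Return the csrf_token value, or raise with the page title for diagnosis."""
    match = re.search(r'<input .*?name="csrf_token".*?value="(.*?)"', html)
    if match:
        return match.group(1)
    title = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    shown = title.group(1).strip() if title else None
    raise TokenNotFound("no csrf_token field found; page title was %r" % shown)
```

Calling require_csrf_token on the HTML returned by the login URL would either give the token or show what kind of page was served instead.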