Open JimFluke opened 1 day ago
@JimFluke thanks for reporting this issue.
It looks like you are missing lxml. Can you try pip install lxml and try again? Hopefully it is just a dependency issue.
EDIT:
I recently moved beautifulsoup4 and lxml to be installed as extra dependencies (and not as required dependencies) to make pydap more lightweight. This may have caused some trouble with authentication. Will investigate and report back.
Alternatively, you can try installing the complete server dependencies (as opposed to the minimal dependencies) via conda:
conda install pydap-server
Let me know if that works
@Mikejmnez This is what I get when I pip install the lxml package:
2024-11-07 21:23:36,466 INFO __main__: url: https://gcin01.cira.colostate.edu/thredds/dap4/cloudsat-data/2B-GEOPROF.P1_R05/2013/180/2013180111833_38146_CS_2B-GEOPROF_GRANULE_P1_R05_E06_F00.hdf
/usr/local/lib/python3.11/site-packages/pydap/cas/get_cookies.py:129: XMLParsedAsHTMLWarning: It looks like you're parsing an XML document using an HTML parser. If this really is an HTML document (maybe it's XHTML?), you can ignore or filter this warning. If it's XML, you should know that using an XML parser will be more reliable. To parse this document as XML, make sure you have the lxml package installed, and pass the keyword argument `features="xml"` into the BeautifulSoup constructor.
soup = BeautifulSoup(resp.content, "lxml")
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/urllib3/connection.py", line 199, in _new_conn
sock = connection.create_connection(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/urllib3/util/connection.py", line 85, in create_connection
raise err
File "/usr/local/lib/python3.11/site-packages/urllib3/util/connection.py", line 73, in create_connection
sock.connect(sa)
OSError: [Errno 113] No route to host
I can get to the host with a browser from the same host I'm running the python script on, so I don't know why it's giving this error.
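A browser and a Python process can take different network paths (proxy settings, IPv4 versus IPv6, container networking), so one way to narrow down the No route to host error is to test, from the same Python environment, every address the hostname resolves to. The sketch below is only a diagnostic aid built on the standard library. probe_host is a hypothetical helper name, not part of pydap, and it assumes the scheme's default port when the URL does not give one.

```python
import socket
from urllib.parse import urlsplit


def probe_host(url, timeout=5.0):
    """Try a TCP connection to every address the URL's host resolves to.

    Returns a list of (address, port, error) tuples, where error is None
    when the connection succeeded and the raised OSError otherwise.
    A failed name lookup raises socket.gaierror and is not caught here.
    """
    parts = urlsplit(url)
    host = parts.hostname
    # Assumption: fall back to the scheme's default port if none is given.
    port = parts.port or (443 if parts.scheme == "https" else 80)

    results = []
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
            error = None
        except OSError as exc:  # includes [Errno 113] No route to host
            error = exc
        finally:
            sock.close()
        results.append((sockaddr[0], sockaddr[1], error))
    return results
```

Calling probe_host with the dap4 URL above and printing the result would show whether only some of the resolved addresses are unreachable, which the single traceback does not reveal.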
I'll try the conda install pydap-server method next.
But if I try the same thing with the dap2 protocol, it gives me this:
2024-11-07 22:08:49,782 INFO __main__: url: https://gcin01.cira.colostate.edu/thredds/dodsC/cloudsat-data/2B-GEOPROF.P1_R05/2013/180/2013180111833_38146_CS_2B-GEOPROF_GRANULE_P1_R05_E06_F00.hdf
Traceback (most recent call last):
File "/app/opendap_pydap.py", line 50, in <module>
dataset = open_url(url, session=session, protocol=od_protocol)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pydap/client.py", line 78, in open_url
handler = pydap.handlers.dap.DAPHandler(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pydap/handlers/dap.py", line 98, in __init__
self.make_dataset()
File "/usr/local/lib/python3.11/site-packages/pydap/handlers/dap.py", line 134, in make_dataset
self.dataset_from_dap2()
File "/usr/local/lib/python3.11/site-packages/pydap/handlers/dap.py", line 178, in dataset_from_dap2
raise_for_status(r)
File "/usr/local/lib/python3.11/site-packages/pydap/net.py", line 37, in raise_for_status
raise HTTPError(
webob.exc.HTTPError: 401 Unauthorized
<!doctype html><html lang="en"><head><title>HTTP Status 401 – Unauthorized</title><style type="text/css">body {font-family:Tahoma,Arial,sans-serif;} h1, h2, h3, b {color:white;background-color:#525D76;} h1 {font-size:22px;} h2 {font-size:16px;} h3 {font-size:14px;} p {font-size:12px;} a {color:black;} .line {height:1px;background-color:#525D76;border:none;}</style></head><body><h1>HTTP Status 401 – Unauthorized</h1><hr class="line" /><p><b>Type</b> Status Report</p><p><b>Description</b> The request has not been applied to the target resource because it lacks valid authentication credentials for that resource.</p><hr class="line" /><h3>Apache Tomcat</h3></body></html>
Again, the authentication works through the browser, so I'm still confused.
An HTTP 401 Unauthorized response is an invitation for the client to resubmit the request with credentials, if the client has them. I wonder: if the server that pyDAP is accessing uses a single sign-on service for authentication, then the URL that has to receive the credentials may not be the URL of the DAP service:
https://gcin01.cira.colostate.edu/thredds/dap4/cloudsat-data/2B-GEOPROF.P1_R05/2013/180/2013180111833_38146_CS_2B-GEOPROF_GRANULE_P1_R05_E06_F00.hdf
but rather the URL of the authentication service.
I see that pretty frequently as an issue, but I don't know how pyDAP does it.
It might be that the auth service URL could/would be passed into this call:
session = setup_session(username, password, check_url=url)
@Mikejmnez ?.
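To see what the server actually asks for, it can help to request the data URL without credentials and look at the challenge it sends back. The following is a minimal standard-library sketch, not something pyDAP does internally as far as this thread shows. describe_auth_challenge is a hypothetical name, and redirects are deliberately not followed so that a redirect to a login page stays visible.

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Keep urllib from following redirects so a login redirect is visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # falling through makes urllib raise HTTPError instead


def describe_auth_challenge(url, timeout=10.0):
    """Request url without credentials and summarize the server's response.

    Returns a dict with the HTTP status, the WWW-Authenticate header
    (the scheme and realm the server expects) and the Location header
    (where a redirect, if any, points).
    """
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(url, timeout=timeout) as resp:
            status, headers = resp.status, resp.headers
    except urllib.error.HTTPError as err:
        # 3xx (not followed), 401 and other error statuses end up here.
        status, headers = err.code, err.headers
    return {
        "status": status,
        "www_authenticate": headers.get("WWW-Authenticate"),
        "location": headers.get("Location"),
    }
```

A 401 with a WWW-Authenticate header suggests the data URL itself accepts credentials, while a redirect points at a separate login URL that might be the one to try as check_url.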
@Mikejmnez When I try this with conda install pydap-server I get the same results - with both dap2 and dap4 - as with adding lxml to the pip install. I'll look into the "auth service URL" and see what I find. Thanks!
Thanks @JimFluke that was useful - lxml needs to be included, but overall that does not fix your issue.
Like @ndp-opendap mentioned, we need to look at the auth process. I am not very familiar with this aspect, so I will need some time to look at it and test.
@Mikejmnez @ndp-opendap That worked! I was eventually able to figure out what the check_url should be set to:
https://gcin01.cira.colostate.edu/thredds/restrictedAccess/DPCData
in my case. I got this from looking at the tomcat localhost_access_log.* file for the URL it was accessing when I was logging in with the browser. I was expecting setup_session() to need my digested password since I have the server configured to use those, but it requires my undigested password instead.
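For anyone landing here later, the working recipe can be sketched roughly as follows. open_restricted is a hypothetical wrapper, and setup_session and open_url are passed in as arguments because this thread does not show which pydap module setup_session was imported from. The call shapes are the ones that appear in the thread.

```python
def open_restricted(data_url, check_url, username, password,
                    setup_session, open_url, protocol="dap2"):
    """Authenticate against check_url, then open data_url with that session.

    setup_session and open_url are whatever callables the script already
    uses; they are called exactly as shown earlier in this thread.
    """
    # check_url is the restricted-access URL found in the Tomcat access
    # log, not the URL of the data file itself.
    session = setup_session(username, password, check_url=check_url)
    return open_url(data_url, session=session, protocol=protocol)


# Usage sketch (not executed here; needs pydap and network access):
#
#   from pydap.client import open_url
#   from pydap.cas.urs import setup_session  # assumption: import path not shown in the thread
#
#   dataset = open_restricted(
#       "https://gcin01.cira.colostate.edu/thredds/dodsC/cloudsat-data/2B-GEOPROF.P1_R05/2013/180/2013180111833_38146_CS_2B-GEOPROF_GRANULE_P1_R05_E06_F00.hdf",
#       "https://gcin01.cira.colostate.edu/thredds/restrictedAccess/DPCData",
#       username,
#       password,  # the plain (undigested) password
#       setup_session,
#       open_url,
#   )
```

The default protocol is dap2 here only because that is the one reported to work in this thread.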
Thanks for all your help!
Nice work @JimFluke - It's a lot easier when the SSO is made a more visible part of the recipe. NASA's Earthdata Login requires a similar invocation, but NASA makes a big deal about documenting EDL and how to use it.
@JimFluke Great news!
But it only works with dap2. With dap4 I get the same No route to host error I got before.
I am trying to use authentication credentials to connect to our TDS. I have tried embedding the credentials into the url, but I get this error:
But I understand this authentication method is from old documentation and will not work. So I have recently tried setting up a connection session:
With this result:
This is an HDF4-EOS file being accessed from a THREDDS server, so the problem described in issue #401 will probably show up, but only after the code gets past this authentication problem.
Thanks!