pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

Can't load xarray from certain URL #8620

Closed chudlerk closed 6 months ago

chudlerk commented 6 months ago

What happened?

Normally, if a NetCDF file is hosted online, I can pass its URL straight to the open_dataset function and it works, e.g. da = xr.open_dataset('https://www.someurl.com/data/file.nc')

However, if I try to open a file from this website the same way, I get an error.

For example, scrolling down to "1991–2020 Monthly Normals", right clicking on "Precipitation", and copying the link address...

da = xr.open_dataset('https://www.nodc.noaa.gov/archive/arc0196/0245564/1.1/data/0-data/prcp-1991_2020-monthly-normals-v1.0.nc')

leads to this long error (see below).

If I just download the file to disk by clicking on the link on the page, and then do xr.open_dataset on the path of the downloaded file, it works just fine.
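
The manual workaround described above (download first, then open the local copy) can be scripted. The following is a minimal sketch, not an xarray feature: download_to_temp and open_remote_dataset are hypothetical helper names, and it assumes the server returns the file to a plain HTTP GET, as it does in the browser.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path


def download_to_temp(url: str, suffix: str = ".nc") -> Path:
    """Stream `url` into a temporary file and return the file's path.

    Hypothetical helper. urlopen raises urllib.error.HTTPError on a 404,
    so a missing file fails here instead of deep inside the NetCDF library.
    """
    with urllib.request.urlopen(url) as response:
        with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
            shutil.copyfileobj(response, tmp)
            return Path(tmp.name)


def open_remote_dataset(url: str):
    """Download the file, then open the local copy with xarray."""
    # Imported lazily so the download helper works without xarray installed.
    import xarray as xr

    local_path = download_to_temp(url)
    return xr.open_dataset(local_path)
```

The temporary file is not deleted automatically, because xarray reads lazily from it; remove it once the dataset is closed.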

What did you expect to happen?

I expected the NetCDF file to be read in as a Dataset when passing the URL to xr.open_dataset, as happens with other URLs.
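
One possible explanation for the difference between URLs: the netCDF-C library generally treats a plain http(s) URL as an OPeNDAP endpoint, which only works if the server speaks that protocol. Assuming the installed libnetcdf was built with HTTP byte-range support, appending the #mode=bytes fragment asks it to read the remote file directly instead. The sketch below only builds such a URL; with_bytes_mode is a hypothetical helper, and whether the resulting URL opens depends on the build and on the server, neither of which has been verified here.

```python
from urllib.parse import urlsplit, urlunsplit


def with_bytes_mode(url: str) -> str:
    """Return `url` with the netCDF-C `mode=bytes` fragment added.

    Hypothetical helper. An existing fragment is kept and extended with '&'.
    """
    parts = urlsplit(url)
    if "mode=bytes" in parts.fragment:
        return url  # already requests byte-range access
    fragment = f"{parts.fragment}&mode=bytes" if parts.fragment else "mode=bytes"
    return urlunsplit(parts._replace(fragment=fragment))


# Possible usage (assumption: libnetcdf has byte-range support enabled):
# import xarray as xr
# ds = xr.open_dataset(with_bytes_mode(url), engine="netcdf4")
```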

Minimal Complete Verifiable Example

import xarray as xr

da = xr.open_dataset('https://www.nodc.noaa.gov/archive/arc0196/0245564/1.1/data/0-data/prcp-1991_2020-monthly-normals-v1.0.nc')

MVCE confirmation

Relevant log output

Traceback (most recent call last):

  File ~\AppData\Local\miniconda3\lib\site-packages\xarray\backends\file_manager.py:211 in _acquire_with_cache_info
    file = self._cache[self._key]

  File ~\AppData\Local\miniconda3\lib\site-packages\xarray\backends\lru_cache.py:56 in __getitem__
    value = self._cache[key]

KeyError: [<class 'netCDF4._netCDF4.Dataset'>, ('https://www.nodc.noaa.gov/archive/arc0196/0245564/1.1/data/0-data/prcp-1991_2020-monthly-normals-v1.0.nc',), 'r', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False)), '172372dd-6014-42db-bbd6-c1f17be389be']

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

  Cell In[14], line 1
    da = xr.open_dataset('https://www.nodc.noaa.gov/archive/arc0196/0245564/1.1/data/0-data/prcp-1991_2020-monthly-normals-v1.0.nc')

  File ~\AppData\Local\miniconda3\lib\site-packages\xarray\backends\api.py:570 in open_dataset
    backend_ds = backend.open_dataset(

  File ~\AppData\Local\miniconda3\lib\site-packages\xarray\backends\netCDF4_.py:602 in open_dataset
    store = NetCDF4DataStore.open(

  File ~\AppData\Local\miniconda3\lib\site-packages\xarray\backends\netCDF4_.py:400 in open
    return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose)

  File ~\AppData\Local\miniconda3\lib\site-packages\xarray\backends\netCDF4_.py:347 in __init__
    self.format = self.ds.data_model

  File ~\AppData\Local\miniconda3\lib\site-packages\xarray\backends\netCDF4_.py:409 in ds
    return self._acquire()

  File ~\AppData\Local\miniconda3\lib\site-packages\xarray\backends\netCDF4_.py:403 in _acquire
    with self._manager.acquire_context(needs_lock) as root:

  File ~\AppData\Local\miniconda3\lib\contextlib.py:119 in __enter__
    return next(self.gen)

  File ~\AppData\Local\miniconda3\lib\site-packages\xarray\backends\file_manager.py:199 in acquire_context
    file, cached = self._acquire_with_cache_info(needs_lock)

  File ~\AppData\Local\miniconda3\lib\site-packages\xarray\backends\file_manager.py:217 in _acquire_with_cache_info
    file = self._opener(*self._args, **kwargs)

  File src\netCDF4\_netCDF4.pyx:2353 in netCDF4._netCDF4.Dataset.__init__

  File src\netCDF4\_netCDF4.pyx:1963 in netCDF4._netCDF4._ensure_nc_success

OSError: [Errno -90] NetCDF: file not found: b'https://www.nodc.noaa.gov/archive/arc0196/0245564/1.1/data/0-data/prcp-1991_2020-monthly-normals-v1.0.nc'

syntax error, unexpected WORD_WORD, expecting SCAN_ATTR or SCAN_DATASET or SCAN_ERROR
context: <!DOCTYPE^ html><html lang="en"><head><!-- Document specific SSI statements --><meta http-equiv="content-type" content="text/html; charset=UTF-8" /><link rel="shortcut icon" href="/Images/favicon.ico" /><title>Error 404: Not Found</title><meta name="keywords" content=", oceanography,ocean,data,archive,marine,coast,temperature,salinity,buoy,ocean climate,world ocean atlas,nitrate,phosphate,silicate,CTD,XBT,ADCP,SST,circulation,currents,sea level,altimetry,chlorophyll,plankton,ocean chemistry,ocean physics,ocean biology,ocean profiles,ocean time series,GTSPP,WOCE,JGOFS,World Data Center,alkalinity,pH,nitrite,dissolved oxygen,satellite,remote sensing,wave height,GODAR,NODC" /><meta name="Description" content="NOAA's National Centers for Environmental Information (NCEI) are responsible for hosting and providing public access to one of the most significant archives for environmental data on Earth with over 20 petabytes of comprehensive oceanic, atmospheric, and geophysical data. /errors/notfound.html" /><meta name="DC.title" content="Error 404: Not Found" /><meta name="DC.description" content="Home page of the National Centers for Environmental Information, containing high quality global physical, chemical, and biological oceanographic data sets" /><link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" /><link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" /><meta name="DC.title" lang="en" content="Error 404: Not Found" /><meta name="DC.creator" lang="en" content="US Department of Commerce, NOAA National Centers for Environmental Information" /><meta name="DCTERMS.modified" scheme="W3CDTF" content="2015-07-14" /><meta name="DC.language" scheme="RFC4646" content="en" /><meta name="DC.identifier" scheme="DCTERMS.URI" content="http://www.nodc.noaa.gov/errors/notfound.html" /><link rel="stylesheet" href="/styles/reset.css" /><link rel="stylesheet" href="/styles/960.css" /><link rel="stylesheet" href="/styles/style.css" /><link rel="stylesheet" 
href="/styles/jshowoff.css" /><!--[if lte IE 8]> <link rel="stylesheet" type="text/css" media="all" href="/styles/iefix.css" /><![endif]--><script id="_fed_an_ua_tag" type="text/javascript" src="/scripts/federated-analytics.js?agency=DOC&subagency=NOAA&pua=UA-42101633-1"></script></head><body class="menu4_on lnhome_on snnone_on"> <div class="bkgoutter"><div class="bkginner parallax" data-speed="4"> <div class="container_16 mainpaper">  <div class="grid_16" id="noaahead" style="margin-bottom:8px"> <!-- NOAA Header --> <a href="http://www.noaa.gov" class="nobg"><img src="/media/images/common/noaalogo2.png" alt="NOAA Logo" /></a> <img src="/media/images/common/nceilogo2.png" alt="National Centers for Environmental Information" />  <a href="http://www.Commerce.gov" class="nobg"><img src="/media/images/common/commercelogo2.png" class="clogo" alt="Department of Commerce Logo" /></a> <!-- End NOAA Header --> </div> <!-- end #noaahead .grid_16 --> <!-- Main Navigation Bar --> <div class="grid_16"><!--    <div class="noaahead-extra"> -->   <!-- Formerly NODC Bar --><!--    <p>formerly the National Oceanographic Data Center (NODC)... 
&nbsp;<a href="http://www.ncei.noaa.gov/">more on NCEI</a></p> --><!--   </div>   -->    <div id="nodcnav">     <ul id="menu"><li id="home"><a href="/"><span>Home</span></a></li><li id="menu1"><a href="/access/index.html"><span>Access Data</span></a></li><li id="menu2"><a href="/submit/index.html"><span>Submit Data</span></a></li><li id="menu3"><a href="/outreach/index.html"><span>Public Outreach</span></a></li><li id="menu4"><a href="/about/index.html"><span>About</span></a></li></ul>     </div>     </div>  <!-- End Main Navigation Bar -->   <div class="grid_16 topsearch">   <p>NOAA Satellite and Information Service</p>   <div class="searchbox">    <form action="https://search.usa.gov/search" method="get" class="noaainfo">         <label for="affnodc"><input class="marg2" id="affnodc" type="radio" name="affiliate" checked="checked" value="nodc.noaa.gov" />This Site</label>         <label class="marg" for="affnoaa"><input class="marg2" type="radio" id="affnoaa" name="affiliate" value="noaa.gov" />All of NOAA</label>         <input type="hidden" name="v:project" value="firstgov" />         <input class="search" type="text" name="query" size="18" value="Search" onfocus="this.value=''"/>         <input type="image" class="go" title="Go search the NOAA or NCEI Website" src="/media/images/common/go.gif" alt="Go search the NOAA or NCEI Website" border="0" />    </form>   </div> <!-- end .searchbox -->  </div> <!-- end .topsearch .grid_16 -->  <div class="clear"></div><!-- See WD-769 - Moving to NCEI -->  <div class="grid_16">   <div style="padding: 10px; border: 5px solid red;">    <p style="margin-bottom:0;"><strong>NCEI is transitioning to a new website and paths to data resources will be changing. Please contact <a href="mailto:NCEI.Info@noaa.gov">NCEI.Info@noaa.gov</a> with any questions of issues. 
See the new website at <a href="https://www.ncei.noaa.gov/">www.ncei.noaa.gov</a>.</strong></p>   </div>    </div><div class="grid_16" id="crumbs"> <p><strong>You are here:</strong> <a href="/index.html">Home</a> &rsaquo; Error 404: Not Found</p></div>  <div class="content3 grid_16" id="content">  <div class="main grid_12 omega">  <h2>Error 404: Not Found</h2> <p>We apologize, but the page or file does not exist.</p> <div class="infobox1 shadow grid_8 alpha"> <h3 class="separator2"></h3>  <h3 style="margin-left:10px;">Please try the following:</h3> <ul> <li>Check the URL for spelling / typing errors</li> <li>Review old bookmarks</li> <li>Go <a href="/">Home</a> or <a href="/about/contact.html">Contact Us</a></li> <li><form accept-charset="UTF-8" action="http://search.usa.gov/search" id="search_form" method="get"><div style="margin:0;padding:0;display:inline"><input name="utf8" type="hidden" value="&#x2713;" /></div> <input id="affiliate" name="affiliate" type="hidden" value="nodc.noaa.gov" /> <input autocomplete="off" class="usagov-search-autocomplete" id="query" name="query" type="text" /> <input name="commit" type="submit" value="Search" /> </form></li> </ul> </div> </div> <!-- end .grid_12 --> <div class="leftbar grid_4 alpha"> <div class="leftnav3"> <h3 id="lnhome"><a href="/access/" class="mnav">Error 404</a></h3>  </div> <!-- end .leftnav --> </div>    </div> <!-- end .content --> <div class="grid_16 bottombar">  <p><a href="/access/index.html">Access Data</a> - <a href="/submit/index.html">Submit Data</a> - <a href="/General/datacom_form.html">Intended Use of the Data?</a> - <a href="https://www.ncdc.noaa.gov/nespls/olstore.main?look=1">Online Store</a> - <a href="/about/contact.html">Customer Service</a></p> </div> <div class="clear"></div>   </div> <!-- end .container_16 --></div> <!-- end .bkginner --></div> <!-- end .bkgoutter --><div class="bkgfooter"> <!-- footer background --><div class="container_16"> <!-- 960 footer --> <div class="grid_16 footer">  
<div class="prefix_1 grid_7 alpha">   <ul class="footerlist">    <li>Last modified:&nbsp; Tuesday, 14-Jul-2015 13:19:07 UTC</li>    <li><abbr title="Department of Commerce"><a href="http://www.doc.gov/">Dept. of Commerce</a></abbr> - <abbr title="National Oceanic and Atmospheric Administration"><a href="http://www.noaa.gov/">NOAA</a></abbr> - <abbr title="National Environmental, Satellite, Data and Information Service"><a href="http://www.nesdis.noaa.gov/">NESDIS</a></abbr> - <abbr title="National Centers For Environmental Information"><a href="http://www.ncei.noaa.gov/">NCEI</a></abbr></li>    <li><a href="/survey.html">NCEI, Maryland Office, Website Survey</a></li>    <li><img src="/media/images/common/extrnl_link2.gif" alt="External Link" style="float:left; margin:3px 5px 0 0;"/> Offsite Link Notification</li>   </ul>  </div>    <div class="prefix_2 grid_6 omega">   <div class="ficons rfloat">    <a href="https://twitter.com/NOAANCEIocngeo"><img src="/media/images/common/twitter3.gif" alt="Like us on Twitter" width="20" height="20" /></a>    <a href="http://www.facebook.com/NOAANCEI/"><img src="/media/images/common/facebook3.gif" alt="Like us on Facebook" width="20" height="20" /></a>    <a href="/rss/"><img src="/media/images/common/rssfeed-icon2.jpg" alt="RSS feed" width="20" height="20" /></a>   </div>      <ul class="footerlist">    <li><a href="mailto:NCEI.info@noaa.gov">NCEI.info@noaa.gov</a></li>    <li><a href="http://www.facebook.com/NOAANCEI/">Like us on Facebook</a> | <a href="https://twitter.com/NOAANCEIocngeo">Follow us on Twitter</a></li>    <li><a href="http://www.noaa.gov/privacy.html">Privacy Policy</a> - <a href="/about/disclaimer.html">Disclaimer</a> - <a href="http://www.cio.noaa.gov/services_programs/info_quality.html">Information Quality</a></li>    <li><a href="http://www.corporateservices.noaa.gov/%7Efoia/">Freedom of Information Act</a> (FOIA)</li>    <li><abbr title="U.S. 
Government's Official Web Portal"><a href="http://www.usa.gov/">USA.gov</a></abbr> - The U.S. Government's Web Portal</li>   </ul>  </div> <!-- end .prefix_1 .grid_7 --> </div> <!-- end .footer --> <div class="clear"></div></div> <!-- end 960 footer --></div> <!-- end .bkgfooter --><script type="text/javascript" src="//ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js"></script>  <!-- Previous Jquery version was 1.7.2, Revert back if any problems are found --><script text="text/javascript" src="//ajax.googleapis.com/ajax/libs/jqueryui/1.10.0/jquery-ui.min.js"></script><script type="text/javascript">   $(document).ready(function(){   $('.nojs').hide();  $(".stripeme tr:nth-child(odd)").addClass("alt");  //$("div.parallax").css("background-attachment","fixed"); // var $window = $(window); // $('div.parallax').each(function(){ // var $bgobj = $(this); // assigning the object  // // $(window).scroll(function() { // var yPos = -($window.scrollTop() / $bgobj.data('speed'));  //  // // Put together our final background position // var coords = '50% '+ yPos + 'px';  // // // Move the background // $bgobj.css({ backgroundPosition: coords  // }); // });  //}); });</script> </body></html>
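
The HTML in the log above is an "Error 404: Not Found" page, so whatever the NetCDF library requested, the server did not answer with the file's bytes. A quick way to tell the two cases apart is to look at the first bytes of a response. The sketch below checks for the standard NetCDF classic and HDF5 signatures; sniff_payload is a hypothetical name, and it assumes the signature sits at offset 0, which is the usual case.

```python
# File signatures: NetCDF classic, 64-bit offset, and CDF-5 start with
# "CDF" plus a version byte; NetCDF-4 files are HDF5 files.
NETCDF_CLASSIC_MAGIC = (b"CDF\x01", b"CDF\x02", b"CDF\x05")
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"


def sniff_payload(head: bytes) -> str:
    """Classify the first bytes of a downloaded payload (hypothetical helper)."""
    if head.startswith(HDF5_MAGIC):
        return "netcdf4/hdf5"
    if head.startswith(NETCDF_CLASSIC_MAGIC):
        return "netcdf3"
    # An error page served in place of the data, as in the log above.
    if head.lstrip().lower().startswith((b"<!doctype", b"<html")):
        return "html"
    return "unknown"
```

Passing the first few bytes of a response body to sniff_payload shows whether the server sent data or an error page before the bytes ever reach xarray.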

Anything else we need to know?

No response

Environment

C:\Users\kchudler\AppData\Local\miniconda3\lib\site-packages\_distutils_hack\__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS
------------------
commit: None
python: 3.9.15 | packaged by conda-forge | (main, Nov 22 2022, 08:39:05) [MSC v.1929 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 141 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: ('English_United States', '1252')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 2023.7.0
pandas: 2.1.4
numpy: 1.26.3
scipy: 1.11.4
netCDF4: 1.6.0
pydap: None
h5netcdf: 1.3.0
h5py: 3.7.0
Nio: None
zarr: None
cftime: 1.6.3
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: 2022.02.1
distributed: 2022.2.1
matplotlib: 3.4.3
cartopy: 0.22.0
seaborn: None
numbagg: None
fsspec: 2023.12.2
cupy: None
pint: 0.23
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.2.2
pip: 23.3.1
conda: 23.1.0
pytest: 7.4.4
mypy: None
IPython: 8.17.2
sphinx: 7.2.6
