kpegion / SubX

Kathy's Codes for Accessing and Processing SubX Data from the IRI Data Library
MIT License

Python SubX download_data produces an error before scripts can be created #18

Open kdl0013 opened 2 years ago

kdl0013 commented 2 years ago

In the file SubX/Python/download_data/generate_full_py_ens_files.ksh, the code produces a list of all files but immediately fails at fen='python tmp.py' with the following error:

RuntimeError: NetCDF: Access failure oc_open: server error retrieving url: code=6 message="request too large"

The provided link https://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/.RSMAS/.CCSM4/.hindcast/.tas/dods does not appear to have any files to open, which may be the problem. IRIDL may have updated the site so that it no longer serves data through the DODS endpoint this way.
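For reference, here is a minimal sketch of the kind of call that hits this error (assuming tmp.py opens the DODS endpoint with xarray; the exact code in tmp.py may differ):

import xarray as xr

# Assumed reproduction (not the repo's tmp.py): open the full hindcast over OPeNDAP/DODS.
# The IRIDL server refuses oversized transfers with code=6 "request too large".
url = 'https://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/.RSMAS/.CCSM4/.hindcast/.tas/dods'
ds = xr.open_dataset(url)  # RuntimeError: NetCDF: Access failure ... oc_open ... code=6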

raghu330 commented 2 years ago

I too face the same issue "oc_open: server error retrieving url: code=6 message="request too large"". I added 'decode_times=False', but the error still persists.
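Roughly what was attempted (a sketch, not the exact code); decode_times=False only changes how time values are decoded locally, so it does not get around the server-side size limit:

import xarray as xr

# Sketch of the attempted workaround: decode_times=False affects local time decoding only,
# so the server still rejects the oversized OPeNDAP request.
url = 'https://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/.RSMAS/.CCSM4/.hindcast/.tas/dods'
ds = xr.open_dataset(url, decode_times=False)  # still fails: oc_open ... code=6 "request too large"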

kdl0013 commented 2 years ago

> I too face the same issue "oc_open: server error retrieving url: code=6 message="request too large"". I added 'decode_times=False', but the error still persists.

@raghu330 You can use this Python script I created to download SubX data. It generates a wget script; running that script downloads the files, 20 at a time.

#!/usr/bin/env python3

'''Create a wget script for each SubX model to set up downloading files in parallel.

Inputs: IRIDL NCAR username and password.
Inputs: the models and variables to download (edit the lists below).

Outputs: a shell script containing the download commands. Files download 20 at a time when it is run.

'''
import os
import datetime as dt
import numpy as np

username_IRIDL = "usrname"
password_IRIDL = "psswd"

models = ['GMAO']
sources = ['GEOS_V2p1']
#vars = ['huss', 'dswrf','mrso','tas', 'uas', 'vas','tdps','pr','cape']
vars = ['tasmax','tasmin']

# Get the dates for all SubX models (all possible initialization dates). Start at year 2000 because the other SubX models have all started by then.
start_date = dt.date(2000, 1, 5)
end_date = dt.date(2015, 12, 26)

dates = [start_date + dt.timedelta(days=d) for d in range(0, end_date.toordinal() - start_date.toordinal() + 1)]
# GMAO GEOS specifically skips leap days
dates_out = [d for d in dates if not (d.month == 2 and d.day == 29)]

# Keep only every 5th date
dates = dates_out[::5]

### GMAO GEOS_V2p1 model

#vars=['tas']

new_dir = f'/glade/scratch/klesinger/SubX/{models[0]}'
os.makedirs(new_dir, exist_ok=True)

count=0    
output=[]
output.append('#!/bin/bash')
for m_i, model in enumerate(models):

    for d_i, date in enumerate(dates):

        for v_i, var in enumerate(vars):

            date_str = '{}-{}-{}'.format(str(date.year), str(date.month).rjust(2,'0'), str(date.day).rjust(2,'0'))

            # The IRIDL URL selects a single start date via S/(1200 DD Mon YYYY)/VALUES,
            # so each request stays small; -O writes each file to its own name.
            command = f"wget -nc --user {username_IRIDL} --password {password_IRIDL} 'http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/.{model}/.{sources[m_i]}/.hindcast/.{var}/S/(1200%20{str(date.day)}%20{date.strftime('%b')}%20{str(date.year)})/VALUES/data.nc' -O {new_dir}/{var}_{model}_{date_str}.nc &"

            # Insert 'wait' after every 20 commands so at most 20 downloads run at once
            if count % 20 == 0:
                output.append('wait')

            count+=1
            output.append(command)

np.savetxt('wget_GMAO.txt',output, fmt="%s")

os.system("cat wget_GMAO.txt > wget_GMAO.sh")
os.system("rm wget_GMAO.txt")

After running the script, run bash wget_GMAO.sh from the command line to start the downloads.

raghu330 commented 2 years ago

Hello @kdl0013, thanks very much for the response and for an excellent wget solution. My apologies for the delay in responding. I was able to help the person who brought this issue to me using a similar but more convoluted approach; yours is better optimized. I have not heard from him since, so my involvement has reverted to my official duties. Thank you very much! Cheers, Raghu