ihesp / IPART

Image-Process based Atmospheric River Tracking (IPART) algorithms
https://ipart.readthedocs.io/en/latest/
GNU General Public License v3.0

Generalise scripts so any user can run them #16

Open sadielbartholomew opened 3 years ago

sadielbartholomew commented 3 years ago

The scripts under scripts/ rely on hard-coded filesystem paths, e.g. from compute_ivt.py:

#--------------Globals------------------------------------------
#-----------uflux----------------------
UFLUX_FILE='/home/guangzhi/datasets/erai_qflux/uflux_m1-60_6_2007_cln-cea-proj.nc'
UFLUX_VARID='uflux'

#-----------vflux----------------------
VFLUX_FILE='/home/guangzhi/datasets/erai_qflux/vflux_m1-60_6_2007_cln-cea-proj.nc'
VFLUX_VARID='vflux'

OUTPUTFILE='/home/guangzhi/datasets/quicksave2/THR/ivt_m1-60_6_2007_crop2.nc';

As a result, any user other than you who tries to run them immediately hits errors because those files cannot be found. For example, I observe:

$ python compute_ivt.py 
Traceback (most recent call last):
  File "compute_ivt.py", line 37, in <module>
    ufluxNV=funcs.readNC(UFLUX_FILE, UFLUX_VARID)
  File "/home/sadie/IPART/ipart/utils/funcs.py", line 775, in readNC
    fin=Dataset(abpath_in, 'r')
  File "netCDF4/_netCDF4.pyx", line 2358, in netCDF4._netCDF4.Dataset.__init__
  File "netCDF4/_netCDF4.pyx", line 1926, in netCDF4._netCDF4._ensure_nc_success
FileNotFoundError: [Errno 2] No such file or directory: b'/home/guangzhi/datasets/erai_qflux/uflux_m1-60_6_2007_cln-cea-proj.nc'
$ python detect_ARs.py
Traceback (most recent call last):
  File "detect_ARs.py", line 187, in <module>
    quNV=funcs.readNC(UQ_FILE_NAME, UQ_VAR)
  File "/home/sadie/IPART/ipart/utils/funcs.py", line 775, in readNC
    fin=Dataset(abpath_in, 'r')
  File "netCDF4/_netCDF4.pyx", line 2358, in netCDF4._netCDF4.Dataset.__init__
  File "netCDF4/_netCDF4.pyx", line 1926, in netCDF4._netCDF4._ensure_nc_success
FileNotFoundError: [Errno 2] No such file or directory: b'/home/guangzhi/datasets/erai_qflux/uflux_m1-60_6_2007_cln-cea-proj.nc'

I appreciate that you intended these as templates, but in their current form they are not very useful to users: they rely on resources not included in the repo, reference your personal paths, and therefore fail immediately. By contrast, the notebooks are very useful, because they make no user-specific assumptions and all of the datasets they need are provided in the repo. They should therefore work for anyone who has installed IPART and its dependencies in a suitable environment, as they did when I tested them.

To let users make good use of the scripts, I suggest adding new datasets to the repo that variables such as UFLUX_FILE can point to, so that users can run the scripts as provided and explore their capability. This should come with brief guidance stating what users need to change to point the scripts at their own datasets instead.
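As a rough illustration of how the paths could be made user-configurable, a script might read them from the command line rather than from module-level globals. This is only a sketch under assumptions: the option names and the `missing_inputs` helper are hypothetical and not part of IPART; only `funcs.readNC` and the default variable ids `uflux` and `vflux` are taken from the script and traceback quoted above.

```python
import argparse
import os


def parse_args(argv=None):
    """Collect input/output paths from the command line instead of hard-coding them."""
    parser = argparse.ArgumentParser(
        description='Sketch: user-supplied paths for a script such as compute_ivt.py')
    # Hypothetical option names; the defaults for the variable ids mirror
    # UFLUX_VARID and VFLUX_VARID in the quoted script.
    parser.add_argument('--uflux-file', required=True, help='netCDF file with the u-flux')
    parser.add_argument('--uflux-varid', default='uflux', help='variable id of the u-flux')
    parser.add_argument('--vflux-file', required=True, help='netCDF file with the v-flux')
    parser.add_argument('--vflux-varid', default='vflux', help='variable id of the v-flux')
    parser.add_argument('--output-file', required=True, help='path of the netCDF file to write')
    return parser.parse_args(argv)


def missing_inputs(args):
    """Return the input paths that do not exist, so the script can fail with a clear message."""
    return [p for p in (args.uflux_file, args.vflux_file) if not os.path.isfile(p)]


if __name__ == '__main__':
    args = parse_args()
    missing = missing_inputs(args)
    if missing:
        raise SystemExit('Input file(s) not found: ' + ', '.join(missing))
    # The rest of the script would continue as before, reading for example:
    # ufluxNV = funcs.readNC(args.uflux_file, args.uflux_varid)
```

With something like this, a user would pass `--uflux-file`, `--vflux-file` and `--output-file` pointing at their own data, and a wrong path would produce a short message naming the missing file rather than a netCDF4 traceback.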

(As noted when reviewing towards openjournals/joss-reviews#2407.)

Xunius commented 3 years ago

It is not practical to add data to the repo for this type of task. The data people use for detecting such things come in GBs for real-world applications, and the notebook is just a toy example. So, no, I'm not adding new datasets. I also think the scripts folder is optional to begin with: when installing via conda, one doesn't even get the scripts.