mscross / pysplit

A package for HYSPLIT air parcel trajectory analysis.
BSD 3-Clause "New" or "Revised" License

Traceback Error: "OutOfBoundsDatetime: Out of bounds nanosecond timestamp" when calling make_trajectorygroup #37

Closed heatherguy closed 6 years ago

heatherguy commented 6 years ago

Hello Mellissa,

This is my first time using HYSPLIT and pysplit. I am trying to create some trajectory plots from HYSPLIT trajectories that I have run previously using the HYSPLIT Windows GUI. I started by trying to follow your example basic_plotting_example.py.

When I try to run the command: trajgroup = pysplit.make_trajectorygroup(fpath)

It returns the following error: OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 4016-08-31 23:00:00

I am not sure where this error comes from or how I can trace it back to its source. It is possible that it could be an error in how I initialized the HYSPLIT trajectory in the first place. For instance the datetime for this trajectory should be 2016-08-31 23:00:00 not 4016. I can't find any information in the HYSPLIT documentation about this.
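For context, a minimal sketch of why a year of 4016 triggers this kind of error: pandas' nanosecond timestamps count nanoseconds since 1970-01-01 in a signed 64-bit integer, which caps the representable dates in the year 2262. The snippet below reproduces that bound with the standard library only; the variable names are illustrative and are not part of PySPLIT or pandas.

```python
from datetime import datetime, timedelta

# Largest value a signed 64-bit integer can hold, interpreted as nanoseconds.
INT64_MAX = 2**63 - 1
EPOCH = datetime(1970, 1, 1)

# datetime only has microsecond resolution, so drop the last three digits.
upper_bound = EPOCH + timedelta(microseconds=INT64_MAX // 1000)

reported = datetime(4016, 8, 31, 23)  # date shown in the error message
expected = datetime(2016, 8, 31, 23)  # date the trajectory should have

print("upper bound:", upper_bound)
print("reported date out of bounds:", reported > upper_bound)
print("expected date out of bounds:", expected > upper_bound)
```

So any date whose year has been shifted by 2000 falls well outside the range, while the intended 2016 date is fine.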

Have you seen this problem before? Do you have any ideas for how I might fix it in python without having to re-run all my trajectories? I am not sure if this is the right place to post this question but I hope you might point me in the right direction.

I am using python 3.6.

Thank you!

Heather

zhx215 commented 6 years ago

Hi,

I am glad to answer your question. "fpath" should be replaced by the directory of your trajectory files (generated by the HYSPLIT GUI) plus a filename pattern; for example: trajgroup = pysplit.make_trajectorygroup(r'C:/trajectories/colgate/*feb*')

Here "feb" matches all February trajectories (I use PySPLIT to generate trajectories, and it names all files according to a fixed convention). You can also add other information to "group" your trajectories, such as "1000" to read only trajectories with a starting altitude of 1000 m AGL. Here is the code I use to run the example (plotting the trajectories on a map):

import pysplit
%matplotlib inline
trajgroup = pysplit.make_trajectorygroup(r'C:/trajectories/colgate/*feb*')
mapcorners =  [-150, 15, -40, 75]
standard_pm = None
bmap_params = pysplit.MapDesign(mapcorners, standard_pm)
bmap = bmap_params.make_basemap()
color_dict = {500.0 : 'blue',
              1000.0 : 'orange',
              1500.0 : 'red'}
for traj in trajgroup:
    altitude0 = traj.data.geometry.apply(lambda p: p.z)[0]
    traj.trajcolor = color_dict[altitude0]
for traj in trajgroup[::4]:
    bmap.plot(*traj.path.xy, c=traj.trajcolor, latlon=True, zorder=20)
# Add two lines to plot coastlines and country borders; the zorder matters here.
bmap.drawcoastlines(linewidth=1.0, color='black', zorder=20)
bmap.drawcountries(linewidth=0.5, color='grey', zorder=20)
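As a rough illustration of how a wildcard signature such as '*feb*' selects files, here is a small sketch using the standard library's fnmatch. It assumes the signature is treated like a shell-style glob; the filenames and the select helper are made up for the example and are not part of PySPLIT.

```python
from fnmatch import fnmatch

# Hypothetical trajectory filenames, for illustration only.
filenames = [
    'colgatefeb0500winter07020112',
    'colgatefeb1000winter07020112',
    'colgateaug0500summer07080111',
]


def select(names, pattern):
    """Return the names that match a shell-style wildcard pattern."""
    return [name for name in names if fnmatch(name, pattern)]


feb = select(filenames, '*feb*')          # every February file
feb_1000 = select(filenames, '*feb1000*')  # February files at 1000 m only

print(feb)
print(feb_1000)
```

Narrowing the pattern is therefore one way to load only a subset of the trajectories in a directory.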

I recommend learning how to generate trajectories with PySPLIT code that drives the HYSPLIT program; there is already an example script for this. I also recommend learning some Python basics and how to use Jupyter notebooks. The code files contain explanatory notes that can help you understand the structure of PySPLIT.

Best, Ryan

heatherguy commented 6 years ago

Hi Ryan,

Thank you for your help. In my code, fpath is set to the file directory as you described; I am sorry that I didn't make that clear. I get the same error even using:

trajgroup = pysplit.make_trajectorygroup(r'c:/hysplit4/working/out/*feb*')

I have checked the actual trajectory output files and they seem fine, all the dates are correct. I can't generate the trajectories using PySplit code because I am generating them using ERA-Interim data that is not currently supported by PySplit as far as I am aware.

Heather

zhx215 commented 6 years ago

Hi Heather,

I have never seen this error, so I can't help you with it. ERA-Interim data can also be used to generate trajectories through PySPLIT; I have tried this before. However, it takes a lot of steps. I downloaded 3D, 2D, and 1D data in weekly intervals from the ERA website, merged them using the HYSPLIT ERA conversion program, and then cropped the files using the HYSPLIT “xtrct_time” program. The resulting ERA-Interim files are just like GDAS data and cover weekly intervals, so PySPLIT can read them.

Ryan

mscross commented 6 years ago

Hi Heather,

Thanks for raising this issue! What is the naming convention you use for your trajectories, since you didn't generate them with PySPLIT? The PySPLIT naming convention is basename + mon + altitude + season + YYYYMMDDHH or YYMMDDHH, for example 'colgateaug0500summer07080111'.

Background: Recently I added some code to the trajectory file loader to help determine what centuries trajectories belong to, as there is no indication within the trajectory files themselves (the year is only noted as '97' or '13', etc.). This code looks for a substring in the trajectory filename of 10 digits in length (YYYYMMDDHH). If a run of digits is greater than 10 digits in length, it only looks at the first 10. From those 10 digits, it then extracts the first four, interpreting that as the year. If there are no runs of at least 10 digits (e.g. YYMMDDHH or no date info), then the default century is 2000, as most people, especially those new to HYSPLIT, generate trajectories for the 21st century.

PySPLIT should be able to properly load any trajectories as long as they don't run afoul of the above convention. Since you note that only the century/millennium is wrong, I think this might be the problem.
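The rule described above can be sketched roughly as follows. This is an illustration of the description only, not the actual PySPLIT loader code; the function name guess_century and its default argument are hypothetical.

```python
import re


def guess_century(filename, default_century=2000):
    """Guess the century from the first run of 10 digits (YYYYMMDDHH).

    Falls back to default_century when no such run exists.
    """
    runs = re.findall(r'\d{10}', filename)
    if not runs:
        return default_century
    year = int(runs[0][:4])
    return year - year % 100


# No run of 10 digits, so the default century is used.
print(guess_century('colgateaug0500summer07080111'))
# A clean YYYYMMDDHH run gives the intended century.
print(guess_century('traj_Puc_Per_Mur_2016070600'))
# Altitude digits fused with the date: the first 10 digits are '0500200708',
# so the "year" is read as 0500.
print(guess_century('colgate05002007080111'))
```

The last case shows how a filename can run afoul of the convention: digits that precede the date end up being read as part of the year.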

Test:

import re

substr = re.findall(r'(\d{10})', 'yourtrajectoryfilename')
year = int(substr[0][:4])

What do you get for substr and year? Is year 4016?

Solution: You've hopefully chosen a consistent naming convention, and if so just use Python or your tool/language of choice to change your trajectory filenames so

I would myself probably do this by the strategic insertion of underscore(s).
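One possible way to script the underscore insertion, as a sketch only: it assumes the date is the last 10 characters of the filename (YYYYMMDDHH) and that the problem is other digits sitting directly in front of it. The helper name separate_date is hypothetical.

```python
import re


def separate_date(filename):
    """Insert an underscore before a trailing 10-digit date.

    Only acts when the date is immediately preceded by another digit;
    other filenames are returned unchanged.
    """
    return re.sub(r'(?<=\d)(\d{10})$', r'_\1', filename)


print(separate_date('colgate05002007080111'))
print(separate_date('traj_Puc_Per_Mur_2016070600'))
```

The new names could then be applied with something like pathlib's Path.rename, after checking a few of them by eye.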

It is not necessary to rerun your trajectories to fix this. However, as Ryan has noted, you could in the future use PySPLIT to run your trajectories with ERA-Interim data. For trajectory generation, PySPLIT is a very helpful intermediary between HYSPLIT and the user. As long as the meteorology files follow the naming conventions used by ARL-distributed datasets like GDAS and EDAS (and all necessary ARL-packed meteorology is present in the given directory), PySPLIT will happily direct HYSPLIT to produce arbitrarily large numbers of trajectories from ERA-Interim or whatever meteorology you like.

Hope this helps!

Mellissa

heatherguy commented 6 years ago

Hi Mellissa,

Thanks for your advice, it is really clear and useful.

My initial naming convention was as follows: basenameYYMMDDHH

When I run this code: trajgroup = pysplit.make_trajectorygroup(r'.\traj_Puc_Per_Mur_2016070600')

I get the following error: OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 4016-07-05 23:00:00

I have tried changing the name of the file, using the following three filenames:

trajgroup = pysplit.make_trajectorygroup(r'./traj_Puc_Per_Mur_2016070600')
trajgroup = pysplit.make_trajectorygroup(r'./traj_Puc_Per_Mur_16070600')
trajgroup = pysplit.make_trajectorygroup(r'./traj_Puc_Per_Mur')

I get the same error each time, even when the results from the test you suggested are correct, i.e.:

substr
Out[29]: ['2016070600']

year
Out[30]: 2016

This suggests to me that perhaps the filename is not the problem. Can you think of anything else that might be causing this error?

In the meantime I will try to work on running the ERA-Interim trajectories through PySPLIT as you suggested.

Thanks again,

Heather

mscross commented 6 years ago

Huh, weird. I'm sorry that didn't fix the issue, I really thought that was it. Now I need more information:

heatherguy commented 6 years ago

Thanks for your reply, Mellissa. It chokes on all the trajectories; they were all generated in exactly the same way using the HYSPLIT GUI. At the bottom of this post I have included the full traceback for the OutOfBoundsDatetime error, and I have attached one of the offending trajectory files.

I'm going to sneak another question in here in case you can answer it quickly or point me in the right direction: I have since managed to re-run all of the trajectories using PySplit (thanks for pointing out that I could do this with ERA-interim). I can load, handle and plot these trajectories, but none of the background map features show up on my plot. For example when I run the following code:

import pysplit
trajgroup = pysplit.make_trajectorygroup(r'./*feb*')
mapcorners =  [-90, -50, -40, 5]
standard_pm = None
map_params0 = pysplit.MapDesign(mapcorners, standard_pm, drawoutlines=True)
map0 = map_params0.make_basemap()
for traj in trajgroup:
    map0.plot(*traj.path.xy, c='black', latlon=True, zorder=20)

I don't get any errors, but no country outlines show up. In the same way, I am not able to plot state lines or latitude/longitude lines. Have you seen this before, or do you know where it might be coming from? I thought it might be a matplotlib basemap issue, but I can plot basemaps just fine without using PySPLIT.

Thanks again for your help.

Heather

trajgroup = pysplit.make_trajectorygroup(r'./traj_Puc_Per_Mur_16070600')
Traceback (most recent call last):
  File "<ipython-input-1-380754b3a117>", line 4, in <module>
    trajgroup = pysplit.make_trajectorygroup(r'./traj_Puc_Per_Mur_16070600')
  File "C:\ProgramData\Anaconda3\lib\site-packages\pysplit\hy_processor.py", line 63, in make_trajectorygroup
    data, path, head, datetime, multitraj = load_hysplitfile(hyfile)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pysplit\hyfile_handler.py", line 179, in load_hysplitfile
    century)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pysplit\hyfile_handler.py", line 236, in _trajsplit
    datetime.append(_getdatetime(century, t))
  File "C:\ProgramData\Anaconda3\lib\site-packages\pysplit\hyfile_handler.py", line 375, in _getdatetime
    return pd.DatetimeIndex(times)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\util\_decorators.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\datetimes.py", line 335, in __new__
    yearfirst=yearfirst)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py", line 516, in to_datetime
    result = _convert_listlike(arg, box, format)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py", line 447, in _convert_listlike
    raise e
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py", line 435, in _convert_listlike
    require_iso8601=require_iso8601
  File "pandas/_libs/tslib.pyx", line 2355, in pandas._libs.tslib.array_to_datetime (pandas\_libs\tslib.c:46617)
  File "pandas/_libs/tslib.pyx", line 2538, in pandas._libs.tslib.array_to_datetime (pandas\_libs\tslib.c:45511)
  File "pandas/_libs/tslib.pyx", line 2414, in pandas._libs.tslib.array_to_datetime (pandas\_libs\tslib.c:43443)
  File "pandas/_libs/tslib.pyx", line 2409, in pandas._libs.tslib.array_to_datetime (pandas\_libs\tslib.c:43346)
  File "pandas/_libs/tslib.pyx", line 1774, in pandas._libs.tslib._check_dts_bounds (pandas\_libs\tslib.c:32752)
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 4016-07-05 23:00:00

traj_Puc_Per_Mur_16070600.zip

mscross commented 6 years ago

The issue of the missing map background is fixed in #39!