Closed hmjbarbosa closed 3 years ago
You're completely right, looking through sorted_indices for the first occurrence of each traj is nonsense; we need to look through the traj numbers in the first col of sorted_hydata. Want to make a PR?
I just submitted the pull request. I tried to link it to this issue so that the issue is closed automatically.
There seems to be a bug in the pysplit code intended for reading a hysplit file with multiple trajectories in it. I was able to trace it back and fix the issue, and I'll try to explain it below.
The function load_hysplitfile() (inside file hyfile_handler.py) decides if it is reading a file with a single or multiple trajectories in it:
Function _trajsplit() will then sort the hysplit lines, so that all lines corresponding to the same trajectory are grouped:
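The snippet referred to here is not preserved on this page. As a rough illustration only, here is a small NumPy sketch of such a grouping step. It is not the actual pysplit code: the helper name group_by_trajectory and the sample data are made up, and it assumes the trajectory ID sits in column 0, as described further down.

```python
import numpy as np


def group_by_trajectory(hydata):
    """Group rows by trajectory ID (assumed to be in column 0).

    Hypothetical helper for illustration; not taken from pysplit.
    """
    # A stable sort keeps the original row order within each trajectory.
    sorted_indices = np.argsort(hydata[:, 0], kind="stable")
    sorted_hydata = hydata[sorted_indices]
    return sorted_indices, sorted_hydata


# Made-up data: column 0 = trajectory ID, column 1 = time step.
hydata = np.array(
    [[1, 0], [2, 0], [3, 0], [1, -1], [2, -1], [3, -1]], dtype=float
)
sorted_indices, sorted_hydata = group_by_trajectory(hydata)
```

Note that sorted_indices is a permutation of row positions, while sorted_hydata[:, 0] holds the trajectory IDs in grouped order.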
It then tries to identify, in the sorted arrays, the start position (i.e. the first line number) of each trajectory.
The problem happens at this point.
The code takes each trajectory number (in my case, trajectories 1 to 8) and searches for all the positions (in the sorted arrays) where that trajectory occurs. The problem is the use of sorted_indices, which holds positions in the sorted array, not trajectory numbers. If we print the values in sorted_indices[:], we get numbers from 0 up to the number of lines in the hysplit file (in my case, 124).
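To make the mismatch concrete, here is a tiny NumPy sketch with made-up data. It is a hypothetical reconstruction of the kind of lookup described, not the actual pysplit code; only the names sorted_indices and unique_traj come from the description.

```python
import numpy as np

# Made-up trajectory IDs as they might appear line by line in a file
# with three interleaved trajectories.
traj_ids = np.array([1, 2, 3, 1, 2, 3])

# Row positions that would sort the lines by trajectory ID.
sorted_indices = np.argsort(traj_ids, kind="stable")
unique_traj = np.unique(traj_ids)

# Lookup of the kind described above: it compares trajectory IDs
# against row positions, so the "start" it finds is meaningless.
buggy_starts = [
    int(np.nonzero(sorted_indices == t)[0][0]) for t in unique_traj
]
```

With this data the grouped trajectories start at rows 0, 2 and 4, but the lookup against sorted_indices returns 2, 4 and 1, because it only finds where the row numbers 1, 2 and 3 happen to sit in the permutation.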
The correct approach would be to use sorted_hydata[:,0], which is the first column (i.e. the trajectory ID) of the sorted array. If we print the values in sorted_hydata[:,0], they are those in unique_traj (in my case, 1 to 8). The corrected code looks like this:
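The snippet that was posted here is not preserved on this page. The following is only a minimal NumPy sketch of the idea, not the actual patch: the helper name first_rows and the sample data are hypothetical, and it assumes the rows are already grouped by the trajectory ID in column 0.

```python
import numpy as np


def first_rows(sorted_hydata):
    """Return each trajectory ID and the first row where it occurs.

    Hypothetical helper for illustration; not taken from pysplit.
    """
    traj_col = sorted_hydata[:, 0]  # trajectory IDs, not row positions
    unique_traj = np.unique(traj_col)
    starts = [int(np.nonzero(traj_col == t)[0][0]) for t in unique_traj]
    return unique_traj, starts


# Made-up data, already grouped: column 0 = trajectory ID, column 1 = time step.
sorted_hydata = np.array(
    [[1, 0], [1, -1], [2, 0], [2, -1], [3, 0], [3, -1]], dtype=float
)
unique_traj, starts = first_rows(sorted_hydata)

# np.unique can also report the first occurrences directly.
_, starts_alt = np.unique(sorted_hydata[:, 0], return_index=True)

# Split the grouped rows into one array per trajectory.
per_traj = np.split(sorted_hydata, starts[1:])
```

Here starts is [0, 2, 4], and per_traj holds three arrays of two rows each, one per trajectory.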
Could any of the developers have a look and confirm this?