rich-iannone / splitr

Use the HYSPLIT model from inside R and do more with it

Issues with processing large set of coordinates with foreach() #60

Open vrathi0 opened 3 years ago

vrathi0 commented 3 years ago

Hi Rich,

Thanks so much for such a useful tool.

I am having a problem using the package with foreach().

Problem: The model run yields incomplete data; a lot of output is dropped while the model is running. I am mainly using the function hysplit_trajectory().

Possible cause: I looked into the source code, and it seems that a lot of files are copied or moved from one location to another. That might trip up parallel processes, since one worker could delete a file before another has had a chance to read it.

For example, line 303 of hysplit_trajectory.R:

unlink(file.path(exec_dir, trajectory_files), force = TRUE)

The line above deletes some files, and it sits outside the "clean_up" conditional block (lines 321-324, below), so the user has no way to control the deletion.

  if (clean_up) {
    unlink(file.path(exec_dir, traj_output_files()), force = TRUE)
    unlink(recep_file_path_stack, recursive = TRUE, force = TRUE)
  }

Also, in the code below, could the files be created in one location instead of being moved during runtime?

    # Move files into the output folder
    file.copy(
      from = file.path(exec_dir, trajectory_files),
      to = recep_file_path,
      copy.mode = TRUE
    )

Could you comment on this? My diagnosis may be wrong, but the problem remains: using hysplit_trajectory() with foreach() parallel loops causes worker processes to drop a lot of data, resulting in incomplete output.

Thanks so much,

juliombarros commented 3 years ago

Same issue here. Have you found a workaround so far?

juliombarros commented 3 years ago

By the way, instead of foreach() we are using future_pmap() and have the same issue: a lot of data is dropped and the output is incomplete.
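
If the diagnosis above is right and the workers are colliding inside one shared exec_dir, a possible workaround is to give every parallel task its own working directory. The sketch below has not been tested against splitr: make_task_dir() is a hypothetical helper, and the commented loop assumes that the exec_dir seen in the snippets above can be set per call through an argument of the same name.

```r
# Hypothetical helper (not part of splitr): create a private working
# directory for one parallel task so workers never share output files.
make_task_dir <- function(base_dir, task_id) {
  task_dir <- file.path(base_dir, sprintf("task_%04d", task_id))
  dir.create(task_dir, recursive = TRUE, showWarnings = FALSE)
  task_dir
}

# One directory per task, created up front under a temporary base folder.
base_dir <- tempfile("hysplit_runs_")
task_dirs <- vapply(1:3, function(i) make_task_dir(base_dir, i), character(1))

# Assumed usage inside the parallel loop (not run here; lats and lons are
# placeholder vectors of coordinates):
#   foreach(i = seq_along(lats), .packages = "splitr") %dopar% {
#     hysplit_trajectory(
#       lat = lats[i], lon = lons[i],
#       exec_dir = task_dirs[i],  # assumption: per-call working directory
#       clean_up = FALSE          # keep the files until results are checked
#     )
#   }
```

The same pattern should carry over to future_pmap() by passing task_dirs as one of the mapped inputs. Whether it resolves the dropped data depends on whether the shared directory is in fact the cause.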