heal-research / HeuristicLab

HeuristicLab - An environment for heuristic and evolutionary optimization
https://dev.heuristiclab.com
GNU General Public License v3.0

Hive worker cannot resume jobs which use native dlls after pausing (and after automatic snapshots every 18 hours) #3166

Open gkronber opened 2 years ago

Describe the bug

The Hive worker creates an AppDomain for each job and stores the job's assemblies in the folder Temp/PluginTemp/{jobGuid}. When the job is stopped (e.g. for a snapshot), the AppDomain is disposed and the folder with the assemblies is cleared. However, this does not work for native dlls: they cannot be unloaded, so the Hive worker process keeps the native dll locked. Trying to delete the dll and the folder raises an exception, which the Hive worker catches.

The problem arises when the same job is resumed on the same worker. After downloading the job from the server, the worker tries to create the folder for the job and write the assemblies. Since the folder and the file still exist, another exception is raised (again caught by the Hive worker). As a result, the job cannot be resumed and is marked as failed at the Hive server.

To Reproduce

Steps to reproduce the behavior:

  1. Create a GP SymReg job and set Evaluator to "Parameter Optimization Evaluator" (this uses the hl-native-interpreter plugin)
  2. Configure the GP run so that it takes a few minutes (10)
  3. Run in Hive but select only a single worker
  4. Open job manager, wait for the job to be "running" and then pause the job.
  5. Wait for the job to be paused and resume the job
  6. The job will be stopped with state "Failed". The error message will show a problem with "hl-native-interpreter.dll"

Proposed fix

Check whether the folder for the jobGuid already exists on the Hive worker and, if so, reuse it. Additionally, check whether plugin files already exist in the folder and do not overwrite them. Since it is the same job, the old files can be reused.
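The reuse logic could look roughly like the sketch below. It is written in Python for illustration only; the Hive worker itself is .NET code, and the function name `prepare_job_folder`, its parameters, and the idea of passing plugin files as a name-to-bytes mapping are all assumptions made for this example, not part of HeuristicLab. It also assumes that a file left over from an earlier run of the same job is identical to the one that would be written again.

```python
import tempfile
from pathlib import Path


def prepare_job_folder(base_dir, job_guid, plugin_files):
    """Create or reuse the per-job plugin folder and write only missing files.

    plugin_files maps a file name to its content (bytes).
    Returns two lists: the names that were written and the names that were reused.
    All names here are hypothetical and chosen for this sketch.
    """
    job_dir = Path(base_dir) / job_guid
    # Reuse the folder if an earlier run of the same job left it behind.
    job_dir.mkdir(parents=True, exist_ok=True)

    written, reused = [], []
    for name, content in plugin_files.items():
        target = job_dir / name
        if target.exists():
            # Assumption: same job, same file. Leave it untouched, because a
            # native dll may still be locked by the worker process.
            reused.append(name)
            continue
        target.write_bytes(content)
        written.append(name)
    return written, reused


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        files = {"hl-native-interpreter.dll": b"native", "Plugin.dll": b"managed"}
        print(prepare_job_folder(base, "job-guid", files))  # first start of the job
        print(prepare_job_folder(base, "job-guid", files))  # resume on the same worker
```

On the first call every file is written; on the second call the existing folder and files are reused, so nothing attempts to overwrite a dll that is still locked.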