radical-cybertools / radical.pilot

RADICAL-Pilot
http://radical-cybertools.github.io/radical-pilot/index.html

Pilot data staging commands are not thread safe. #2823

Closed eirrgang closed 1 year ago

eirrgang commented 1 year ago

If Pilot.stage_out() or Pilot.stage_in() is called from multiple threads at the same time, the locking in the PilotManager member functions is insufficient to protect the integrity of its internal state. The functions are not thread-safe.

A thread can yield while executing _pilot_staging_output or _pilot_staging_input, and another thread can then replace the _active_sds member dictionary before the first thread resumes.

As a result, a member function can return before its file staging has completed, which is clearly not the intent.
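To make the failure mode concrete, here is a minimal model of it. This is not RADICAL-Pilot code: `BrokenTracker`, `begin`, `all_done`, and the uids are hypothetical names, and the sketch assumes a completion check that treats a uid missing from the dictionary as finished. The interleaving of two threads is replayed sequentially so the outcome is deterministic.

```python
class BrokenTracker:
    """Hypothetical stand-in for a manager that replaces its tracking dict."""

    def __init__(self):
        self.active = {}

    def begin(self, uids):
        # Rebinding the attribute discards entries registered by other callers.
        self.active = {uid: 'PENDING' for uid in uids}

    def all_done(self, uids):
        # Assumption: a uid that is no longer tracked counts as finished.
        return all(self.active.get(uid, 'DONE') == 'DONE' for uid in uids)


tracker = BrokenTracker()
tracker.begin(['sd.0000'])                   # thread A registers its directive
tracker.begin(['sd.0001'])                   # thread B runs before A checks
premature = tracker.all_done(['sd.0000'])    # thread A resumes and checks
print(premature)                             # nothing was actually staged
```

In this model, thread A's check passes even though no staging work was ever marked as done, because its entry disappeared when thread B replaced the dictionary.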

The issue should be easy to resolve by managing a single dictionary instance for the lifetime of the PilotManager (instead of replacing it), with each function invocation tracking only the uids it is responsible for.
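A rough sketch of that approach, under stated assumptions: `StagingTracker`, `register`, `mark_done`, and `wait` are hypothetical names chosen for illustration and are not part of the RADICAL-Pilot API. The dictionary is created once and only mutated under a lock, and each caller waits on, and cleans up, only its own uids.

```python
import threading


class StagingTracker:
    """Hypothetical tracker: one dict for the object's lifetime."""

    def __init__(self):
        self._cond = threading.Condition()
        self._active = {}   # created once, never reassigned

    def register(self, uids):
        # Add this caller's uids without touching anyone else's entries.
        with self._cond:
            for uid in uids:
                self._active[uid] = 'PENDING'

    def mark_done(self, uid):
        # Called when a staging directive finishes (e.g. from a callback).
        with self._cond:
            if uid in self._active:
                self._active[uid] = 'DONE'
                self._cond.notify_all()

    def wait(self, uids, timeout=None):
        # Block until all of *this caller's* uids are done, then remove them.
        with self._cond:
            ok = self._cond.wait_for(
                lambda: all(self._active[u] == 'DONE' for u in uids),
                timeout=timeout)
            if ok:
                for u in uids:
                    del self._active[u]
            return ok


tracker = StagingTracker()
tracker.register(['sd.0000'])    # caller A
tracker.register(['sd.0001'])    # caller B; A's entry survives
tracker.mark_done('sd.0001')
a_done = tracker.wait(['sd.0000'], timeout=0.05)   # A is still pending
b_done = tracker.wait(['sd.0001'], timeout=0.05)   # B has completed
```

Because callers never rebind the dictionary, one invocation cannot make another's entries disappear, and a completed caller removes only its own uids.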

The issue was observed in RP 1.20.1 and is still present in the devel branch.