qir-alliance / qcor

C++ compiler for heterogeneous quantum-classical computing built on Clang and XACC
http://docs.aide-qc.org
MIT License

[Feature] Thread-safe implementation #239

Open 1tnguyen opened 2 years ago

1tnguyen commented 2 years ago

To support multi-threading in QCOR, we need to audit and fix the usage of some static global variables.

This may include:

(1) the qcor runtime (qrt_impl): building up circuit IR for quantum kernels, optimization, submission, etc.

(2) the qpu instance cached in the internal_compiler namespace (which will be used by the qrt for execution)

While working on this feature, we could also look into making the qpu shared pointer a member of the qrt implementation, for future maintainability.

Note: For JIT compilation (using QJIT, which is non-copyable), we need to make the entire class thread-safe (which I think we've done in https://github.com/ORNL-QCI/qcor/pull/157)

ahayashi commented 2 years ago

Hi @1tnguyen, I'm now thinking about how to fix this. My idea is to create a std::map from each thread ID to its corresponding instance when ::quantum::initialize() is called. There would be two such maps: std::map<std::thread::id, std::shared_ptr<QuantumRuntime>> and std::map<std::thread::id, std::shared_ptr<Accelerator>>.

For now, I'm assuming that ::quantum::initialize() is manually called at the very beginning of a function that is executed by a std::thread. For example:

#include <thread>

void foo() {
  // Per-thread initialization, called first in every threaded function.
  ::quantum::initialize("qpp", "empty");
  ...
}

int main() {
  std::thread t0(foo);
  std::thread t1(foo);
  t0.join();
  t1.join();
}

However, we'll need to discuss how to make this easier for users. For example, we could develop a slightly higher-level API that tells the runtime where the entry point of each "threaded" part is. Alternatively, we could create a compiler pass that automatically inserts such an initialization call.
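To make the higher-level API option concrete, here is a minimal sketch of a thread launcher that runs the per-thread initialization before the user's function. Everything in it is hypothetical: `quantum_thread` and `initialize_for_current_thread` are illustrative names rather than existing qcor functions, and the initializer only counts calls where a real one would call ::quantum::initialize().

```cpp
#include <atomic>
#include <thread>
#include <utility>

namespace api_sketch {

// Counts how many threads ran the (hypothetical) initialization step.
std::atomic<int> init_calls{0};

// Hypothetical placeholder for the per-thread runtime setup.
void initialize_for_current_thread() { ++init_calls; }

// Hypothetical launcher: marks f as the entry point of a threaded region
// and guarantees initialization runs on the new thread before f does.
template <typename F>
std::thread quantum_thread(F f) {
  return std::thread([f = std::move(f)]() mutable {
    initialize_for_current_thread();
    f();
  });
}

// Usage: two workers, each initialized automatically on its own thread.
int run_two_workers() {
  std::atomic<int> work_done{0};
  auto worker = [&work_done] { ++work_done; };
  std::thread t0 = quantum_thread(worker);
  std::thread t1 = quantum_thread(worker);
  t0.join();
  t1.join();
  return work_done.load();
}

}  // namespace api_sketch
```

With a wrapper like this, user code would not need to remember the initialization call, though threads created by other means (raw std::thread, OpenMP, and so on) would still bypass it.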

Going back to my original point: for (1) (qrt_impl), I'm thinking of 1) creating a map from thread ID to qrt_impl after this line (https://github.com/ORNL-QCI/qcor/blob/master/runtime/qrt/qrt.cpp#L154), together with a getter function that returns the qrt_impl belonging to the current thread ID, and 2) replacing all occurrences of qrt_impl with a call to the getter function.
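The map-plus-getter idea for (1) could look roughly like the sketch below. It is not the qcor code: `QuantumRuntime` here is an empty stand-in struct, and `register_current_thread` and `get_qrt_impl` are hypothetical names. The one detail it is meant to show is that the map itself is shared state, so every access goes through a mutex.

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <thread>

namespace qrt_sketch {

// Hypothetical stand-in for the runtime class; not the real qcor type.
struct QuantumRuntime {
  int instructions_built = 0;
};

std::mutex registry_mutex;
std::map<std::thread::id, std::shared_ptr<QuantumRuntime>> qrt_registry;

// Would be called once per thread, e.g. from an initialize()-like function.
void register_current_thread() {
  std::lock_guard<std::mutex> lock(registry_mutex);
  qrt_registry[std::this_thread::get_id()] = std::make_shared<QuantumRuntime>();
}

// Getter that would replace direct uses of a global qrt_impl.
// Returns nullptr if the calling thread was never registered.
std::shared_ptr<QuantumRuntime> get_qrt_impl() {
  std::lock_guard<std::mutex> lock(registry_mutex);
  auto it = qrt_registry.find(std::this_thread::get_id());
  return it == qrt_registry.end() ? nullptr : it->second;
}

// Usage: each thread registers itself and sees its own runtime instance.
bool two_threads_get_distinct_runtimes() {
  std::shared_ptr<QuantumRuntime> a, b;
  std::thread t0([&a] { register_current_thread(); a = get_qrt_impl(); });
  std::thread t1([&b] { register_current_thread(); b = get_qrt_impl(); });
  t0.join();
  t1.join();
  return a && b && a != b;
}

}  // namespace qrt_sketch
```

One caveat with keying on std::thread::id: the standard allows an ID to be reused once its thread has terminated and can no longer be joined, so a real implementation would likely also need to erase the entry when a thread finishes.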

For (2), I'll do something similar in setAccelerator(), for example around this line (https://github.com/ORNL-QCI/qcor/blob/master/runtime/qrt/internal_compiler/xacc_internal_compiler.cpp#L121). We'll also have to update the function so that it returns a newly cloned instance whenever it is called from a newly created thread.
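A rough sketch of the clone-per-thread behavior for (2), under stated assumptions: `Accelerator` below is a stand-in struct, its `clone()` method is assumed for illustration, and `PerThreadAccelerators` is a hypothetical helper rather than anything in xacc_internal_compiler.cpp. The first call from a thread clones a prototype; later calls from the same thread return the cached clone.

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

namespace accel_sketch {

// Hypothetical stand-in for the backend type; clone() is an assumption.
struct Accelerator {
  std::string name;
  std::shared_ptr<Accelerator> clone() const {
    return std::make_shared<Accelerator>(*this);
  }
};

class PerThreadAccelerators {
 public:
  explicit PerThreadAccelerators(std::shared_ptr<Accelerator> prototype)
      : prototype_(std::move(prototype)) {}

  // Returns the calling thread's instance, cloning the prototype on first use.
  std::shared_ptr<Accelerator> get() {
    std::lock_guard<std::mutex> lock(mutex_);
    auto& slot = instances_[std::this_thread::get_id()];
    if (!slot) slot = prototype_->clone();
    return slot;
  }

 private:
  std::mutex mutex_;
  std::shared_ptr<Accelerator> prototype_;
  std::map<std::thread::id, std::shared_ptr<Accelerator>> instances_;
};

// Usage: the same thread gets the same clone; different threads do not share.
bool threads_get_private_clones() {
  PerThreadAccelerators registry(
      std::make_shared<Accelerator>(Accelerator{"qpp"}));
  std::shared_ptr<Accelerator> a, a_again, b;
  std::thread t0([&] { a = registry.get(); a_again = registry.get(); });
  std::thread t1([&] { b = registry.get(); });
  t0.join();
  t1.join();
  return a == a_again && a != b && a->name == "qpp" && b->name == "qpp";
}

}  // namespace accel_sketch
```

Whether a real backend can be cloned cheaply, or at all, depends on the accelerator, so this only illustrates the lookup-or-clone logic.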

I also agree that we can move the qpu into the qrt_impl; we could discuss that further.

Please let me know if you have questions or suggestions.

Thanks!

Akihiro