serizba / cppflow

Run TensorFlow models in C++ without installation and without Bazel
https://serizba.github.io/cppflow/
MIT License

How to use specific gpu on multi gpu environment #112

Closed seungtaek94 closed 3 years ago

seungtaek94 commented 3 years ago

Hi.

Is there a way to use a specific GPU, as in the Python snippet below?

tf.debugging.set_log_device_placement(True)

try:
  with tf.device('/device:GPU:2'):
    a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    c = tf.matmul(a, b)
except RuntimeError as e:
  print(e)

serizba commented 3 years ago

Hi!

I've never tried to do this with the C API, but I think you can experiment with some of its functions, especially those in eager/c_api.h, such as these:


TFE_TensorHandle* TFE_TensorHandleCopyToDevice(TFE_TensorHandle* h, TFE_Context* ctx, const char* device_name, TF_Status* status);

void TFE_OpSetDevice(TFE_Op* op, const char* device_name, TF_Status* status);

Unfortunately, these mappings are not implemented in cppflow, at least for now.

x3lif commented 3 years ago

This has been described in this answer on SO:

https://stackoverflow.com/questions/62393258/tensorflow-c-api-selecting-gpu

Calling TF_SetConfig with your parameters before any cppflow call might do the trick.

seungtaek94 commented 3 years ago

Thanks for the comments.

I tried that, but it causes a problem.

I generated the serialized proto config with the script below:

import tensorflow.compat.v1 as tf

def create_serialized_options(fraction, growth):
    config = tf.ConfigProto()
    config.gpu_options.visible_device_list = '1'
    config.gpu_options.per_process_gpu_memory_fraction = fraction
    config.gpu_options.allow_growth = growth
    serialized = config.SerializeToString()
    return '{' + ','.join(list(map(hex, serialized))) + '}'

if __name__ == "__main__":
    print("Create serialized options which allow TF to use a certain percentage of GPU memory and allow TF to expand this memory if required.")
    memory_fraction_interval = 0.05
    for i in range(1, int(1/memory_fraction_interval)):
        memory_fraction_to_use = memory_fraction_interval * i
        enable_memory_growth = False
        print(f'GPU memory to be used: {memory_fraction_to_use*100: .1f} {create_serialized_options(memory_fraction_to_use, enable_memory_growth)}')
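
For reference, the bytes that SerializeToString() produces here follow the protobuf wire format, so the same C initializer can be sketched without TensorFlow installed. The field numbers below (gpu_options = 6 in ConfigProto; per_process_gpu_memory_fraction = 1, allow_growth = 4, visible_device_list = 5 in GPUOptions) are assumptions taken from TensorFlow's config.proto and should be checked against the TensorFlow version in use. The helper names are hypothetical.

```python
import struct


def _varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def _key(field_number: int, wire_type: int) -> bytes:
    """Encode a field key: (field_number << 3) | wire_type."""
    return _varint((field_number << 3) | wire_type)


def encode_gpu_config(fraction: float, growth: bool, visible: str) -> bytes:
    """Hand-encode ConfigProto{gpu_options{...}} using the assumed field numbers."""
    gpu = b""
    if fraction != 0.0:  # proto3 omits fields that hold their default value
        gpu += _key(1, 1) + struct.pack("<d", fraction)  # 64-bit double
    if growth:
        gpu += _key(4, 0) + _varint(1)  # bool as varint
    if visible:
        raw = visible.encode("utf-8")
        gpu += _key(5, 2) + _varint(len(raw)) + raw  # length-delimited string
    # Wrap the GPUOptions bytes as field 6 of ConfigProto.
    return _key(6, 2) + _varint(len(gpu)) + gpu


def as_c_initializer(data: bytes) -> str:
    """Format bytes the same way as the script above: {0x..,0x..}."""
    return "{" + ",".join(hex(b) for b in data) + "}"


if __name__ == "__main__":
    print(as_c_initializer(encode_gpu_config(0.2, True, "1")))
```

With fraction 0.2, growth enabled and visible device '1', this yields a 16-byte initializer. Comparing it with the output of the TensorFlow script is a quick way to confirm that the assumed field numbers match your build.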

If I set config.gpu_options.visible_device_list = '1', cppflow2 fails with a memory access problem, even though my system has two GPUs.

TensorFlow logs:

2021-04-22 13:42:07.327499: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-04-22 13:42:14.687365: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-04-22 13:42:14.695013: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2021-04-22 13:42:14.759052: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:02:00.0 name: GeForce RTX 2080 SUPER computeCapability: 7.5
coreClock: 1.845GHz coreCount: 48 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 462.00GiB/s
2021-04-22 13:42:14.759284: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-04-22 13:42:14.768929: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-04-22 13:42:14.769096: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-04-22 13:42:14.774011: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-04-22 13:42:14.776069: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-04-22 13:42:14.785461: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-04-22 13:42:14.789710: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-04-22 13:42:14.791590: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-04-22 13:42:14.791802: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 1
2021-04-22 13:42:15.280052: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-04-22 13:42:15.280204: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      1
2021-04-22 13:42:15.280538: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 1:   N
2021-04-22 13:42:15.280789: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6144 MB memory) -> physical GPU (device: 1, name: GeForce RTX 2080 SUPER, pci bus id: 0000:02:00.0, compute capability: 7.5)
2021-04-22 13:42:15.282590: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-04-22 13:42:15.433760: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2080 SUPER computeCapability: 7.5
coreClock: 1.845GHz coreCount: 48 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 462.00GiB/s
2021-04-22 13:42:15.433934: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 1 with properties:
pciBusID: 0000:02:00.0 name: GeForce RTX 2080 SUPER computeCapability: 7.5
coreClock: 1.845GHz coreCount: 48 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 462.00GiB/s
2021-04-22 13:42:15.434327: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-04-22 13:42:15.434417: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-04-22 13:42:15.435033: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-04-22 13:42:15.435557: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-04-22 13:42:15.436041: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-04-22 13:42:15.436545: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-04-22 13:42:15.437156: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-04-22 13:42:15.437761: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-04-22 13:42:15.438237: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0, 1
2021-04-22 13:42:15.846902: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-04-22 13:42:15.847197: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      0 1
2021-04-22 13:42:15.848103: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0:   N N
2021-04-22 13:42:15.849008: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 1:   N N

MSVS:

Exception error(0x00007FF9324E4B59, tf_c.exe): Microsoft C++ Exception : std::runtime_error, memory location 0x000000D4580FED40.

The exception is thrown from cppflow::status_check.
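
A small script can make the relevant part of such a log easier to see. The sketch below pulls out every "Adding visible gpu devices" line; the function name is hypothetical, and the two sample lines are copied from the log above. In that log the list appears twice, first as "1" and then as "0, 1". One possible reading (an assumption, not confirmed in this thread) is that GPU devices are initialised twice with different settings, once with the custom config and once with defaults.

```python
import re

# Matches the device list on a single log line, e.g. "0, 1".
VISIBLE_RE = re.compile(r"Adding visible gpu devices: ([0-9, ]+)")

SAMPLE_LOG = """\
2021-04-22 13:42:14.791802: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 1
2021-04-22 13:42:15.438237: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0, 1
"""


def visible_device_sets(log_text: str) -> list:
    """Return one list of GPU ids per 'Adding visible gpu devices' line."""
    sets = []
    for match in VISIBLE_RE.finditer(log_text):
        ids = [int(tok) for tok in match.group(1).split(",") if tok.strip()]
        sets.append(ids)
    return sets


if __name__ == "__main__":
    for i, ids in enumerate(visible_device_sets(SAMPLE_LOG), start=1):
        print(f"device initialisation {i}: visible GPUs {ids}")
```

If more than one entry is returned and the lists differ, it may be worth checking which part of the program created each set of devices.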

x3lif commented 3 years ago

I'm still using CPPFlow 1, but I'm pretty sure we could make it work with the following changes.

In Model.cpp, just after the session options are created with default parameters (line 50), add the following two lines. My config means: select GPU 1, allow_growth=True, per_process_gpu_memory_fraction=0.2.

const uint8_t config[16] = { 0x32, 0xe, 0x9, 0x9a, 0x99, 0x99, 0x99, 0x99, 0x99, 0xc9, 0x3f, 0x20, 0x1, 0x2a, 0x1, 0x31 };
TF_SetConfig(sess_opts, config, sizeof(config), this->status);

With this change I managed to make TensorFlow select my less powerful GPU card.
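
To check what a hard-coded config array like the one above actually encodes, the bytes can be decoded by hand with the protobuf wire format. This is a minimal sketch using only the standard library. The field numbers (6 for gpu_options; 1, 4 and 5 inside GPUOptions) are assumptions based on TensorFlow's config.proto, and the function names are hypothetical.

```python
import struct

# The 16 bytes quoted in the comment above.
CONFIG = bytes([0x32, 0x0E, 0x09, 0x9A, 0x99, 0x99, 0x99, 0x99, 0x99, 0xC9,
                0x3F, 0x20, 0x01, 0x2A, 0x01, 0x31])


def read_varint(buf: bytes, pos: int):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def parse_fields(buf: bytes) -> dict:
    """Return {field_number: value} for one message level (no recursion)."""
    fields, pos = {}, 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:  # varint (bool, int)
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # 64-bit; interpreted as a double here (assumption)
            value = struct.unpack_from("<d", buf, pos)[0]
            pos += 8
        elif wire_type == 2:  # length-delimited (string or nested message)
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields[number] = value
    return fields


# Field 6 of ConfigProto is assumed to hold the nested GPUOptions message.
gpu_options = parse_fields(parse_fields(CONFIG)[6])
decoded = {
    "per_process_gpu_memory_fraction": gpu_options.get(1),
    "allow_growth": bool(gpu_options.get(4, 0)),
    "visible_device_list": gpu_options.get(5, b"").decode("utf-8"),
}
print(decoded)
```

Under those assumptions the array decodes to a memory fraction of about 0.2, allow_growth enabled, and visible_device_list "1", which matches the description given with the array.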

seungtaek94 commented 3 years ago

Actually, it seems this problem can't be solved with a TF SavedModel.

So I changed a few things, following the steps below.

First, I converted my model from a SavedModel to a frozen graph (frozen_graph.pb):

import tensorflow as tf
from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2

if __name__ == "__main__":
    saved_model_path = './PATH/FOR/SAVED_MODEL'
    model = tf.keras.models.load_model(saved_model_path)

    # Convert keras model to ConcreteFunction
    full_model = tf.function(lambda x: model(x))
    full_model = full_model.get_concrete_function(
        tf.TensorSpec(model.inputs[0].shape, model.inputs[0].dtype))

    # Get frozen ConcreteFunction
    frozen_func = convert_variables_to_constants_v2(full_model)
    frozen_func.graph.as_graph_def()

    layers = [op.name for op in frozen_func.graph.get_operations()]
    print("-" * 50)
    print("Frozen model layers: ")
    for layer in layers:
        print(layer)

    print("-" * 50)
    print("Frozen model inputs: ")
    print(frozen_func.inputs)
    print("Frozen model outputs: ")
    print(frozen_func.outputs)

    # Save frozen graph from frozen ConcreteFunction to hard drive
    tf.io.write_graph(graph_or_graph_def=frozen_func.graph,
                      logdir="./frozen_models",
                      name="frozen_graph.pb",
                      as_text=False)

Second, I overloaded the cppflow::model constructor:

inline model::model(const std::string& filename, int indexGPU) {
    this->graph = { TF_NewGraph(), TF_DeleteGraph };

    std::unique_ptr<TF_SessionOptions, decltype(&TF_DeleteSessionOptions)> session_options = { TF_NewSessionOptions(), TF_DeleteSessionOptions };
    std::unique_ptr<TF_Buffer, decltype(&TF_DeleteBuffer)> graph_def = { read_buffer_from_file(filename.c_str()), TF_DeleteBuffer };
    std::unique_ptr<TF_ImportGraphDefOptions, decltype(&TF_DeleteImportGraphDefOptions)> opts = { TF_NewImportGraphDefOptions(), TF_DeleteImportGraphDefOptions };

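    // Ops without an explicit device in the GraphDef default to the requested GPU.
    std::string device = "/device:GPU:" + std::to_string(indexGPU);

    TF_ImportGraphDefOptionsSetDefaultDevice(opts.get(), device.c_str());
    TF_GraphImportGraphDef(this->graph.get(), graph_def.get(), opts.get(), context::get_status());
    // Fail early if the graph could not be imported (e.g. missing or corrupt file).
    status_check(context::get_status());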

    auto session_deleter = [](TF_Session* sess) {
        TF_DeleteSession(sess, context::get_status());
        status_check(context::get_status());
    };

    this->session = { TF_NewSession(this->graph.get(), session_options.get(), context::get_status()),  session_deleter };

    status_check(context::get_status());
}

inline void deallocate_buffer(void* data, size_t) {
    std::free(data);
}

inline TF_Buffer* model::read_buffer_from_file(const char* file) {
    FILE* f;
    fopen_s(&f, file, "rb");

    if (f == nullptr) {
        return nullptr;
    }

    std::fseek(f, 0, SEEK_END);
    const auto fsize = ftell(f);
    std::fseek(f, 0, SEEK_SET);

    if (fsize < 1) {
        std::fclose(f);
        return nullptr;
    }

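    const auto data = std::malloc(fsize);
    if (data == nullptr) {
        std::fclose(f);
        return nullptr;
    }

    // Treat a short read as a failure instead of passing a partial graph on.
    if (std::fread(data, fsize, 1, f) != 1) {
        std::fclose(f);
        std::free(data);
        return nullptr;
    }
    std::fclose(f);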

    TF_Buffer* buf = TF_NewBuffer();
    buf->data = data;
    buf->length = fsize;
    buf->data_deallocator = deallocate_buffer;

    return buf;
}

Third, I changed the input and output names:

auto tensorOutput = m_model->operator()({ {"x:0", tensorInput} }, { "Identity:0" })[0];

Note: even if I set indexGPU to 1, some ops still run on /device:GPU:0.