tensorflow / lattice

Lattice methods in TensorFlow
Apache License 2.0

How to use multi CPU easily? #56

Closed fuyanlu1989 closed 3 years ago

fuyanlu1989 commented 4 years ago

It is great to see such a good package. However, training is too slow.

I am using the Crystals ensemble model config with the tfl.estimators.CannedRegressor estimator. Only one CPU seems to be in use, even though the machine has 48 CPUs.

I have configured the input functions to use multiple threads:

feature_analysis_input_fn = tf.compat.v1.estimator.inputs.pandas_input_fn(
    x=train_xs.loc[feature_analysis_index].copy(), 
    y=train_ys.loc[feature_analysis_index].copy(), 
    batch_size=128, 
    num_epochs=1, 
    shuffle=True, 
    queue_capacity=1000,
    num_threads=40)

prefitting_input_fn = tf.compat.v1.estimator.inputs.pandas_input_fn(
    x=train_xs.loc[prefitting_index].copy(), 
    y=train_ys.loc[prefitting_index].copy(), 
    batch_size=128, 
    num_epochs=1, 
    shuffle=True, 
    queue_capacity=1000,
    num_threads=40)

train_input_fn = tf.compat.v1.estimator.inputs.pandas_input_fn(
    x=train_xs.loc[train_index].copy(), 
    y=train_ys.loc[train_index].copy(), 
    batch_size=128, 
    num_epochs=100, 
    shuffle=True, 
    queue_capacity=1000,
    num_threads=40)

CPU usage is still only about 1.25 CPUs. Any suggestions?

fuyanlu1989 commented 4 years ago

@mmilanifard Any suggestion?

mmilanifard commented 4 years ago

Training a crystals model has multiple steps:

My suggestions:

vekeyli commented 4 years ago

If the program used for a single task is multi-threaded or multi-process, the load can be balanced across multiple cores, which improves CPU utilization and reduces the time needed for data processing.
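As a rough illustration of the multi-process idea, here is a minimal sketch using only the Python standard library. It is not TensorFlow Lattice code and does not change how the estimator trains. The helper names (process_chunk, split, parallel_sum_of_squares) and the sum-of-squares workload are hypothetical stand-ins for a real CPU-bound data processing step.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def process_chunk(chunk):
    """Hypothetical CPU-bound step: sum of squares over one chunk."""
    return sum(x * x for x in chunk)


def split(data, n_chunks):
    """Split data into n_chunks contiguous slices of nearly equal size."""
    size, rem = divmod(len(data), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `rem` chunks each take one extra element.
        end = start + size + (1 if i < rem else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def parallel_sum_of_squares(data, workers=None):
    """Process each chunk in a separate worker process and combine results."""
    workers = workers or os.cpu_count() or 1
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(process_chunk, split(data, workers)))


if __name__ == "__main__":
    # The main guard is needed on platforms that spawn worker processes.
    data = list(range(1_000_000))
    print(parallel_sum_of_squares(data, workers=4))
```

Each chunk runs in its own process, so the work can spread over several cores. Whether this helps in practice depends on how much of the job is really CPU-bound and on the cost of sending the data to the workers.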

The main reason is that CPUs used to be single-core, so programmers had little notion of multi-core programming. Much software still does not fully apply multi-core ideas.

Some parts of a program are easy to split into multiple threads; for example, an application's user interface and its data processing can run as two independent threads. Other workloads are hard to split so simply, and they raise many problems that PC programmers have not faced before. (A few programmers experienced in networked distributed computing, such as those working on Google's back-end data systems, have encountered these problems, because they have to coordinate tens or even hundreds of thousands of servers for collaborative computing.) This is especially true of data processing programs like what you call