0% gpu utilization but 100% cpu utilization

gnillling commented 6 months ago

I run GROBID in Docker with this command:

docker run --rm --gpus '"device=1"' --init --ulimit core=0 -p 8070:8070 -p 8081:8071 grobid/grobid:0.8.0

After startup, the GPU RAM consumption is 20+ GB. However, when I use api/processFulltextDocument to process a PDF file, GPU utilization stays at 0% while CPU utilization goes up to 100%. Why is the document still processed on the CPU?

lfoppiano commented 6 months ago

@gnillling could you please share some of the logs?

gnillling commented 6 months ago

> @gnillling could you please share some of the logs?

This is the output after executing a command in the terminal.

docker run --rm --gpus '"device=0"' --init --ulimit core=0 -p 8070:8070 -p 8081:8071 grobid/grobid:0.8.0
WARN  [2024-05-20 05:55:23,002] org.hibernate.validator.internal.properties.javabean.JavaBeanExecutable: HV000254: Missing parameter metadata for ResponseMeteredLevel(String, int), which declares implicit or synthetic parameters. Automatic resolution of generic type information for method parameters may yield incorrect results if multiple parameters have the same erasure. To solve this, compile your code with the '-parameters' flag.
2024-05-20 05:55:24.427347: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
BidLSTM_CRF_FEATURES
2024-05-20 05:55:28.268526: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-20 05:55:28.826217: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 20981 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:3b:00.0, compute capability: 8.6
load weights from /opt/grobid/grobid-home/models/affiliation-address-BidLSTM_CRF_FEATURES/model_weights.hdf5
loading model weights /opt/grobid/grobid-home/models/affiliation-address-BidLSTM_CRF_FEATURES/model_weights.hdf5
Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 features_input (InputLayer)    [(None, None, 10)]   0           []                               

 char_input (InputLayer)        [(None, None, 30)]   0           []                               

 features_embedding_td (TimeDis  (None, None, 10, 4)  484        ['features_input[0][0]']         
 tributed)                                                                                        

 time_distributed (TimeDistribu  (None, None, 30, 25  3800       ['char_input[0][0]']             
 ted)                           )                                                                 

 features_embedding_td_2 (TimeD  (None, None, 8)     288         ['features_embedding_td[0][0]']  
 istributed)                                                                                      

 word_input (InputLayer)        [(None, None, 300)]  0           []                               

 time_distributed_1 (TimeDistri  (None, None, 50)    10200       ['time_distributed[0][0]']       
 buted)                                                                                           

 dropout (Dropout)              (None, None, 8)      0           ['features_embedding_td_2[0][0]']

 concatenate (Concatenate)      (None, None, 358)    0           ['word_input[0][0]',             
                                                                  'time_distributed_1[0][0]',     
                                                                  'dropout[0][0]']                

 dropout_1 (Dropout)            (None, None, 358)    0           ['concatenate[0][0]']            

 bidirectional_2 (Bidirectional  (None, None, 200)   367200      ['dropout_1[0][0]']              
 )                                                                                                

 dropout_2 (Dropout)            (None, None, 200)    0           ['bidirectional_2[0][0]']        

 length_input (InputLayer)      [(None, 1)]          0           []                               

 dense (Dense)                  (None, None, 100)    20100       ['dropout_2[0][0]']              

==================================================================================================
Total params: 402,072
Trainable params: 402,072
Non-trainable params: 0
__________________________________________________________________________________________________
Model: "crf_model_wrapper_default"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 crf (CRF)                   multiple                  2750      

 model (Functional)          (None, None, 100)         402072    

=================================================================
Total params: 404,822
Trainable params: 404,822
Non-trainable params: 0
_________________________________________________________________
[Wapiti] Loading model: "/opt/grobid/grobid-home/models/name/header/model.wapiti"
Model path: /opt/grobid/grobid-home/models/name/header/model.wapiti
[Wapiti] Loading model: "/opt/grobid/grobid-home/models/name/citation/model.wapiti"
Model path: /opt/grobid/grobid-home/models/name/citation/model.wapiti
BidLSTM_ChainCRF_FEATURES
load weights from /opt/grobid/grobid-home/models/header-BidLSTM_ChainCRF_FEATURES/model_weights.hdf5
loading model weights /opt/grobid/grobid-home/models/header-BidLSTM_ChainCRF_FEATURES/model_weights.hdf5
Model: "model_1"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 features_input (InputLayer)    [(None, None, 22)]   0           []                               

 char_input (InputLayer)        [(None, None, 30)]   0           []                               

 features_embedding_td (TimeDis  (None, None, 22, 4)  1060       ['features_input[0][0]']         
 tributed)                                                                                        

 time_distributed_2 (TimeDistri  (None, None, 30, 25  8475       ['char_input[0][0]']             
 buted)                         )                                                                 

 features_embedding_td_2 (TimeD  (None, None, 8)     288         ['features_embedding_td[0][0]']  
 istributed)                                                                                      

 word_input (InputLayer)        [(None, None, 300)]  0           []                               

 time_distributed_3 (TimeDistri  (None, None, 50)    10200       ['time_distributed_2[0][0]']     
 buted)                                                                                           

 dropout_3 (Dropout)            (None, None, 8)      0           ['features_embedding_td_2[0][0]']

 concatenate_1 (Concatenate)    (None, None, 358)    0           ['word_input[0][0]',             
                                                                  'time_distributed_3[0][0]',     
                                                                  'dropout_3[0][0]']              

 dropout_4 (Dropout)            (None, None, 358)    0           ['concatenate_1[0][0]']          

 bidirectional_5 (Bidirectional  (None, None, 200)   367200      ['dropout_4[0][0]']              
 )                                                                                                

 dropout_5 (Dropout)            (None, None, 200)    0           ['bidirectional_5[0][0]']        

 dense_2 (Dense)                (None, None, 100)    20100       ['dropout_5[0][0]']              

 dense_3 (Dense)                (None, None, 40)     4040        ['dense_2[0][0]']                

 length_input (InputLayer)      [(None, 1)]          0           []                               

 chain_crf (ChainCRF)           (None, None, 40)     1680        ['dense_3[0][0]']                

==================================================================================================
Total params: 413,043
Trainable params: 413,043
Non-trainable params: 0
__________________________________________________________________________________________________
[Wapiti] Loading model: "/opt/grobid/grobid-home/models/date/model.wapiti"
Model path: /opt/grobid/grobid-home/models/date/model.wapiti
BidLSTM_CRF_FEATURES
load weights from /opt/grobid/grobid-home/models/citation-BidLSTM_CRF_FEATURES/model_weights.hdf5
loading model weights /opt/grobid/grobid-home/models/citation-BidLSTM_CRF_FEATURES/model_weights.hdf5
Model: "model_2"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 features_input (InputLayer)    [(None, None, 19)]   0           []                               

 char_input (InputLayer)        [(None, None, 30)]   0           []                               

 features_embedding_td (TimeDis  (None, None, 19, 4)  916        ['features_input[0][0]']         
 tributed)                                                                                        

 time_distributed_4 (TimeDistri  (None, None, 30, 25  48475      ['char_input[0][0]']             
 buted)                         )                                                                 

 features_embedding_td_2 (TimeD  (None, None, 8)     288         ['features_embedding_td[0][0]']  
 istributed)                                                                                      

 word_input (InputLayer)        [(None, None, 300)]  0           []                               

 time_distributed_5 (TimeDistri  (None, None, 50)    10200       ['time_distributed_4[0][0]']     
 buted)                                                                                           

 dropout_6 (Dropout)            (None, None, 8)      0           ['features_embedding_td_2[0][0]']

 concatenate_2 (Concatenate)    (None, None, 358)    0           ['word_input[0][0]',             
                                                                  'time_distributed_5[0][0]',     
                                                                  'dropout_6[0][0]']              

 dropout_7 (Dropout)            (None, None, 358)    0           ['concatenate_2[0][0]']          

 bidirectional_8 (Bidirectional  (None, None, 200)   367200      ['dropout_7[0][0]']              
 )                                                                                                

 dropout_8 (Dropout)            (None, None, 200)    0           ['bidirectional_8[0][0]']        

 length_input (InputLayer)      [(None, 1)]          0           []                               

 dense_4 (Dense)                (None, None, 100)    20100       ['dropout_8[0][0]']              

==================================================================================================
Total params: 447,179
Trainable params: 447,179
Non-trainable params: 0
__________________________________________________________________________________________________
Model: "crf_model_wrapper_default_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 crf_1 (CRF)                 multiple                  5358      

 model_2 (Functional)        (None, None, 100)         447179    

=================================================================
Total params: 452,537
Trainable params: 452,537
Non-trainable params: 0
_________________________________________________________________
[Wapiti] Loading model: "/opt/grobid/grobid-home/models/fulltext/model.wapiti"
Model path: /opt/grobid/grobid-home/models/fulltext/model.wapiti
[Wapiti] Loading model: "/opt/grobid/grobid-home/models/segmentation/model.wapiti"
Model path: /opt/grobid/grobid-home/models/segmentation/model.wapiti
BidLSTM_ChainCRF_FEATURES
load weights from /opt/grobid/grobid-home/models/reference-segmenter-BidLSTM_ChainCRF_FEATURES/model_weights.hdf5
loading model weights /opt/grobid/grobid-home/models/reference-segmenter-BidLSTM_ChainCRF_FEATURES/model_weights.hdf5
Model: "model_3"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 features_input (InputLayer)    [(None, None, 18)]   0           []                               

 char_input (InputLayer)        [(None, None, 30)]   0           []                               

 features_embedding_td (TimeDis  (None, None, 18, 4)  868        ['features_input[0][0]']         
 tributed)                                                                                        

 time_distributed_6 (TimeDistri  (None, None, 30, 25  4300       ['char_input[0][0]']             
 buted)                         )                                                                 

 features_embedding_td_2 (TimeD  (None, None, 8)     288         ['features_embedding_td[0][0]']  
 istributed)                                                                                      

 word_input (InputLayer)        [(None, None, 300)]  0           []                               

 time_distributed_7 (TimeDistri  (None, None, 50)    10200       ['time_distributed_6[0][0]']     
 buted)                                                                                           

 dropout_9 (Dropout)            (None, None, 8)      0           ['features_embedding_td_2[0][0]']

 concatenate_3 (Concatenate)    (None, None, 358)    0           ['word_input[0][0]',             
                                                                  'time_distributed_7[0][0]',     
                                                                  'dropout_9[0][0]']              

 dropout_10 (Dropout)           (None, None, 358)    0           ['concatenate_3[0][0]']          

 bidirectional_11 (Bidirectiona  (None, None, 200)   367200      ['dropout_10[0][0]']             
 l)                                                                                               

 dropout_11 (Dropout)           (None, None, 200)    0           ['bidirectional_11[0][0]']       

 dense_6 (Dense)                (None, None, 100)    20100       ['dropout_11[0][0]']             

 dense_7 (Dense)                (None, None, 6)      606         ['dense_6[0][0]']                

 length_input (InputLayer)      [(None, 1)]          0           []                               

 chain_crf_1 (ChainCRF)         (None, None, 6)      48          ['dense_7[0][0]']                

==================================================================================================
Total params: 403,610
Trainable params: 403,610
Non-trainable params: 0
__________________________________________________________________________________________________
[Wapiti] Loading model: "/opt/grobid/grobid-home/models/figure/model.wapiti"
Model path: /opt/grobid/grobid-home/models/figure/model.wapiti
[Wapiti] Loading model: "/opt/grobid/grobid-home/models/table/model.wapiti"
Model path: /opt/grobid/grobid-home/models/table/model.wapiti
BidLSTM_CRF_FEATURES
load weights from /opt/grobid/grobid-home/models/funding-acknowledgement-BidLSTM_CRF_FEATURES/model_weights.hdf5
loading model weights /opt/grobid/grobid-home/models/funding-acknowledgement-BidLSTM_CRF_FEATURES/model_weights.hdf5
Model: "model_4"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 features_input (InputLayer)    [(None, None, 6)]    0           []                               

 char_input (InputLayer)        [(None, None, 30)]   0           []                               

 features_embedding_td (TimeDis  (None, None, 6, 4)  292         ['features_input[0][0]']         
 tributed)                                                                                        

 time_distributed_8 (TimeDistri  (None, None, 30, 25  3700       ['char_input[0][0]']             
 buted)                         )                                                                 

 features_embedding_td_2 (TimeD  (None, None, 8)     288         ['features_embedding_td[0][0]']  
 istributed)                                                                                      

 word_input (InputLayer)        [(None, None, 300)]  0           []                               

 time_distributed_9 (TimeDistri  (None, None, 50)    10200       ['time_distributed_8[0][0]']     
 buted)                                                                                           

 dropout_12 (Dropout)           (None, None, 8)      0           ['features_embedding_td_2[0][0]']

 concatenate_4 (Concatenate)    (None, None, 358)    0           ['word_input[0][0]',             
                                                                  'time_distributed_9[0][0]',     
                                                                  'dropout_12[0][0]']             

 dropout_13 (Dropout)           (None, None, 358)    0           ['concatenate_4[0][0]']          

 bidirectional_14 (Bidirectiona  (None, None, 200)   367200      ['dropout_13[0][0]']             
 l)                                                                                               

 dropout_14 (Dropout)           (None, None, 200)    0           ['bidirectional_14[0][0]']       

 length_input (InputLayer)      [(None, 1)]          0           []                               

 dense_8 (Dense)                (None, None, 100)    20100       ['dropout_14[0][0]']             

==================================================================================================
Total params: 401,780
Trainable params: 401,780
Non-trainable params: 0
__________________________________________________________________________________________________
Model: "crf_model_wrapper_default_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 crf_2 (CRF)                 multiple                  2178      

 model_4 (Functional)        (None, None, 100)         401780    

=================================================================
Total params: 403,958
Trainable params: 403,958
Non-trainable params: 0
_________________________________________________________________

And this is the process resource usage while I am making a request to api/processFulltextDocument. It took 6 seconds to process an 8-page PDF.

[Attached screenshots: 微信图片_20240520141544 (WeChat image), QQ截图20240520141407 (QQ screenshot)]
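For reference, the gpu_device.cc line in the log above shows that TensorFlow did detect the card and created a device with 20981 MB. That matches the 20+ GB of GPU RAM reported in the question, and would be consistent with TensorFlow's default behaviour of reserving most GPU memory up front. Below is a minimal sketch for extracting that information from captured container logs. The function name `find_gpu_devices` is illustrative, and the regular expression assumes the log format shown in this thread.

```python
import re

# Matches the TensorFlow "Created device ... GPU:N with M MB memory" log line.
# Assumption: the line format is the one shown in the log in this thread.
GPU_LINE = re.compile(
    r"Created device \S*device:GPU:(\d+) with (\d+) MB memory:"
    r"\s+-> device: \d+, name: ([^,]+),"
)


def find_gpu_devices(log_text: str) -> list[tuple[int, int, str]]:
    """Return (gpu index, memory in MB, device name) for each GPU device created."""
    return [
        (int(index), int(memory_mb), name.strip())
        for index, memory_mb, name in GPU_LINE.findall(log_text)
    ]
```

An empty result for a container started with --gpus would suggest that TensorFlow fell back to the CPU. A non-empty result, as here, only shows that the device was created, not how heavily it is used.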

lfoppiano commented 6 months ago

@gnillling I dug into it. GPU usage comes in very short bursts, so if you process one file at a time it's hard to see. With nvtop I could see the GPU being used when three files were processed more or less in parallel (I did it from the web interface, sending three requests for different papers in rapid succession).
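The "several files in rapid succession" test can also be reproduced from a script. This is a minimal sketch, assuming GROBID is listening on localhost:8070 as in the docker command earlier in the thread and that the PDF is sent in the multipart form field `input`. The helper names (`build_multipart`, `post_pdf`, `process_in_parallel`) are illustrative and not part of GROBID.

```python
import urllib.request
import uuid
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Assumption: host port 8070 is mapped to the GROBID service, as in the thread.
GROBID_URL = "http://localhost:8070/api/processFulltextDocument"


def build_multipart(pdf_bytes: bytes, filename: str, boundary: str) -> bytes:
    """Encode one PDF as a multipart/form-data body with a single 'input' field."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="input"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + pdf_bytes + tail


def post_pdf(path: Path, url: str = GROBID_URL) -> int:
    """POST one PDF to GROBID and return the HTTP status code."""
    boundary = uuid.uuid4().hex
    body = build_multipart(path.read_bytes(), path.name, boundary)
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=300) as response:
        response.read()  # drain the response body so the request completes
        return response.status


def process_in_parallel(paths, workers=3, send=post_pdf):
    """Submit all PDFs concurrently so several requests are in flight at once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(send, paths))
```

Calling `process_in_parallel([Path("paper1.pdf"), Path("paper2.pdf"), Path("paper3.pdf")])` (hypothetical file names) while watching nvtop should approximate the manual test described above.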

I'm running it alongside other GROBID-family applications, and when the GPU memory is allocated, it's very likely that the GPU is also being used for computation.
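Because the bursts are short, a monitor that refreshes about once per second can miss them. This is a minimal sketch that polls nvidia-smi at a shorter interval and reports the peak utilization seen. It assumes nvidia-smi is on the PATH of the host and prints integer values for both fields; the function names are illustrative.

```python
import subprocess
import time

# Query GPU utilization (%) and used memory (MiB) as bare CSV values.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_sample(line: str) -> tuple[int, int]:
    """Parse one CSV line such as '37, 20981' into (utilization %, memory MiB)."""
    util, mem = (field.strip() for field in line.split(","))
    return int(util), int(mem)


def peak_utilization(samples) -> int:
    """Return the highest utilization among the samples, or 0 if there are none."""
    return max((util for util, _ in samples), default=0)


def sample_gpu(duration_s: float = 10.0, interval_s: float = 0.1, gpu_index: int = 0):
    """Poll one GPU for duration_s seconds and return the list of samples."""
    samples = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        result = subprocess.run(
            QUERY + [f"--id={gpu_index}"],
            capture_output=True,
            text=True,
            check=True,
        )
        samples.append(parse_sample(result.stdout.splitlines()[0]))
        time.sleep(interval_s)
    return samples
```

Running `peak_utilization(sample_gpu())` while a document is being processed gives a single number to compare between a one-file run and a run with several files in parallel.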

CONTAINER ID   IMAGE                                                                 COMMAND                  CREATED         STATUS         PORTS                                       NAMES
b40f527b7a94   grobid/grobid:0.8.0                                                   "./grobid-service/bi…"   4 minutes ago   Up 4 minutes   0.0.0.0:8070->8070/tcp, :::8070->8070/tcp   eager_elgamal

See the log:

INFO  [2024-05-21 23:39:56,600] org.eclipse.jetty.server.Server: Started Server@e784320{STARTING}[11.0.14,sto=30000] @40628ms
INFO  [2024-05-21 23:40:22,530] org.grobid.core.factory.GrobidPoolingFactory: Number of Engines in pool active/max: 1/10
INFO  [2024-05-21 23:41:44,140] org.grobid.core.factory.GrobidPoolingFactory: Number of Engines in pool active/max: 1/10
INFO  [2024-05-21 23:41:48,817] org.grobid.core.factory.GrobidPoolingFactory: Number of Engines in pool active/max: 2/10
INFO  [2024-05-21 23:41:53,328] org.grobid.core.factory.GrobidPoolingFactory: Number of Engines in pool active/max: 2/10

The GPU is being used, but as you can see, not much with only a few files:

kermitt2 / grobid

0% gpu utilization but 100% cpu utilization #1116