wortelus opened this issue 1 month ago
The first step (or two steps sometimes) cover compilation time too (which is why they tend to take much longer). The other steps cover a full forward pass, which includes updating the variables.
Hey there, thanks for the quick response!
By first step, you mean calling `model.predict()` for the first time (or second, as you said) in the given session? Hence the subsequent calls are only the forward pass through the network? In both cases, does the time reported in stdout cover only said phases? Is the data transfer between memory and GPU also included?
Sorry if I am misunderstanding this.
Warm regards, wortelus
> By first step, you mean calling `model.predict()` for the first time (or second, as you said) in the given session?
`predict()` (and `fit()`, `evaluate()`) are loops. The inner content of that loop is called a "step". One call to `predict()` might have, say, 1000 steps. Compilation only happens at the first step, the first time `predict()` is called. Subsequent steps and subsequent calls to `predict()` reuse the same compiled graph.
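As an illustration (a toy sketch in plain Python, not the Keras implementation), the compile-once-then-reuse behaviour can be modelled like this, with a one-time "compilation" cost paid on the very first step only:

```python
import time


class ToyPredictor:
    """Minimal sketch of a predict loop whose step function is built
    ('compiled') once on the first step and reused by every later step
    and every later call. The sleep stands in for graph tracing/compilation."""

    def __init__(self):
        self._step_fn = None

    def _make_predict_function(self):
        if self._step_fn is None:
            time.sleep(0.05)  # stand-in for one-time tracing/compilation cost
            self._step_fn = lambda batch: [x * 2 for x in batch]  # toy forward pass
        return self._step_fn

    def predict(self, batches):
        outputs, timings = [], []
        for batch in batches:
            t0 = time.perf_counter()
            fn = self._make_predict_function()  # expensive only the first time ever
            outputs.append(fn(batch))
            timings.append(time.perf_counter() - t0)
        return outputs, timings
```

Running two `predict()` calls back to back shows the first step of the first call dominating, while all subsequent steps (including those of the second call) reuse the cached function and stay fast.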
> Hence the subsequent calls are only the forward pass through the network? In both cases, does the time reported in stdout cover only said phases? Is the data transfer between memory and GPU also included?
Yes.
I now understand that the time printed to stdout comes from a Keras callback, specifically the `Progbar` class, whose methods are executed before and after the main loop, as seen here in `trainer.py`:
```python
self.make_predict_function()
self.stop_predicting = False
callbacks.on_predict_begin()
outputs = None
with epoch_iterator.catch_stop_iteration():
    for step, iterator in epoch_iterator.enumerate_epoch():
        callbacks.on_predict_batch_begin(step)
        data = get_data(iterator)
        batch_outputs = self.predict_function(data)
        outputs = append_to_outputs(batch_outputs, outputs)
        callbacks.on_predict_batch_end(step, {"outputs": batch_outputs})
        if self.stop_predicting:
            break
callbacks.on_predict_end()
outputs = tree.map_structure_up_to(
    batch_outputs, potentially_ragged_concat, outputs
)
return tree.map_structure(convert_to_np_if_not_ragged, outputs)
```
I suppose the compilation happens under `self.make_predict_function()`.
I now understand the source of the timing, as it is managed under `callbacks.on_predict_begin()` and `callbacks.on_predict_end()`, respectively. My question is then: at what point in `predict()` is the data loaded onto the GPU? I would like to know whether the time shown in the progress bar includes the GPU<->memory IO operations, or whether (some of) the data is already present on the GPU beforehand.
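For what it's worth, a minimal stand-in for those hooks (hypothetical names chosen to mirror the `trainer.py` snippet quoted above, not real Keras classes) makes the per-step timing window explicit: everything executed between `on_predict_batch_begin` and `on_predict_batch_end`, which in the real loop includes `get_data(iterator)` as well as the forward pass:

```python
import time


class BatchTimer:
    """Hypothetical timing callback mirroring the predict-batch hooks.
    Whatever runs between begin and end (batch fetching plus the forward
    pass) is counted inside each per-step time."""

    def __init__(self):
        self.times = []
        self._t0 = None

    def on_predict_batch_begin(self, step):
        self._t0 = time.perf_counter()

    def on_predict_batch_end(self, step, logs=None):
        self.times.append(time.perf_counter() - self._t0)


def run_predict_loop(batches, predict_fn, callback):
    """Simplified stand-in for the quoted trainer.py loop."""
    outputs = []
    for step, batch in enumerate(batches):
        callback.on_predict_batch_begin(step)
        data = list(batch)                # stand-in for get_data(iterator)
        batch_outputs = predict_fn(data)  # forward pass
        outputs.append(batch_outputs)
        callback.on_predict_batch_end(step, {"outputs": batch_outputs})
    return outputs
```

Under this sketch, any host-to-device copy triggered by feeding the batch would land inside the timed window, since it happens between the two hooks.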
Kind regards, wortelus
This is rather a question regarding support, as I have yet to find an answer.
`model.predict()` outputs a time to stdout when called on a Keras model. For a given step, I am given a time in milliseconds. May I ask what this time includes? Does it include only the pure forward pass through the network, or does it also include copying data to the GPU's DRAM or some state initializations?
Thank you