Sam-Persoon opened this issue 4 years ago
@Sam-Persoon, in order to expedite the troubleshooting process, could you please provide the complete code to reproduce the issue reported here? Thanks!
This is just an untrained example model from the TF docs, but it has the issue:
from tensorflow.keras import layers
from tensorflow import keras
import timeit
num_tags = 12 # Number of unique issue tags
num_words = 10000 # Size of vocabulary obtained when preprocessing text data
num_departments = 4 # Number of departments for predictions
title_input = keras.Input(shape=(None,), name='title') # Variable-length sequence of ints
body_input = keras.Input(shape=(None,), name='body') # Variable-length sequence of ints
tags_input = keras.Input(shape=(num_tags,), name='tags') # Binary vectors of size `num_tags`
# Embed each word in the title into a 64-dimensional vector
title_features = layers.Embedding(num_words, 64)(title_input)
# Embed each word in the text into a 64-dimensional vector
body_features = layers.Embedding(num_words, 64)(body_input)
# Reduce sequence of embedded words in the title into a single 128-dimensional vector
title_features = layers.LSTM(128)(title_features)
# Reduce sequence of embedded words in the body into a single 32-dimensional vector
body_features = layers.LSTM(32)(body_features)
# Merge all available features into a single large vector via concatenation
x = layers.concatenate([title_features, body_features, tags_input])
# Stick a logistic regression for priority prediction on top of the features
priority_pred = layers.Dense(1, activation='sigmoid', name='priority')(x)
# Stick a department classifier on top of the features
department_pred = layers.Dense(num_departments, activation='softmax', name='department')(x)
# Instantiate an end-to-end model predicting both priority and department
model = keras.Model(inputs=[title_input, body_input, tags_input],
                    outputs=[priority_pred, department_pred])
model.compile(optimizer='adam', loss='mae')
model.save('test')
# --------
import_module = "import tensorflow as tf"
code = "tf.saved_model.load('test')"
print(timeit.timeit(stmt=code, setup=import_module, number=10))
# Upload the test model to a gcs bucket you can access
code_gcs = "tf.saved_model.load('gs://your-bucket/test')"
print(timeit.timeit(stmt=code_gcs, setup=import_module, number=10))
There are two different filesystems involved. Can you post the times for the following code?
import tensorflow as tf

local_filename = "/path/to/some/local/file"
gcs_filename = "gs://bucket/and/path/to/file"
size = 1024
filename = local_filename  # or gcs_filename
with tf.io.gfile.GFile(filename, 'rb') as f:
    f.read(size)
Ensure that the file has at least size bytes, keep increasing the size (1 KB, 2 KB, 4 KB, ... 1 MB, 1 GB), and compare the two times (maybe plot them)?
The following code is used for the plot:
# --------
import timeit

import_module = "import tensorflow as tf"
sizes = [1024 * i for i in [1, 2, 4, 8, 50, 100, 500, 1000]]
local_times = []
gcs_times = []
for size in sizes:
    print(size)
    code = f"""
filename = 'test/saved_model.pb'
with tf.io.gfile.GFile(filename, 'rb') as f:
    f.read({size})
"""
    local_times.append(timeit.timeit(stmt=code, setup=import_module, number=10))
    code_gcs = f"""
filename = 'gs://bucket/test/saved_model.pb'
with tf.io.gfile.GFile(filename, 'rb') as f:
    f.read({size})
"""
    gcs_times.append(timeit.timeit(stmt=code_gcs, setup=import_module, number=10))
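The plot itself was produced separately; below is a minimal sketch of how the collected times could be plotted, assuming matplotlib is installed (not part of the original snippet):

# Hypothetical plotting sketch (assumes matplotlib is installed); both axes
# use log scales so that small and large read sizes remain comparable.
import matplotlib.pyplot as plt

plt.plot(sizes, local_times, marker='o', label='local')
plt.plot(sizes, gcs_times, marker='o', label='gcs')
plt.xscale('log')
plt.yscale('log')
plt.xlabel('read size (bytes)')
plt.ylabel('time for 10 reads (s)')
plt.legend()
plt.show()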
Mind the log scales in the plot.
This seems reasonable as the difference between local and remote access over the network.
Loading 1 KB in 6 s is really slow, though. I tested the impact of sequentially loading the models to check whether the authentication to GCS impacts things. The following code:
import tensorflow as tf
import time
from google.cloud import storage
import os

model_names = ['test1', 'test2', 'test3', 'test4']
storage_client = storage.Client()
bucket = storage_client.get_bucket('bucket-name')
for model in model_names:
    print(model)
    start_time = time.time()
    tf.saved_model.load(f'gs://bucket-name/models/{model}/1')
    elapsed_time = time.time() - start_time
    print(f'Loading from gcs: {elapsed_time}')

    start_time = time.time()
    blobs = bucket.list_blobs(prefix=f'models/{model}/1')  # Get list of files
    folder = './download_folder'
    for blob in blobs:
        # create dirs if they do not exist, code omitted
        blob.download_to_filename(folder + '/' + blob.name)

    # time local disk loading
    start_time_internal = time.time()
    tf.saved_model.load(os.path.join(folder, 'models', model, '1'))
    elapsed_time_internal = time.time() - start_time_internal
    print(f'Local disk: {elapsed_time_internal}')

    elapsed_time = time.time() - start_time
    print(f'Downloading from gcs, then loading: {elapsed_time}')
Gives the following outputs:
test1
Loading from gcs: 32.72501277923584
Local disk: 2.4520139694213867
Downloading from gcs, then loading: 4.770485162734985
test2
Loading from gcs: 0.9281768798828125
Local disk: 0.013478279113769531
Downloading from gcs, then loading: 0.45862293243408203
test3
Loading from gcs: 57.323729038238525
Local disk: 1.7698051929473877
Downloading from gcs, then loading: 3.2642600536346436
test4
Loading from gcs: 3.6672658920288086
Local disk: 0.053382158279418945
Downloading from gcs, then loading: 0.8592350482940674
test1 and test3 are larger models; test2 and test4 are smaller. Large models really suffer from this.
Can you compare with downloading a file from GCS from the command line? (That is, not using TensorFlow.)
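For example, a sketch timing a command-line download with gsutil via subprocess (assuming gsutil is installed; the bucket path and target directory are placeholders):

# Sketch: time a command-line download with gsutil (not TensorFlow).
# 'gs://bucket-name/models/test1' and './cli_download' are placeholders.
import subprocess
import time

start = time.time()
subprocess.run(
    ['gsutil', '-m', 'cp', '-r', 'gs://bucket-name/models/test1', './cli_download'],
    check=True,
)
print(f'gsutil download: {time.time() - start:.2f}s')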
Single-threaded downloading the files gives the following output:
test1
Loading from gcs: 29.73722219467163
Loading from local disk: 2.546816825866699
Downloading from gcs, then loading from local disk: 3.7411539554595947
Downloading from gcs using cli, then loading from local disk: 9.842350006103516
test2
Loading from gcs: 0.721973180770874
Loading from local disk: 0.01868605613708496
Downloading from gcs, then loading from local disk: 0.31568098068237305
Downloading from gcs using cli, then loading from local disk: 3.2689220905303955
test3
Loading from gcs: 49.419824838638306
Loading from local disk: 1.8356938362121582
Downloading from gcs, then loading from local disk: 2.639312982559204
Downloading from gcs using cli, then loading from local disk: 11.165163040161133
test4
Loading from gcs: 2.7186670303344727
Loading from local disk: 0.054453134536743164
Downloading from gcs, then loading from local disk: 0.8785662651062012
Downloading from gcs using cli, then loading from local disk: 3.097564935684204
Any updates on this?
I think the issue is that you don't factor in the time spent on network operations. To me, it looks consistent.
I don't see how I am not factoring those in. If I adapt the script posted above to:
import tensorflow as tf
import time
from google.cloud import storage
import os
import subprocess

total_time = time.time()
model_names = ['test1', 'test2', 'test3', 'test4']
for model in model_names:
    start_time = time.time()
    tf.saved_model.load(f'gs://bucket-name/models/{model}/1')
    elapsed_time = time.time() - start_time
    print(f'Loading from gcs: {elapsed_time}')
total_elapsed_time = time.time() - total_time
print(f"Total elapsed time: {total_elapsed_time}")
I get the following output:
Loading from gcs: 28.524829864501953
Loading from gcs: 0.7831361293792725
Loading from gcs: 52.6575129032135
Loading from gcs: 3.1442267894744873
Total elapsed time: 85.57314991950989
And downloading them first and then loading:
import tensorflow as tf
import time
from google.cloud import storage
import os
import subprocess

total_time = time.time()
model_names = ['test1', 'test2', 'test3', 'test4']
storage_client = storage.Client()
bucket = storage_client.get_bucket('bucket-name')
for model in model_names:
    start_time = time.time()
    blobs = bucket.list_blobs(prefix=f'models/{model}/1')  # Get list of files
    folder = './download_folder'
    for blob in blobs:
        # create folder if it does not exist
        blob.download_to_filename(folder + '/' + blob.name)
    start_time_internal = time.time()
    tf.saved_model.load(os.path.join(folder, 'models', model, '1'))
    elapsed_time_internal = time.time() - start_time_internal
    print(f'Loading from local disk: {elapsed_time_internal}')
    elapsed_time = time.time() - start_time
    print(f'Downloading from gcs, then loading from local disk: {elapsed_time}')
total_elapsed_time = time.time() - total_time
print(f"Total elapsed time: {total_elapsed_time}")
This gives the following output:
Loading from local disk: 2.6329727172851562
Downloading from gcs, then loading from local disk: 4.476565837860107
Loading from local disk: 0.025137901306152344
Downloading from gcs, then loading from local disk: 0.4631459712982178
Loading from local disk: 1.6898088455200195
Downloading from gcs, then loading from local disk: 3.088888168334961
Loading from local disk: 0.04868316650390625
Downloading from gcs, then loading from local disk: 1.0312209129333496
Total elapsed time: 9.417081117630005
I timed everything now, so network operations are included. I can execute both scripts multiple times; the execution times are always in line with these results.
Thanks. I'll take a look into this, though not this week or the next (loaded with other work items).
Thanks.
As a month has passed since the last comment, any updates on this?
Sorry, but unfortunately due to COVID my work schedule became more chaotic, so it will still take some time. Apologies for this.
Having the same issue here, except that I'm loading a big SavedModel and it's extremely slow. Downloading the same model with gsutil is much faster, which rules out a network problem.
The filesystem glob matching implementation has recently changed. Let's test with the 2.4 release when it comes out to see whether this improves things.
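To check whether glob matching itself is the slow part, something along these lines could be timed (both paths are placeholders):

# Sketch: compare glob cost on local disk vs. GCS (placeholder paths).
import time
import tensorflow as tf

for pattern in ['test/variables/variables*',
                'gs://bucket-name/test/variables/variables*']:
    start = time.time()
    matches = tf.io.gfile.glob(pattern)
    print(f'{pattern}: {len(matches)} matches in {time.time() - start:.2f}s')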
TF 2.4 has been released. Is this still an issue?
Please reopen if this is still an issue in 2.4 release.
Hi @mihaimaruseac - I can confirm this is still an issue in TF 2.4.
This is the code I used to test this issue:
import sys
import time

import tensorflow as tf

def main(argv):
    print(tf.__version__)
    tic = time.time()
    tf.saved_model.load(argv[0])
    toc = time.time()
    print(f'Local load time: {toc - tic:.2f}s')
    tic = time.time()
    tf.saved_model.load(argv[1])
    toc = time.time()
    print(f'Remote load time: {toc - tic:.2f}s')

if __name__ == '__main__':
    main(sys.argv[1:])
I copied the SavedModel directory I'm loading from GCS to local disk on a Google Cloud virtual machine. The directory is around 170 MB in size and copying takes less than 2 seconds.
Test result on TF 2.2.2:
hao.hu@derekhh-dev /tmp
tmp_py37 ❯ python temp.py /tmp/202102051600 gs://foo/202102051600/
2.2.2
Local load time: 20.78s
Remote load time: 173.86s
Test result on TF 2.3.2:
hao.hu@derekhh-dev /tmp
tmp_py37 ❯ python temp.py /tmp/202102051600 gs://foo/202102051600/
2.3.2
Local load time: 20.98s
Remote load time: 180.84s
Test result on TF 2.4.1:
hao.hu@derekhh-dev /tmp
tmp_py37 ❯ python temp.py /tmp/202102051600 gs://foo/202102051600/
2.4.1
Local load time: 20.63s
Remote load time: 195.49s
Thank you for confirming.
@mihaimaruseac Thanks for investigating. I created a quick profile with TF 2.4, and most of the time is actually spent in the following function call, mainly due to the execution of the restore_v2 operation, so I am not sure whether the root cause is actually related to the TF file system implementation:
https://github.com/tensorflow/tensorflow/blob/7a1b65b61397dbad1d400e1d3c7079246dfe309a/tensorflow/python/saved_model/load.py#L161
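For anyone who wants to reproduce such a profile, a minimal sketch using the standard-library cProfile (the GCS path is a placeholder; the original commenter did not specify their profiling method):

# Sketch: profile the load with cProfile; the GCS path is a placeholder.
import cProfile
import pstats
import tensorflow as tf

cProfile.run("tf.saved_model.load('gs://bucket-name/models/test1/1')",
             'load_profile')
pstats.Stats('load_profile').sort_stats('cumulative').print_stats(20)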
@mihaimaruseac did you have any chance to look into this issue?
Reloading models (even small ones) from GCS seems almost unusable even with a very fast network connection, so unfortunately the only workaround currently is to download the saved model (which tends to be very fast) and read it from the local disk, which can be quite cumbersome for users.
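For reference, a sketch of that workaround wrapped in a small helper, assuming the google-cloud-storage package; the bucket name, prefix, and local directory are placeholders:

# Workaround sketch: download the SavedModel from GCS, then load it locally.
# Bucket name, prefix, and local directory are placeholders.
import os
import tensorflow as tf
from google.cloud import storage

def load_via_local_copy(bucket_name, prefix, local_dir='/tmp/saved_model'):
    bucket = storage.Client().get_bucket(bucket_name)
    for blob in bucket.list_blobs(prefix=prefix):
        if blob.name.endswith('/'):  # skip directory placeholder objects
            continue
        target = os.path.join(local_dir, os.path.relpath(blob.name, prefix))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        blob.download_to_filename(target)
    return tf.saved_model.load(local_dir)

model = load_via_local_copy('bucket-name', 'models/test1/1')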
Unfortunately, I haven't :( I'm working on the security issues at the moment and putting everything else on a lower priority track as there is not enough time to work on everything :(
Was there any resolution to this, or was a workaround found?
Hi,
Thank you for opening this issue. Since this issue has been open for a long time, the code/debug information for this issue may not be relevant to the current state of the code base.
The TensorFlow team is constantly improving the framework by fixing bugs and adding new features. We suggest you try the latest TensorFlow version with the latest compatible hardware configuration, which could potentially resolve the issue. If you are still facing the issue, please create a new GitHub issue with your latest findings, with all the debugging information which could help us investigate.
Please follow the release notes to stay up to date with the latest developments which are happening in the TensorFlow space.
This issue is stale because it has been open for 7 days with no activity. It will be closed if no further activity occurs. Thank you.
This is still an issue.
Describe the current behavior
I want to use TensorFlow Serving to load multiple models. If I mount a directory containing the models, loading everything is done in an instant, while loading them from a gs:// path takes around 10 seconds per model. While researching the issue I discovered this is probably a TensorFlow issue and not a TensorFlow Serving issue, as loading them directly in TensorFlow shows a huge difference as well:
Then it could be that downloading the model (which is very small) is slow, but I tested this as well:
Any idea what is happening here?
Describe the expected behavior
Around the same load time for external vs. local models. The first load from gs can be slow if auth needs to happen, but even authenticating is still way faster than the model loads.
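To separate a one-time authentication cost from the load itself, two consecutive loads of the same model could be timed (a sketch; the GCS path is a placeholder):

# Sketch: if only the first load were slow, authentication would be the
# likely cause; the GCS path is a placeholder.
import time
import tensorflow as tf

for attempt in (1, 2):
    start = time.time()
    tf.saved_model.load('gs://bucket-name/models/test1/1')
    print(f'Load {attempt}: {time.time() - start:.2f}s')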