Closed · emmataobao closed this issue 2 years ago
Which embedding pipeline are you using? Also, what kind of hardware (CPU/GPU) and server resources do you have?
Is it possible to give some example code? We have just released a new API that helps users handle large-scale datasets and utilize multi-core CPUs for acceleration.
https://towhee.readthedocs.io/en/branch0.6/data_collection/get_started.html#parallel-execution
@fzliu @reiase We are using towhee/image-embedding-regnetx-016 with towhee==0.5.1. The current test runs on a CPU server; both CPU and GPU servers are available. I don't know how to improve server resource utilization to speed up feature extraction.
You can use Towhee on a large-scale dataset with the DataCollection API:
```python
import towhee

embeddings = (
    towhee.dc(your_image_file_list)                # generate file list
    .image_decode()                                # decode all images
    .image_embedding.timm(model_name='resnet50')   # compute image embeddings
    .tensor_normalize()                            # normalize embeddings
    .to_list()
)
print(embeddings)
```
To improve CPU utilization, you can enable parallel execution with `set_parallel(num_thread)`:
```python
import towhee

embeddings = (
    towhee.dc(your_image_file_list)                # generate file list
    .set_parallel(5)                               # enable parallel execution with a thread pool of size 5
    .image_decode()                                # decode all images
    .image_embedding.timm(model_name='resnet50')   # compute image embeddings
    .tensor_normalize()                            # normalize embeddings
    .to_list()
)
print(embeddings)
```
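Conceptually, `set_parallel` dispatches the pipeline stages over a thread pool. A generic sketch of the same idea using only the standard library (this is an illustration, not Towhee internals; `decode` and `embed` are placeholder stage functions):

```python
from concurrent.futures import ThreadPoolExecutor

def decode(path):
    # placeholder for image decoding
    return f"decoded:{path}"

def embed(img):
    # placeholder for embedding computation
    return f"vec:{img}"

def pipeline(path):
    # one item flowing through both stages
    return embed(decode(path))

paths = [f"img_{i}.jpg" for i in range(10)]

# serial baseline
serial = [pipeline(p) for p in paths]

# thread pool of size 5, mirroring set_parallel(5)
with ThreadPoolExecutor(max_workers=5) as pool:
    parallel = list(pool.map(pipeline, paths))

assert parallel == serial  # same results, potentially better wall-clock time
```

Because image decoding and model inference release the GIL in their native code paths, a thread pool like this can overlap work across CPU cores.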
To use the DataCollection API, update to Towhee 0.6 with the following command:

```shell
$ pip install -U towhee
```
GPU auto-batching has already been implemented but hasn't been tested yet; it's on our roadmap for the next patch release, 0.6.1. In the meantime, you can try @reiase's suggestion above to see if that improves performance.
Is there an existing issue for this?
Is your feature request related to a problem? Please describe.
In a test with 30,000 images, recognition took 48 minutes to complete. How can I improve recognition speed? Should I add more servers or optimize something? The time to recognize a single image ranges from 300 milliseconds to 5 seconds.
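For reference, the reported figures work out to roughly 10 images per second end to end, i.e. well under 100 ms of wall-clock time per image on average, which suggests the batch already overlaps some work:

```python
images = 30_000
minutes = 48

throughput = images / (minutes * 60)  # images processed per second
avg_ms = 1000 / throughput            # average wall-clock ms per image

print(f"{throughput:.1f} img/s, {avg_ms:.0f} ms/image")  # 10.4 img/s, 96 ms/image
```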
Describe the solution you'd like.
No response
Describe an alternate solution.
No response
Anything else? (Additional Context)
No response