[Feature]: How to configure parameters or optimize feature extraction

emmataobao commented 2 years ago

Is there an existing issue for this?

[X] I have searched the existing issues.

Is your feature request related to a problem? Please describe.

I tested 30,000 images, and recognition took 48 minutes to complete. How can I improve the recognition speed: should I add servers, or is there something I can optimize? The time to recognize a single image ranges from 300 milliseconds to 5 seconds.
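For reference, the totals above work out to an average of about 96 ms per image, or roughly 10 images per second. A quick check of that arithmetic (the variable names are illustrative only):

```python
# Figures taken from the report above: 30,000 images in 48 minutes.
total_images = 30_000
total_seconds = 48 * 60

avg_ms_per_image = total_seconds / total_images * 1000
images_per_second = total_images / total_seconds

print(f"{avg_ms_per_image:.0f} ms/image, {images_per_second:.1f} images/s")
```

This is only the average over the whole run; it says nothing about how individual images are distributed within the per-image range quoted above.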

Describe the solution you'd like.

No response

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

fzliu commented 2 years ago

Which embedding pipeline are you using? Also, what kind of hardware (CPU/GPU) and server resources do you have?

reiase commented 2 years ago

Could you share some example code? We have just released a new API that helps users handle large-scale datasets and use multi-core CPUs for acceleration.

https://towhee.readthedocs.io/en/branch0.6/data_collection/get_started.html#parallel-execution

emmataobao commented 2 years ago

@fzliu @reiase I am using towhee/image-embedding-regnetx-016 with towhee 0.5.1. The current test runs on a CPU server, but both CPU and GPU servers are available. I don't know how to improve the utilization of server resources to speed up feature extraction.

reiase commented 2 years ago

You can use Towhee on a large-scale dataset with the data collection API:

import towhee
embeddings = (
    towhee.dc(your_image_file_list)                   # build a data collection from your list of image paths
        .image_decode()                               # decode all images
        .image_embedding.timm(model_name='resnet50')  # compute image embeddings
        .tensor_normalize()                           # embeddings normalization
        .to_list()
)
print(embeddings)

To improve CPU utilization, you can enable parallel execution with set_parallel(num_thread):

import towhee
embeddings = (
    towhee.dc(your_image_file_list)                   # build a data collection from your list of image paths
        .set_parallel(5)                              # enable parallel execution with a thread pool of size 5
        .image_decode()                               # decode all images
        .image_embedding.timm(model_name='resnet50')  # compute image embeddings
        .tensor_normalize()                           # embeddings normalization
        .to_list()
)

print(embeddings)
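As a rough illustration of the general idea behind running a per-image step on a thread pool, here is a minimal standard-library sketch. It is not Towhee's implementation: `fake_embed` and `embed_all` are hypothetical names, and the sleep stands in for the real decode and embedding work.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_embed(path):
    """Hypothetical stand-in for decoding and embedding one image."""
    time.sleep(0.01)  # simulate per-image work
    return len(path)  # placeholder "embedding"


def embed_all(paths, num_workers=5):
    # Executor.map returns results in the same order as the inputs.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(fake_embed, paths))


paths = [f"img_{i}.jpg" for i in range(20)]
results = embed_all(paths)
print(results[:3])
```

Threads help most when the per-image work releases Python's global interpreter lock, as file I/O and much native numeric code do, so the actual speedup depends on the workload and should be measured.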

To use the data collection API, you need to upgrade to towhee 0.6 with the following command:

$ pip install -U towhee

fzliu commented 2 years ago

GPU auto-batching has already been implemented, but it hasn't been tested yet. It's on our roadmap for the next patch release, i.e. 0.6.1. In the meantime, you can try @reiase's suggestion above to see if that improves performance.
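For readers unfamiliar with batching, the general idea is to group inputs so that a model can process several images per forward pass. The sketch below shows only that grouping step in plain Python; it is not Towhee's auto-batching, and `batched` and the batch size of 4 are assumptions chosen for illustration.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


paths = [f"img_{i}.jpg" for i in range(10)]
batches = list(batched(paths, 4))

for batch in batches:
    # A real pipeline would decode the batch and run one model call on it here.
    print(len(batch), batch[0])
```

The last batch may be smaller than the others, so any downstream code should not assume a fixed batch size.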
