towhee-io / towhee

Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
https://towhee.io
Apache License 2.0

[Feature]: How to configure parameters or optimize feature extraction #1059

Closed emmataobao closed 2 years ago

emmataobao commented 2 years ago

Is your feature request related to a problem? Please describe.

In my test, recognizing 30,000 images took 48 minutes. How can I improve the recognition speed: should I add servers, or is there something I can optimize? Recognizing a single image takes between 300 milliseconds and 5 seconds.
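
For reference, the totals reported above can be turned into an average per-image time and a throughput figure. This is only arithmetic on the two numbers in the report (30,000 images, 48 minutes); the variable names are illustrative, and the result is an average over the whole run, so it says nothing about how individual images were timed.

```python
# Figures taken from the report above: 30,000 images processed in 48 minutes.
total_images = 30_000
total_seconds = 48 * 60

# Average wall-clock time per image and overall throughput for the whole run.
avg_ms_per_image = total_seconds / total_images * 1000
images_per_second = total_images / total_seconds

print(f"average: {avg_ms_per_image:.0f} ms per image")
print(f"throughput: {images_per_second:.1f} images per second")
```

Comparing a figure like this before and after a change (more threads, a GPU, batching) is one simple way to check whether the change actually helped.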

Describe the solution you'd like.

No response

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

fzliu commented 2 years ago

Which embedding pipeline are you using? Also, what kind of hardware (CPU/GPU) and server resources do you have?

reiase commented 2 years ago

Could you share some example code? We have just released a new API that helps users handle large-scale datasets and use multi-core CPUs for acceleration.

https://towhee.readthedocs.io/en/branch0.6/data_collection/get_started.html#parallel-execution

emmataobao commented 2 years ago

@fzliu @reiase I am using towhee/image-embedding-regnetx-016 with towhee=0.5.1. [image] The current test runs on a CPU server, but both CPU and GPU servers are available. I don't know how to make better use of the server resources to speed up feature extraction.

reiase commented 2 years ago

You can use towhee on a large-scale dataset with the data collection API:

import towhee

# your_image_file_list is a placeholder for a list of image file paths
embeddings = (
    towhee.dc(your_image_file_list)                   # wrap the file list in a data collection
        .image_decode()                               # decode all images
        .image_embedding.timm(model_name='resnet50')  # compute image embeddings
        .tensor_normalize()                           # normalize the embeddings
        .to_list()                                    # collect the results into a list
)
print(embeddings)

To improve CPU utilization, you can enable parallel execution with set_parallel(num_thread):

import towhee

# your_image_file_list is a placeholder for a list of image file paths
embeddings = (
    towhee.dc(your_image_file_list)                   # wrap the file list in a data collection
        .set_parallel(5)                              # enable parallel execution with a thread pool of size 5
        .image_decode()                               # decode all images
        .image_embedding.timm(model_name='resnet50')  # compute image embeddings
        .tensor_normalize()                           # normalize the embeddings
        .to_list()                                    # collect the results into a list
)
print(embeddings)
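
To show roughly what "a thread pool of size 5" means, here is a minimal sketch using only the Python standard library. It is not towhee's implementation: fake_embed and the file names are hypothetical stand-ins, and the sleep call only simulates work. Threads tend to help when the per-image work releases the GIL (file I/O, native decoding or inference code), which is an assumption here rather than something measured.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fake_embed(path):
    # Hypothetical stand-in for decoding an image and computing its embedding.
    time.sleep(0.01)          # simulate work that releases the GIL
    return (path, len(path))  # placeholder result, not a real embedding

paths = [f"img_{i}.jpg" for i in range(20)]  # hypothetical file names

# Up to 5 calls to fake_embed run concurrently, one per worker thread.
with ThreadPoolExecutor(max_workers=5) as pool:
    # map() returns results in input order even though calls overlap in time.
    results = list(pool.map(fake_embed, paths))

print(len(results), results[0])
```

The pool size is worth tuning against the number of CPU cores on the server; a value much larger than the core count usually adds overhead without adding throughput.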

To use the data collection API, you need to update to towhee 0.6 with the following command:

$ pip install -U towhee

fzliu commented 2 years ago

GPU auto-batching has already been implemented, but it hasn't been tested yet. It's on our roadmap for the next patch release, i.e. 0.6.1. In the meantime, you can try @reiase's suggestion above to see if that improves performance.
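
As a rough illustration of what batching means here, the sketch below splits a file list into fixed-size groups so that each group could be handed to a model in a single call. This is plain Python and does not reflect how towhee's auto-batching is implemented; the batched helper, the batch size of 4, and the file names are all hypothetical.

```python
def batched(items, batch_size):
    """Yield consecutive slices of items, each with at most batch_size elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

paths = [f"img_{i}.jpg" for i in range(10)]  # hypothetical file names

batches = list(batched(paths, 4))
print([len(b) for b in batches])  # [4, 4, 2]
```

On a GPU, processing one batch per call generally amortizes the per-call overhead across several images, which is why batching is the usual route to better GPU utilization.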