kevaldekivadiya2415 / textembed

TextEmbed is a REST API crafted for high-throughput and low-latency embedding inference. It accommodates a wide variety of embedding models and frameworks, making it ideal for various natural language processing applications.
Apache License 2.0

I can't find the documentation. Would love to use it for my research! #18

Open sleepingcat4 opened 2 months ago

sleepingcat4 commented 2 months ago

I can't find the documentation, and the Setup.md file returns a 404 error. Are there docs available?

kevaldekivadiya2415 commented 2 months ago

I am working on the documentation. Here is the Setup.md file.

kevaldekivadiya2415 commented 2 months ago

I’ve set up the GitHub Pages site for the documentation. You can access it here. Please check if the documentation is useful and let me know if there are any issues or improvements needed.

sleepingcat4 commented 2 months ago

@kevaldekivadiya2415 the documentation is too simple and does not cover workers and batches. Besides, batching is tricky when you are sending a huge payload to the model followed by 3-4 more operations (small, yet they still use workers).

I would also love it if there were a benchmark against the TEI library from HF, and to know whether it has Intel Gaudi hardware support.

kevaldekivadiya2415 commented 2 months ago

--workers: This argument specifies the number of worker processes to be used for batch processing. Increasing the number of workers allows the system to handle multiple batches in parallel, improving throughput, especially when processing a high volume of requests.

--batch_size: This parameter defines the size of each batch for processing requests.
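For illustration, here is a minimal client sketch of sending one batch of texts to the server. It assumes a TextEmbed instance is already running locally (e.g. launched with the --workers and --batch_size flags described above) and that the API follows the OpenAI-style /v1/embeddings convention; the endpoint path, payload fields, and response shape are assumptions, so check the actual docs before relying on them.

```python
# Minimal client sketch: send one batch of texts to a locally running
# TextEmbed server. The endpoint path (/v1/embeddings), payload fields,
# and response shape follow the OpenAI convention and are assumptions;
# the real TextEmbed API may differ.
import requests

BASE_URL = "http://localhost:8000"  # assumed default host/port

texts = ["first document", "second document", "third document"]

resp = requests.post(
    f"{BASE_URL}/v1/embeddings",
    json={
        "model": "sentence-transformers/all-MiniLM-L6-v2",  # example model
        "input": texts,  # one batch of inputs
    },
    timeout=60,
)
resp.raise_for_status()
embeddings = [item["embedding"] for item in resp.json()["data"]]
print(f"{len(embeddings)} embeddings of dimension {len(embeddings[0])}")
```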

sleepingcat4 commented 2 months ago

@kevaldekivadiya2415 can you provide a benchmark on 5,000 rows using batch processing? I tried batch processing before with the TEI Gaudi HF repo and it was a disaster. It was easier to do it sequentially, since I could reach 8,000 rows in 27 seconds.

I have supercomputers and a limited number of small-sized machines. If you could give me a ballpark figure for the throughput on a payload of about one page, with the results stored in RAM, I would be eager to check out your library.
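For anyone wanting to estimate such numbers themselves, here is a minimal sketch of this kind of throughput benchmark, under the same assumed endpoint and payload shape as the sketch above. The batch size, row count, and model name are placeholders to tune against the server's --batch_size and --workers settings.

```python
# Hypothetical throughput benchmark: embed N rows in fixed-size batches,
# keep all vectors in RAM (as discussed above), and report rows/second.
# Endpoint path, payload fields, and response shape are the same
# OpenAI-style assumptions as in the earlier sketch.
import time

import requests

BASE_URL = "http://localhost:8000"  # assumed default host/port
BATCH_SIZE = 256                    # placeholder; tune with the server's --batch_size
rows = [f"row {i}: roughly one page of text ..." for i in range(5000)]

all_embeddings = []  # results stay in RAM
start = time.perf_counter()
for i in range(0, len(rows), BATCH_SIZE):
    batch = rows[i : i + BATCH_SIZE]
    resp = requests.post(
        f"{BASE_URL}/v1/embeddings",
        json={
            "model": "sentence-transformers/all-MiniLM-L6-v2",  # example model
            "input": batch,
        },
        timeout=120,
    )
    resp.raise_for_status()
    all_embeddings.extend(item["embedding"] for item in resp.json()["data"])
elapsed = time.perf_counter() - start

print(f"{len(rows)} rows in {elapsed:.1f}s -> {len(rows) / elapsed:.0f} rows/s")
```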