yahoojapan / ngtd

Serving NGT over HTTP or gRPC. ※ This project is not maintained. We have moved to a new product, [Vald](https://vald.vdaas.org).
Apache License 2.0

Use Batch to speed up bulk insertion operations #41

Closed tomberek closed 4 years ago

kpango commented 4 years ago

Thank you for your contribution. Looks good to me. Are you using ngtd in your product?

We no longer maintain this repository. YahooJapan's vector search engine platform has moved to https://github.com/vdaas/vald. Please check it out.

tomberek commented 4 years ago

As far as I understand, vald still uses NGT as an underlying library. To meet our needs we rewrote NGTD; this was just a bug we found while trying to do live insertions and queries. We are hesitant to move to vald due to the need for K8s, but I'd be curious to understand how and why vald would be better. We're also in touch with @masajiro regarding some of the latest additions to the NGT public interface.

kpango commented 4 years ago

@tomberek As you know, Vald uses NGT as its core library because NGT is excellent, and we're planning to support hnswlib and faiss as well. But NGTD was not enough for our workload. I created NGTD as a hobby project, not as production-grade software. It does not have enough test code, it has many bugs, and its biggest problem is scalability. To store large amounts of data with NGTD, you need a server with a lot of memory. With Vald, you can use virtually any kind of Kubernetes node, memory can be shared over the network, and vectors can be easily backed up to external storage. Another advantage is that we don't need to take care of NGT index creation: Vald has an auto-indexing feature, and NGT index creation is controlled by the Vald manager component.

kpango commented 4 years ago

@tomberek You have another option. Vald is built as a set of microservices. The core search component is called Agent-NGT, and you can use it on its own instead of NGTD. The Docker image is at https://hub.docker.com/r/vdaas/vald-agent-ngt and an example configuration is at https://github.com/vdaas/vald/blob/master/cmd/agent/ngt/sample.yaml, but this configuration is not enough for production use. If you'd like to customize the agent further, see the configuration documentation (it covers every parameter, not only the agent's) at https://github.com/vdaas/vald/tree/master/charts/vald

kpango commented 4 years ago

We can help you adopt Vald. Tell us more about your workload and requirements. Any feedback is welcome.