rubygems / bundler-site

The Bundler documentation website
https://bundler.io

Evaluate Typesense cloud (poc) #731

Open tnir opened 2 years ago

tnir commented 2 years ago

cf. #691

TODO / concerns

Backend / cost

Frontend

DocSearch v3 is the current version, while Typesense's version was still on v2: https://github.com/rubygems/bundler-site/issues/691#issuecomment-1192825277

Resources

tnir commented 2 years ago

Looping in @jasonbosco @deivid-rodriguez @simi 👋

simi commented 2 years ago

First of all, what's the problem with middleman-search? Are there any problems we face, anything we can fix in there?

simi commented 2 years ago

@jasonbosco would it be possible to provide some kind of sponsored account for RubyGems? We can add some attribution in the footer or on a special page, but we would like to avoid branding of the search box itself.

deivid-rodriguez commented 2 years ago

middleman-search has been abandoned, and needs updates (we're already using my fork from git at the moment). So the problem with it is that we would need to take over maintenance ourselves.

tnir commented 2 years ago

[...] what's the problem [...]

Yes. The bundler.io website team has to maintain search.js and search_arrow.js itself, on top of the abandoned middleman-search and its dependency on a legacy version of lunr.js, as already stated in the description of #691.

tnir commented 2 years ago

In my experiment this week on Typesense Cloud, even using the region nearest to me (without Search Delivery Network), search performance still needed improvement. @jasonbosco For the performance evaluation, can I use 8GB, 2vCPU (non-burst type) with Search Delivery Network?

The cost would be:

Cluster: $1.45/hr, which works out to $1,044.00/month
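The two figures quoted above are consistent with billing 720 hours (30 days) per month. A minimal sketch of that arithmetic follows; the 720-hour convention is inferred from the numbers rather than stated in this thread, and the function name is hypothetical.

```python
from decimal import Decimal

HOURS_PER_MONTH = 720  # assumption: 30 days x 24 hours, inferred from the quoted prices


def monthly_cost(hourly_rate_usd: str, hours: int = HOURS_PER_MONTH) -> Decimal:
    """Return the monthly cost in USD for a cluster billed per hour."""
    # Decimal avoids binary floating-point rounding on currency values.
    return Decimal(hourly_rate_usd) * hours


print(monthly_cost("1.45"))  # 1044.00
```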

tnir commented 2 years ago

I would say I have already solved the logo problem in #706.

simi commented 2 years ago

@tnir would you mind opening the related issues at https://github.com/manastech/middleman-search? I can try to address them.

deivid-rodriguez commented 2 years ago

I have to say I'm more and more convinced that a local search solution would be best; it feels like overkill to use an external service to scrape a site as simple as ours. Should we take over middleman-search maintenance? It has always worked very well and, as far as I understand, we only need https://github.com/manastech/middleman-search/pull/29 and https://github.com/manastech/middleman-search/pull/38 (and we are already using https://github.com/manastech/middleman-search/pull/38).

jasonbosco commented 2 years ago

@jasonbosco would it be possible to provide some kind of sponsored account for RubyGems? We can add some attribution in the footer or on a special page, but we would like to avoid branding of the search box itself.

@simi Happy to provide a sponsored account for RubyGems, as long as the cost is not too high (since we ourselves are a bootstrapped company).

Although if we sponsor it, I would like to ask for the powered by logo to be shown in the search results. This is very similar to Algolia's ask as well when using their DocSearch version:

We know that paying for search infrastructure is a cost not all open source projects can afford. That's why we decided to keep DocSearch free for everyone. All we ask in exchange is that you keep the "Search by Algolia" logo displayed next to the search results. Source: https://docsearch.algolia.com/docs/docsearch-program/#how-much-does-it-cost


@jasonbosco For the performance evaluation, can I use 8GB, 2vCPU (non-burst type) with Search Delivery Network?

@tnir I don't think you would need 8GB of RAM to index the content from bundler.io (assuming that's the scope of this issue).

It seems like the number of pages is ~50 (please correct me if I'm wrong), so you might be able to fit all of this and much more in 512MB of RAM. With a 5 region SDN, you're looking at this configuration: https://cloud.typesense.org/pricing?memory=0.5_gb&vcpu=2_vcpus_1_hr_burst_per_day&high_perf_disk=no&typesense_server_version=0.23.1&ha=yes&sdn=5_regions&regions=n_california%2Cohio%2Cfrankfurt%2Cmumbai%2Ctokyo

~$110 a month, plus bandwidth.

The number of concurrent searches per second will determine the amount of vCPU you need, but with 5 nodes you actually get 5 nodes * 2 vCPUs per node = 10 vCPUs in total. For your dataset, this should be sufficient as well.

The best way to determine RAM usage would be to run the typesense-docsearch-scraper against the site and index it into Typesense to observe memory usage.
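One way to observe memory after indexing is to read the node's metrics. The sketch below assumes the Typesense server's `/metrics.json` endpoint and its `typesense_memory_resident_bytes` field (reported as a string of bytes); the base URL and API key are placeholders, and the field names should be checked against the server version in use.

```python
import json
import urllib.request

BYTES_PER_MB = 1024 * 1024


def fetch_metrics(base_url: str, api_key: str) -> dict:
    """Fetch the metrics document from a Typesense node (URL and key are placeholders)."""
    request = urllib.request.Request(
        f"{base_url}/metrics.json",
        headers={"X-TYPESENSE-API-KEY": api_key},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def resident_memory_mb(metrics: dict) -> float:
    """Convert the resident-memory metric (a string of bytes) to megabytes."""
    return int(metrics["typesense_memory_resident_bytes"]) / BYTES_PER_MB


# Hand-written payload, so the parsing can be tried without a running server.
sample = {"typesense_memory_resident_bytes": "52428800"}
print(f"{resident_memory_mb(sample):.1f} MB")  # 50.0 MB
```

Against a live node, `resident_memory_mb(fetch_metrics("http://localhost:8108", "<api-key>"))` could be called before and after running the scraper to compare the two readings.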

If my estimates above hold good, happy to sponsor this cluster for you.

tnir commented 2 years ago

The number of pages is 950-1000, but most of them are not large. My guess is that the burst-type vCPU is one of the reasons for the slowness on Typesense Cloud. As I do not know what technology you use there, I defer to you on vCPUs per node 💪. As you said above, the search traffic would be very low, I guess. Let me start with the minimal 0.5GB of memory. (Note that in my previous experiment, with about 1/3 of the production data volume, Typesense used 50-60MB of memory, so even if we improved indexing, 0.5GB would be enough.)
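The memory estimate in the previous paragraph can be checked with a rough extrapolation. This is a sketch only: it assumes memory grows linearly with data volume, which the thread does not confirm, and the function name is hypothetical.

```python
PLAN_MEMORY_MB = 512  # the 0.5GB plan discussed above


def estimate_full_memory_mb(sample_mb: int, scale: int = 3) -> int:
    """Scale memory measured on a sample up to the full dataset.

    Assumes linear growth; scale=3 because the experiment indexed
    about one third of the production data.
    """
    return sample_mb * scale


low, high = estimate_full_memory_mb(50), estimate_full_memory_mb(60)
print(f"estimated: {low}-{high} MB, headroom: {PLAN_MEMORY_MB - high} MB")
# estimated: 150-180 MB, headroom: 332 MB
```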

@jasonbosco Then can I ask you to launch a single cluster (a 0.5GB-memory HA cluster with a 5-region SDN: 2 US, 1 EU, 2 APAC, as you suggested) in the bundler-io project?

simi commented 2 years ago

Although if we sponsor it, I would like to ask for the powered by logo to be shown in the search results. This is very similar to Algolia's ask as well when using their DocSearch version:

I'm afraid that's exactly what I'm trying to avoid, and AFAIK there is no easy way for us to get a paid account. Let's focus on middleman-search and its maintenance @tnir.

tnir commented 2 years ago

Although if we sponsor it, I would like to ask for the powered by logo to be shown in the search results. This is very similar to Algolia's ask as well when using their DocSearch version:

Oops, I did not read this. If so, I prefer #706 now...

tnir commented 2 years ago

No; again, #706 meets all the requirements you (and I) have, so it seems that https://bundler-site-tnir-algol-j2zyh5.herokuapp.com/ might be the best option at the moment.

tnir commented 2 years ago

Before considering whether to show the logo, we need to check whether the search experience is good with Typesense Cloud. Once @jasonbosco creates (or allows me to create) a cluster, I will update #702 within minutes.

jasonbosco commented 2 years ago

@tnir I've added some credits to the bundler-io account on Typesense Cloud. If you switch to it, you should now be able to provision a 5-region SDN cluster with the config I mentioned above.

hsbt commented 2 years ago

I agree with @simi's opinion.

Using SaaS for an OSS project is difficult. I have a lot of experience with abandoned repositories. Sometimes I got an unexpected charge, migrated the service to Heroku or AWS, and maintained it myself.

At the least, we should choose a technical stack that can be migrated to OSS alternatives.