internetarchive / liveweb

Liveweb proxy of the Wayback Machine project
https://web.archive.org/

Decide the VM Specs #10

Closed: anandology closed this issue 12 years ago

anandology commented 12 years ago

We need to find out how much CPU, RAM and disk we need for running liveweb proxy.

CPU

It looks like the new liveweb can handle a concurrency of 100 with a single process and many threads. It might be good to run two of them in parallel.

So 2 CPUs will be needed for the liveweb app. Since each process may use more than 1 CPU, it might be better to have some buffer.

An additional 2 CPUs are needed for memcache, nginx and other system processes.

So 4 CPUs is the minimum and 6 CPUs is optimal.
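A minimal sketch of the CPU arithmetic above. It assumes about one CPU per app process and reads the step from 4 to 6 CPUs as two CPUs of headroom for processes that end up using more than one CPU; the constant names are illustrative, not taken from the liveweb code.

```python
from typing import Tuple

# Back-of-the-envelope CPU budget for the liveweb proxy VM.
# Assumptions (estimates from the discussion, not measurements):
#   - 2 app processes, each handling ~100 concurrent requests with threads
#   - ~1 CPU per app process
#   - 2 CPUs for memcache, nginx and other system processes
#   - 2 extra CPUs as headroom, since a process may use more than 1 CPU
APP_PROCESSES = 2
CPUS_PER_PROCESS = 1
SYSTEM_CPUS = 2
HEADROOM_CPUS = 2


def cpu_budget() -> Tuple[int, int]:
    """Return (minimum, optimal) CPU counts."""
    minimum = APP_PROCESSES * CPUS_PER_PROCESS + SYSTEM_CPUS
    optimal = minimum + HEADROOM_CPUS
    return minimum, optimal


if __name__ == "__main__":
    minimum, optimal = cpu_budget()
    print(f"minimum: {minimum} CPUs, optimal: {optimal} CPUs")
```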

Memory

The plan is to store the MD5 of each URL and its ARC file location in memcache.
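A sketch of the key/value scheme, assuming the key is the hex MD5 digest of the URL and the value is the ARC file name plus a byte offset. The `"<file> <offset>"` value format and the helper names are hypothetical; the issue only says "arc file location".

```python
import hashlib
from typing import Tuple


def cache_key(url: str) -> str:
    """Key for a URL: its hex MD5 digest, always 32 characters.

    Hex digits only, so the key is safe for memcache, which forbids
    whitespace and control characters in keys and caps them at 250 bytes.
    """
    return hashlib.md5(url.encode("utf-8")).hexdigest()


def cache_value(arc_filename: str, offset: int) -> str:
    """Hypothetical value format: '<ARC file name> <byte offset>'."""
    return f"{arc_filename} {offset}"


def parse_value(value: str) -> Tuple[str, int]:
    """Inverse of cache_value(); splits on the last space."""
    filename, offset = value.rsplit(" ", 1)
    return filename, int(offset)
```

A hex MD5 digest is always 32 characters, which matches the 32 bytes for the key in the memory estimate. URLs that differ only in case or a trailing slash hash to different keys unless they are normalized before hashing.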

We get about 100 requests/sec. That is about 8.6M requests/day. Assuming half of them are duplicates, we have about 4M unique hits/day.

Assuming we want to keep this in memcache for 24 hours, it will require about 2GB of memory.

2GB ≈ 4M records × roughly 512 bytes per record (32 bytes for the key, about 200 bytes for the value and about 300 bytes of overhead).
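The memcache sizing, worked through with the exact day length (86,400 seconds). The inputs are the estimates stated in the issue (100 req/sec, half duplicates, about 512 bytes per record, 24-hour retention), not measurements.

```python
# Rough memcache sizing for the URL -> ARC location index.
SECONDS_PER_DAY = 24 * 60 * 60       # 86,400
REQUESTS_PER_SEC = 100
UNIQUE_FRACTION = 0.5                # "assuming half of them are duplicates"
BYTES_PER_RECORD = 512               # 32 B key + ~200 B value + ~300 B overhead, rounded
RETENTION_DAYS = 1                   # keep entries for 24 hours


def memcache_bytes() -> int:
    """Estimated memcache footprint in bytes."""
    requests_per_day = REQUESTS_PER_SEC * SECONDS_PER_DAY       # 8,640,000
    unique_per_day = int(requests_per_day * UNIQUE_FRACTION)    # 4,320,000
    return unique_per_day * RETENTION_DAYS * BYTES_PER_RECORD


if __name__ == "__main__":
    total = memcache_bytes()
    print(f"{total:,} bytes = {total / 1e9:.2f} GB")
```

This prints `2,211,840,000 bytes = 2.21 GB` (about 2.06GiB), consistent with the 2GB figure.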

The app is not going to use more than 512MB of memory.

We may want to keep some memory for the OS buffer cache.

So 4GB (minimum) to 6GB looks good enough.

If we also want to keep the records themselves in memcache/varnish, we need to budget memory for that as well.
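The RAM budget can be read as a subtraction: whatever memcache (about 2GB) and the app (at most 512MB) do not use is left for the OS buffer cache, nginx and other processes. This sketch does not include any memory for keeping the records themselves in memcache/varnish; the names are illustrative.

```python
# Rough RAM budget for the liveweb VM, in GB (estimates from the discussion).
MEMCACHE_GB = 2.0   # URL -> ARC location index, 24h retention
APP_GB = 0.5        # "the app is not going to use more than 512MB"


def buffer_cache_gb(vm_ram_gb: float) -> float:
    """RAM left for the OS buffer cache (and other processes) on a VM of this size."""
    return vm_ram_gb - MEMCACHE_GB - APP_GB


if __name__ == "__main__":
    for ram in (4, 6):
        print(f"{ram}GB VM -> {buffer_cache_gb(ram):.1f}GB left for buffer cache")
```

A 4GB VM leaves about 1.5GB and a 6GB VM about 3.5GB outside memcache and the app.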

Disk

Assuming the new liveweb receives 100 req/sec with the same traffic distribution as the current one, it will fill about 30MB/sec of disk (roughly 100GB/hour, 2.4TB/day).

The current liveweb has about 6.7TB of disk. Something of the same order will be good for the new one.
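Computed exactly, 30MB/sec is 108GB/hour and about 2.6TB/day, so the 100GB/hour and 2.4TB/day figures are rounded down. The sketch below also estimates how many days of traffic a disk holds, under the assumption that nothing is deleted; the function names are illustrative.

```python
from typing import Tuple

# Assumed average write rate from the estimate above: 30MB/sec at
# 100 req/sec, i.e. roughly 300KB written per request on average.
WRITE_MB_PER_SEC = 30.0


def fill_rates(mb_per_sec: float = WRITE_MB_PER_SEC) -> Tuple[float, float]:
    """Return (GB per hour, TB per day), using decimal units (1GB = 1000MB)."""
    gb_per_hour = mb_per_sec * 3600 / 1000
    tb_per_day = gb_per_hour * 24 / 1000
    return gb_per_hour, tb_per_day


def days_until_full(disk_tb: float, mb_per_sec: float = WRITE_MB_PER_SEC) -> float:
    """Days of traffic a disk can hold, assuming nothing is ever deleted."""
    _, tb_per_day = fill_rates(mb_per_sec)
    return disk_tb / tb_per_day


if __name__ == "__main__":
    gb_h, tb_d = fill_rates()
    print(f"{gb_h:.0f} GB/hour, {tb_d:.2f} TB/day")
    for disk_tb in (6.0, 6.7):
        print(f"{disk_tb}TB disk fills in {days_until_full(disk_tb):.1f} days")
```

At this rate a 6TB disk holds a little over 2 days of traffic (about 2.3 days) and 6.7TB about 2.6 days.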

Recommendation

6-core CPU, 6GB RAM, 6TB disk

samuel-archive commented 12 years ago

The plan is to performance test liveweb-proxy with real logs from the liveweb proxy, probably using siege, and then get an idea of the footprint and growth needs. I'm hoping we can just squeeze it onto wwwb-gen1.

samuel-archive commented 12 years ago

Where in the system is memcache used? I didn't see it in the GitHub code so far...

anandology commented 12 years ago

> Where in the system is memcache used? I didn't see it in the GitHub code so far...

We are storing it in memory right now. The plan is to switch to memcache.
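While the index lives in process memory, a small dict-backed store with 24-hour expiry can mirror how memcache will be used, so that switching later mostly means swapping the object. This is a hypothetical sketch: the class name is made up, and the `get`/`set(key, value, expire)` shape only loosely follows Python memcache clients such as pymemcache. Unlike memcache, it has no size limit or LRU eviction.

```python
import math
import time
from typing import Callable, Dict, Optional, Tuple

DAY_SECONDS = 24 * 60 * 60


class InMemoryCache:
    """Dict-backed stand-in for memcache (hypothetical, for illustration).

    Entries expire after `expire` seconds; expire=0 means "never expire",
    as in memcache. There is no memory limit or LRU eviction, and expired
    entries are only removed when they are read.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        # Injectable clock so expiry can be tested without sleeping.
        self._clock = clock
        self._data: Dict[str, Tuple[str, float]] = {}

    def set(self, key: str, value: str, expire: int = DAY_SECONDS) -> None:
        expires_at = math.inf if expire == 0 else self._clock() + expire
        self._data[key] = (value, expires_at)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazy expiry on read
            return None
        return value

    def delete(self, key: str) -> None:
        self._data.pop(key, None)
```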

anandology commented 12 years ago

Closing this as we are planning to use wwwb-gen1.

samuel-archive commented 12 years ago

Actually, we are going to use the VM wwwb-liveweb.

-sam

Samuel Stoller samuel@archive.org mobile: +1 415 425 7739 skype: samuel-archive