openzim / zimit-frontend

Zimit Public Web UI
https://zimit.kiwix.org
GNU General Public License v3.0

Limit output size #29

Closed rgaudin closed 2 years ago

rgaudin commented 2 years ago

Some youzim.it requests can get out of hand in terms of disk space, especially when the crawler enters a dynamic-website loop. The other day, we had a blog create an 86GiB ZIM file… which disturbed the cardshop worker.

browsertrix-crawler has a --sizeLimit option that we'll use once it's released (0.6). @kelson42 @Popolechien what value do you think we should use on youzim.it? This will work in combination with --limit, which is based on the number of pages, so the crawl stops at whichever limit is reached first.

My suggestion is 8GiB, to go along with our 1,000-page limit.
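To make the "whichever runs out first" behaviour concrete, here is a minimal Python sketch of how a page limit and a size limit would combine. It is not browsertrix-crawler's actual code; `CrawlLimits` and `stop_reason` are hypothetical names used only for illustration, and the values are the ones suggested above.

```python
from dataclasses import dataclass
from typing import Optional

GIB = 1024 ** 3  # bytes in one GiB


@dataclass
class CrawlLimits:
    """Hypothetical container for the two limits discussed in this issue."""
    max_pages: int
    max_bytes: int


def stop_reason(pages_crawled: int, bytes_written: int,
                limits: CrawlLimits) -> Optional[str]:
    """Return which limit was hit, or None if the crawl may continue."""
    if pages_crawled >= limits.max_pages:
        return "page limit"
    if bytes_written >= limits.max_bytes:
        return "size limit"
    return None


# Suggested values: 1,000 pages and 8GiB.
limits = CrawlLimits(max_pages=1000, max_bytes=8 * GIB)
```

With these values, a crawl like the 86GiB blog would be stopped by the size limit long before reaching 1,000 pages: `stop_reason(10, 86 * GIB, limits)` returns `"size limit"`.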

Popolechien commented 2 years ago

I was going to say 2GB. That's already a lot of content. Zimit may be free to use, but certainly not to run.

I'll phrase it differently though. What's the average size of a zim file containing a full feature film taken off YouTube (ca. 105min)?

kelson42 commented 2 years ago

Yes, a few GB.

rgaudin commented 2 years ago

Just so you have all info at hand:

So in terms of the cost of this feature, it's fine as long as it doesn't disturb the cardshop worker that is on the same machine. Allowing 8G or 16G vs 512M or 2G is identical from a cost perspective.

rgaudin commented 2 years ago

done. 4GiB disk, 2h
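For reference, a small Python sketch of how the final values could be turned into crawler arguments. It assumes the size limit is given in bytes and that the 2h limit maps to a `--timeLimit` flag in seconds; neither detail is stated in this thread, and `crawler_limit_args` is a hypothetical helper, not part of zimit.

```python
GIB = 1024 ** 3  # bytes in one GiB
HOUR = 3600      # seconds in one hour


def crawler_limit_args(size_gib: int, hours: int) -> list[str]:
    """Build the limit arguments (assumed units: bytes and seconds)."""
    return [
        "--sizeLimit", str(size_gib * GIB),
        "--timeLimit", str(hours * HOUR),
    ]


print(crawler_limit_args(4, 2))
# ['--sizeLimit', '4294967296', '--timeLimit', '7200']
```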