keerthys closed this 10 years ago
Commonly suggested approaches.
As an initial step, I am thinking of going with the first approach. Once the prototype is done, we have to compare performance in both RDBMS and NoSQL environments before reaching a final conclusion on the approach. Hadoop for video streaming also needs to be validated before we conclude.
What do you think? (Refer to the last comment for further updates on the final conclusion.)
Supporting links for the above points (for future reference when we revisit the post):
http://www.viiratech.com/tutorials/good-programming-practice/storing-media-files-database-file-system.html
http://stackoverflow.com/questions/154707/what-is-the-best-way-to-store-media-files-on-a-database
http://blog.mongodb.org/post/183689081/storing-large-objects-and-files-in-mongodb
Superb link that handles all the questions: http://akashkava.com/blog/127/huge-file-storage-in-database-instead-of-file-system/
Final conclusion: MySQL it is.
Add your views, if any, keerthy.
Nice investigation, and a great start on the various options we have. But IMO we still cannot settle on MySQL.
When we do the math as follows:
Total storage = total number of producers * average number of videos uploaded per producer * size of each video
1 lakh (100,000) producers: 100,000 * 10 * 250 MB = 250,000,000 MB ~ 238 TB (taking 1 TB = 1024 * 1024 MB)
10K producers: ~ 24 TB
Also, if we replicate the DB content, the stored data doubles, which increases the size further.
I have considered 1 lakh users in the above calculation. Of course it will take a lot of time to scale to that level, but these decisions are hard to change at a later point. So we should do a careful analysis, backed by data points and rough estimates of the data requirement.
Since we are allowing video content, the total size can grow tremendously, so we should do a better estimate on this front. We should validate our data requirements and check whether each approach can cater to that need.
We can have another task that evaluates the various options against rough estimates of the numbers.
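To make the rough estimate easy to redo with different numbers, here is a minimal Python sketch of the same arithmetic. The function name, the default values, and the `copies` parameter are only illustrative; it assumes 10 videos per producer at 250 MB each, as in the calculation above, and takes 1 TB as 1024 * 1024 MB.

```python
MB_PER_TB = 1024 * 1024  # binary units: 1 TB = 1024 * 1024 MB


def estimate_storage_tb(producers, videos_per_producer=10, video_size_mb=250, copies=1):
    """Rough storage estimate in TB.

    copies=1 means a single copy of the data; copies=2 means the data
    plus one full replica, which doubles the requirement.
    """
    total_mb = producers * videos_per_producer * video_size_mb * copies
    return total_mb / MB_PER_TB


if __name__ == "__main__":
    for producers in (10_000, 100_000):
        single = estimate_storage_tb(producers)
        replicated = estimate_storage_tb(producers, copies=2)
        print(f"{producers:>7} producers: {single:.1f} TB, "
              f"{replicated:.1f} TB with one replica")
```

With these assumptions, 1 lakh producers come to about 238 TB for a single copy, and twice that with one replica. Changing the average video size or the number of uploads per producer shifts the result proportionally, so those two inputs are the ones worth validating first.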
There are only 2 variable factors here:
1. Individual file size
2. Total number of files
Individual file size is not a problem, as the blob needs to be split into fixed smaller chunks while storing it in the DB. With respect to the total number of files, the more files there are, the bigger the DB gets. One way to deal with this problem is to partition the data based on the trend or the upload date. And to deal with the scalability issues of huge data growth, we can move in a later phase to MySQL cloud, which supports multiple redundant replications across multiple locations and should cater to our needs. Feel free to add your views.
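The chunking idea can be sketched roughly as below. This is only an illustration, not the final design: it uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs without a server, and the table name `video_chunks`, the column names, and the 1 MB chunk size are all assumptions made for the example. The `upload_date` column is kept on each row so that a date-based partitioning scheme has something to key on.

```python
import sqlite3

CHUNK_SIZE = 1024 * 1024  # assumed fixed chunk size (1 MB), purely illustrative


def split_into_chunks(data, chunk_size=CHUNK_SIZE):
    """Split a bytes object into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def create_schema(conn):
    # Hypothetical schema: one row per chunk, ordered by chunk_index.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS video_chunks (
               video_id    INTEGER NOT NULL,
               upload_date TEXT    NOT NULL,
               chunk_index INTEGER NOT NULL,
               data        BLOB    NOT NULL,
               PRIMARY KEY (video_id, chunk_index)
           )"""
    )


def store_video(conn, video_id, upload_date, data, chunk_size=CHUNK_SIZE):
    """Store a video as a sequence of chunk rows; returns the number of chunks."""
    rows = [
        (video_id, upload_date, index, chunk)
        for index, chunk in enumerate(split_into_chunks(data, chunk_size))
    ]
    with conn:  # commit all chunks of one video in a single transaction
        conn.executemany("INSERT INTO video_chunks VALUES (?, ?, ?, ?)", rows)
    return len(rows)


def load_video(conn, video_id):
    """Reassemble a video by reading its chunks back in order."""
    cursor = conn.execute(
        "SELECT data FROM video_chunks WHERE video_id = ? ORDER BY chunk_index",
        (video_id,),
    )
    return b"".join(row[0] for row in cursor)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    payload = bytes(range(256)) * 10_000  # about 2.4 MB of dummy data
    count = store_video(conn, 1, "2014-01-15", payload)
    print(count, "chunks stored; round trip ok:", load_video(conn, 1) == payload)
```

Because each chunk is a separate row, a streaming reader could fetch chunks one at a time instead of loading the whole video into memory, which is the main reason individual file size stops being a concern.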
I don't have prior hands-on experience with databases, so I don't have any specific suggestion at the moment. Based on your investigation, MySQL seems the way to go. We can start prototyping with MySQL and measure performance (by opening many parallel connections to fetch a video).
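For the measurement part, a small harness along these lines could be a starting point. It is a sketch under assumptions: `fetch` stands for whatever function the prototype ends up using to read a video from the store (each call should open its own connection), and `fake_fetch` below is just a placeholder so the example runs on its own.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(fetch, video_id):
    """Run one fetch and return (elapsed seconds, bytes received)."""
    start = time.perf_counter()
    size = len(fetch(video_id))
    return time.perf_counter() - start, size


def benchmark(fetch, video_id, parallel_clients=20):
    """Fetch the same video from many threads at once and summarise the timings."""
    with ThreadPoolExecutor(max_workers=parallel_clients) as pool:
        started = time.perf_counter()
        results = list(
            pool.map(lambda _: timed_call(fetch, video_id), range(parallel_clients))
        )
        wall_seconds = time.perf_counter() - started
    latencies = [elapsed for elapsed, _ in results]
    total_bytes = sum(size for _, size in results)
    return {
        "clients": parallel_clients,
        "total_bytes": total_bytes,
        "wall_seconds": wall_seconds,
        "median_latency": statistics.median(latencies),
        "max_latency": max(latencies),
        "throughput_mb_s": total_bytes / (1024 * 1024) / wall_seconds,
    }


def fake_fetch(video_id):
    """Placeholder for the real DB read: sleeps briefly and returns dummy bytes."""
    time.sleep(0.01)
    return b"x" * 1024


if __name__ == "__main__":
    print(benchmark(fake_fetch, video_id=1, parallel_clients=20))
```

Running the same harness against the MySQL prototype and against the alternatives would give comparable numbers for median latency and throughput as the number of parallel clients grows.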
First point to be ensured is