Closed by visheshdembla 3 years ago
Thanks, @visheshdembla, for the detailed description. In your view, how easy would it be to pivot to NoSQL if we need to do so because of scalability issues?
It should not be a big change if we abstract the underlying storage logic well. I discussed this with our AI. We can take this forward based on feedback from there or from the instructors.
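To make "abstract the underlying logic" concrete, here is a minimal sketch of what such an abstraction could look like. ImageStore, InMemoryImageStore, and upload_image are hypothetical names, not anything from our codebase, and the in-memory backend is only a stand-in so that the example runs on its own.

```python
from abc import ABC, abstractmethod


class ImageStore(ABC):
    """Hypothetical storage interface; application code depends only on this."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None:
        """Store the image bytes under the given key."""

    @abstractmethod
    def get(self, key: str) -> bytes:
        """Return the image bytes stored under the given key."""


class InMemoryImageStore(ImageStore):
    """Stand-in backend for illustration. A filesystem or NoSQL backend
    would implement the same two methods."""

    def __init__(self) -> None:
        self._blobs = {}  # maps key -> image bytes

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def upload_image(store: ImageStore, image_id: str, data: bytes) -> str:
    """Application-level code: knows nothing about the concrete backend."""
    key = f"images/{image_id}"
    store.put(key, data)
    return key
```

If callers only ever see ImageStore, moving from one backend to another should mostly mean writing a new subclass and changing where it is constructed.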
Inference based on team's inputs:
Alternative solution (requesting feedback):
MongoDB would be a good choice of NoSQL database for our use case. GridFS is a specification for storing and retrieving files that exceed MongoDB's BSON document size limit of 16 MB. It can be viewed as an alternative to filesystem-based storage, considering the following aspects:
Closing, as we are storing metadata in the DB and images in cloud storage.
There are multiple ways in which an image can be stored for a web application.
a. Relational database (for us, we've decided to use Postgres as our relational DB)
b. NoSQL database (MongoDB)
c. Filesystem-based storage
a. Relational DB
Here the main issue is that images are unstructured binary data, so they do not map onto ordinary typed columns. They would need to be stored as a BLOB (in Postgres, a bytea column or a large object), and every upload and read would go through the database. This overhead could bottleneck the system if we upload images in bulk.
b. NoSQL
MongoDB specifically offers GridFS, which divides a file into smaller chunks (255 kB by default) and stores each chunk as a separate document, so parts of a file can be read without loading the whole file. This would likely be faster than the relational DB option, but it still carries some overhead.
c. Filesystem-based
This is the most efficient way of storing images: they are stored directly as files on a file system. The image metadata and the path to each image are stored in a separate database to provide faster lookup of the file on disk. Here we have no pre-processing overhead. The only issue is that no access control over the data is provided out of the box; that functionality needs to be built or integrated by some other means.
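Option (c) could be sketched roughly as follows. This is only an illustration under a few assumptions: sqlite3 stands in for Postgres so the example is self-contained, the table layout and the function names (init_metadata_db, save_image, load_image) are hypothetical, and naming files by their SHA-256 digest is just one possible convention for avoiding name collisions.

```python
import hashlib
import sqlite3
from pathlib import Path


def init_metadata_db(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) metadata table if it does not exist."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        " id INTEGER PRIMARY KEY,"
        " original_name TEXT NOT NULL,"
        " size_bytes INTEGER NOT NULL,"
        " path TEXT NOT NULL)"
    )


def save_image(conn: sqlite3.Connection, root: Path,
               original_name: str, data: bytes) -> int:
    """Write the bytes to disk, then record metadata and the path in the DB."""
    digest = hashlib.sha256(data).hexdigest()
    suffix = Path(original_name).suffix
    # Assumption: shard files into subdirectories by the first two hex digits.
    target = root / digest[:2] / f"{digest}{suffix}"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    cur = conn.execute(
        "INSERT INTO images (original_name, size_bytes, path) VALUES (?, ?, ?)",
        (original_name, len(data), str(target)),
    )
    conn.commit()
    return cur.lastrowid


def load_image(conn: sqlite3.Connection, image_id: int) -> bytes:
    """Look up the path in the DB, then read the file straight from disk."""
    row = conn.execute(
        "SELECT path FROM images WHERE id = ?", (image_id,)
    ).fetchone()
    if row is None:
        raise KeyError(image_id)
    return Path(row[0]).read_bytes()
```

Note that nothing here checks who is allowed to read a file, which is the access-control gap mentioned above; that would have to live in whatever layer calls load_image.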
For now, to keep things simple and fast, we will go ahead with the filesystem-based approach. Later on, if we come across any scalability concerns with this mechanism, we might pivot to a NoSQL-based solution.
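For reference, if we do end up pivoting to option (b), the chunking idea behind GridFS can be illustrated in plain Python. This is not the pymongo API and not how MongoDB implements it internally; split_into_chunks and read_range are hypothetical helpers, and the dictionaries only loosely mirror the n and data fields of a GridFS chunk document.

```python
CHUNK_SIZE = 255 * 1024  # GridFS default chunk size (255 kB = 261120 bytes)


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Split a file into ordered chunks; 'n' is the chunk's sequence number."""
    return [
        {"n": i, "data": data[offset:offset + chunk_size]}
        for i, offset in enumerate(range(0, len(data), chunk_size))
    ]


def read_range(chunks: list, start: int, length: int,
               chunk_size: int = CHUNK_SIZE) -> bytes:
    """Read a byte range while touching only the chunks that overlap it."""
    first = start // chunk_size
    last = (start + length - 1) // chunk_size
    joined = b"".join(c["data"] for c in chunks[first:last + 1])
    local = start - first * chunk_size  # offset of 'start' inside 'joined'
    return joined[local:local + length]
```

The point of read_range is that a partial read only needs the overlapping chunks, which is the property that makes chunked storage attractive for large files.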