visheshdembla / Panorama

Highly Available, Fault Tolerant Photo Sharing App

Investigation on Mechanism for Image storage #50

Closed visheshdembla closed 3 years ago

visheshdembla commented 3 years ago

There are multiple ways in which an image can be stored for a web application.

a. Relational database (we have decided to use Postgres as our relational DB)
b. NoSQL database (Mongo)
c. Filesystem based storage

a. Relational DB: The main issue here is that images are unstructured binary data, so they do not map onto table columns directly. They would need to be stored as binary objects (a BLOB-style value, which in Postgres means a bytea column or a large object), and every read and write would pass through the database. This overhead could bottleneck the system if we upload images in bulk.

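b. NoSQL: MongoDB specifically uses GridFS, which divides a file into smaller chunks (255 kB each by default) and stores each chunk as a separate document, so that parts of a file can be fetched without loading the whole file. We expect this to be faster than the relational DB approach, but it still carries some overhead.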
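To make the chunking idea concrete, here is a minimal pure-Python sketch of how a file is cut into fixed-size pieces. It does not talk to MongoDB; the function name `split_into_chunks` and the 1 MB stand-in image are assumptions for illustration, and the chunk size is the GridFS default of 255 kB.

```python
CHUNK_SIZE = 255 * 1024  # GridFS default chunk size in bytes (255 kB)


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into consecutive chunks; only the last chunk may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


# Hypothetical stand-in for an uploaded image of about 1 MB.
image = bytes(1_000_000)
chunks = split_into_chunks(image)

# Three full chunks plus one partial chunk; joining them restores the file.
print(len(chunks), len(chunks[-1]))
```

Each chunk would be stored as its own document, which is where the extra bookkeeping overhead compared with a plain file comes from.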

c. FileSystem based: This is the most efficient way of storing images: they are stored directly as files on a file system. The image metadata and the path to the image are stored in a separate database to provide faster access to the file on disk. There is no pre-processing overhead here. The only issue is that no access control over the data is provided out of the box; such functionality needs to be built or integrated by some other means.
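A minimal sketch of this split between files on disk and metadata in a database could look like the following. Everything here is an assumption for illustration: SQLite stands in for whichever metadata database is chosen, and the table layout, the content-hash file naming, and the helper names `open_metadata_db`, `save_image` and `load_image` are hypothetical.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def open_metadata_db() -> sqlite3.Connection:
    """Create an in-memory metadata table (stand-in for the real database)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE images ("
        "name TEXT PRIMARY KEY, path TEXT NOT NULL, size_bytes INTEGER NOT NULL)"
    )
    return db


def save_image(db: sqlite3.Connection, root: Path, name: str, data: bytes) -> Path:
    """Write the raw bytes to disk, then record the path and metadata."""
    digest = hashlib.sha256(data).hexdigest()
    path = root / f"{digest}{Path(name).suffix}"  # content hash avoids name clashes
    path.write_bytes(data)
    db.execute(
        "INSERT INTO images (name, path, size_bytes) VALUES (?, ?, ?)",
        (name, str(path), len(data)),
    )
    db.commit()
    return path


def load_image(db: sqlite3.Connection, name: str) -> bytes:
    """Look up the path in the metadata table and read the file from disk."""
    row = db.execute("SELECT path FROM images WHERE name = ?", (name,)).fetchone()
    if row is None:
        raise KeyError(name)
    return Path(row[0]).read_bytes()


with tempfile.TemporaryDirectory() as tmp:
    db = open_metadata_db()
    save_image(db, Path(tmp), "cat.jpg", b"\xff\xd8fake-jpeg-bytes")
    restored = load_image(db, "cat.jpg")
```

Note that nothing in this sketch checks who is allowed to read a file, which is the access control gap mentioned above.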

For now, to keep things simple, we will go ahead with the FileSystem based approach, as it is the fastest of the three. Later on, if we come across any scalability concerns with this mechanism, we might pivot to a NoSQL based solution.

surajp28 commented 3 years ago

Thanks, @visheshdembla, for the detailed description. In your view, how easy would it be to pivot to NoSQL if we need to do so for scalability reasons?

visheshdembla commented 3 years ago

It should not be a big change if we abstract the underlying logic well. I had a talk about this with our AI. We can take this forward based on feedback from there or from the instructors.
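One way to abstract the underlying logic is to hide storage behind a small interface, so the rest of the app never knows which backend is in use. The sketch below is only an illustration under assumptions: the names `ImageStore`, `FileSystemImageStore`, `InMemoryImageStore` and `upload_photo` are hypothetical, keys are assumed to be flat file names, and the in-memory class merely stands in for a future NoSQL-backed implementation.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class ImageStore(ABC):
    """Storage interface the application code depends on."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class FileSystemImageStore(ImageStore):
    """Stores each image as a file under a root directory."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, key: str, data: bytes) -> None:
        (self.root / key).write_bytes(data)  # assumes key is a flat file name

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


class InMemoryImageStore(ImageStore):
    """Placeholder for another backend (e.g. a NoSQL-based store)."""

    def __init__(self) -> None:
        self._items: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._items[key] = data

    def get(self, key: str) -> bytes:
        return self._items[key]


def upload_photo(store: ImageStore, key: str, data: bytes) -> int:
    """Application logic: works with any ImageStore implementation."""
    store.put(key, data)
    return len(store.get(key))


with tempfile.TemporaryDirectory() as tmp:
    size_on_disk = upload_photo(FileSystemImageStore(Path(tmp) / "images"), "a.jpg", b"abc")
size_in_memory = upload_photo(InMemoryImageStore(), "a.jpg", b"abc")
```

With this shape, pivoting later would mean writing one new class and changing where the store is constructed, while `upload_photo` and its callers stay the same.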

ayishacs commented 3 years ago

Inference based on team's inputs:

  1. Images will be stored on the file system rather than in the DB.
  2. Metadata about images, such as file paths and names, will be stored in a NoSQL database for its speed.
  3. Extending the discussion beyond DB and storage, we could serve our images from a Content Delivery Network, which would deliver them faster and with higher throughput, if that is relevant to our case.

Alternate solution (requesting feedback):

MongoDB would be a good choice of NoSQL database for our use case. GridFS is a specification for storing and retrieving files that exceed the document size limit of 16 MB in MongoDB. This can be viewed as an alternate option for file system based storage considering the following aspects:

  1. If the file system limits the number of files in a directory, we can use GridFS to store as many files as needed.
  2. When we want to access information from portions of large files without having to load bulk files into memory, we can use GridFS to recall sections of files without reading the entire file into memory.
  3. If we want to keep our files and metadata automatically synced and deployed across a number of systems and facilities, we can use GridFS. When using geographically distributed replica sets, MongoDB can distribute files and their metadata automatically to a number of mongod instances and facilities.
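
Point 2 above can be illustrated with a small pure-Python sketch of reading a byte range from a chunked file while touching only the chunks that overlap the range. It does not use MongoDB; the function name `read_range` and the tiny 4-byte chunk size in the demo are assumptions chosen to keep the example readable (GridFS defaults to 255 kB chunks).

```python
CHUNK_SIZE = 255 * 1024  # GridFS default chunk size in bytes


def read_range(chunks: list[bytes], start: int, length: int,
               chunk_size: int = CHUNK_SIZE) -> bytes:
    """Return `length` bytes starting at offset `start`, using only the needed chunks."""
    if length <= 0:
        return b""
    first = start // chunk_size                 # index of the chunk holding `start`
    last = (start + length - 1) // chunk_size   # index of the chunk holding the last byte
    joined = b"".join(chunks[first:last + 1])   # only these chunks are loaded
    offset = start - first * chunk_size
    return joined[offset:offset + length]


# Demo with a 12-byte "file" stored as three 4-byte chunks.
demo_chunks = [b"abcd", b"efgh", b"ijkl"]
part = read_range(demo_chunks, start=3, length=4, chunk_size=4)
print(part)  # b'defg' (read from the first two chunks only)
```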

visheshdembla commented 3 years ago

Closing, as we are storing metadata in the DB and images in cloud storage.