interlynk-io / sbomqs

SBOM quality score - Quality metrics for your sboms
Apache License 2.0
158 stars 20 forks source link

Add a optimized database #271

Closed viveksahu26 closed 1 month ago

viveksahu26 commented 2 months ago

Closes issue: #265 This pull request introduces an optimized version of the db package to improve the performance of our database operations. As well as, benchmark tests been added to measure the performance and efficiency of the new implementation.

Changes modifies are:

Storing records in map is highly optimized version at the retrieval time, which result into O(1) time complexity. Whereas, original db uses loops to retrieve records and that results into O(n) time complexity.

Use of Mutex concept:

Apart from that added a benchmark test, to measure the performance of key database operations::

~Implementation of goroutines for parallel tasking:~ ~- Goroutines are functions or methods that run concurrently with other functions or methods. They are created using the go keyword. Whereas channels are used to send and receive values, allowing goroutines to synchronize their operations.~ ~sync.RWMutex will make ensure that concurrent access to the database is handled correctly.~

riteshnoronha commented 2 months ago

@viveksahu26 same here is this ready for review

viveksahu26 commented 1 month ago

@viveksahu26 same here is this ready for review

Yeah, this one too.

riteshnoronha commented 1 month ago

@viveksahu26 this appears to be too complex. Lets discuss this

viveksahu26 commented 1 month ago

Yeah sure @riteshnoronha !! So, basically originally we were fetching records from database in 3 ways:

For example, to get all records for particular Key, we had to loop over all records and check each record whether it contain that Key or not, If contains then append it to final list of records containing that Key and return that list as the loop ends. Similarly we perform operation to get records for Key as well similalrly for Key and ID. So, if we conclude it, we can see that at the end we return all the values(i.e. records) for a particular Key.

In the new changes, we have simplified it to Map data structure. As it has the functionality to store all values for a particular key. So, earlier when we were adding any record we were simply appending to the list of records.

func (d *db) addRecord(r *record) {
    d.records = append(d.records, r)
}

But, when we are adding any record, we are adding it in 3 ways:

// addRecord adds a single record to the database
func (d *db) addRecord(r *record) {
    d.mu.Lock()
    defer d.mu.Unlock()

    // store record using a key
    d.keyRecords[r.check_key] = append(d.keyRecords[r.check_key], r)

    // store record using a id
    d.idRecords[r.id] = append(d.idRecords[r.id], r)
    if d.idRecords[r.id] == nil {
        d.keyIdRecords[r.id] = make(map[int][]*record)
    }

    // store record using a key and id
    d.keyIdRecords[r.id][r.check_key] = append(d.keyIdRecords[r.id][r.check_key], r)

    d.allIds[r.id] = struct{}{}
}

We are adding in this way, so that at the time of fetching these records from any any type of key, whether it be Key, or ID or Key and ID, we don't have to loop over records and check each records whether it contains that key or not.

Now coming to the concept of Mutex that we are using every time as below: Mutex: ensures that only one thread or one process or one goroutine can access the critical section of code or shared resource at any given time.

NOTE: right now it struck me at the time of writing, we can remove mutex because we are not using goroutines.

Here we are using ReadWrite Mutex type.

mu           sync.RWMutex

Each time when write operation is being done to db, then we want to make sure that no other process or threads or goroutine is trying to write. That's why we are locking the operation. And once written it is unlocked. And then next threads or goroutine waiting in a queue will perform the writing operation.

d.mu.Lock()
defer d.mu.Unlock()

and

Below Mutex is used in case when multiple process or threads or goroutine is trying to read the database, in that

d.mu.RLock()
defer d.mu.RUnlock()

Let me know if you have any question ?

riteshnoronha commented 1 month ago

We dont use go-routines currently. Lets keep it even simpler for this release.

viveksahu26 commented 1 month ago

Removed Mutex stuffs or any related to goroutines.