tern-tools / tern

Tern is a software composition analysis tool and Python library that generates a Software Bill of Materials for container images and Dockerfiles. The SBOM that Tern generates will give you a layer-by-layer view of what's inside your container in a variety of formats including human-readable, JSON, HTML, SPDX and more.
BSD 2-Clause "Simplified" License

Proposal: Create a database backend with an associated API #50

Open jeevers opened 6 years ago

jeevers commented 6 years ago

It could be useful to have a database backend so that data can be more easily organized and queried. I think SQLite would be a good fit (at least at first) due to its ease of setup and management via the sqlite3 module in the standard library. Eventually we can add support for other databases.
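As a rough illustration of how little setup the standard-library `sqlite3` module needs, here is a minimal sketch. The two-table schema and the helper names (`open_db`, `add_layer`, `layers_with_package`) are hypothetical and are not taken from Tern's actual data model.

```python
import sqlite3

# Hypothetical schema for illustration only; Tern's real data model may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS layers (
    id INTEGER PRIMARY KEY,
    diff_id TEXT UNIQUE NOT NULL
);
CREATE TABLE IF NOT EXISTS packages (
    id INTEGER PRIMARY KEY,
    layer_id INTEGER NOT NULL REFERENCES layers(id),
    name TEXT NOT NULL,
    version TEXT
);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_layer(conn, diff_id, packages):
    """Store one layer and its (name, version) package pairs."""
    cur = conn.execute("INSERT INTO layers (diff_id) VALUES (?)", (diff_id,))
    layer_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO packages (layer_id, name, version) VALUES (?, ?, ?)",
        [(layer_id, name, version) for name, version in packages],
    )
    conn.commit()
    return layer_id


def layers_with_package(conn, name):
    """Return the diff_ids of all layers that contain the named package."""
    rows = conn.execute(
        "SELECT l.diff_id FROM layers l "
        "JOIN packages p ON p.layer_id = l.id "
        "WHERE p.name = ? ORDER BY l.diff_id",
        (name,),
    )
    return [row[0] for row in rows]
```

Passing a file path instead of `":memory:"` would persist the data between runs; the query helper shows the kind of lookup that is awkward against a flat file.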

nishakm commented 4 years ago

@PrajwalM2212 recommended sqlite as well:

I think we can just choose sqlite3 because:
  1. It is faster.
  2. It is good for applications where the code that executes SQL statements and the application reside on the same machine.
  3. It also supports huge amounts of data, up to 140TB, with greater performance.
  4. It is provided as part of the Python standard library.

https://www.sqlite.org/whentouse.html

zoek1 commented 4 years ago

The main requirement is that the storage be self-contained, right? Is that why Redis is not an option? @nishakm

PrajwalM2212 commented 4 years ago

@zoek1 That was one of the reasons why I suggested sqlite. Since we are only using the cache for analysis purposes (our internal use), sqlite gives the best value.

nishakm commented 4 years ago

At this time, my main concern is to move away from storing data in a YAML file and into something that is queryable. The discussion I would really like to have is whether we should be using a key-value store (like Redis) or a relational database (like sqlite). One thing about choosing a relational database is that you will need to put time into designing the database. Once done, it is difficult to undo. Key-value stores are easier to change, but suffer from the same problem as the flat YAML file: as more data gets added, it becomes less queryable. I am personally leaning towards implementing this in sqlite because we already have a data model, and making an API for queries means the database can be switched with something else.
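The point about a query API making the database swappable can be sketched as follows. This is only an illustration under assumptions: the `CacheBackend` interface, its two methods, and both implementations are hypothetical names, not part of Tern.

```python
import json
import sqlite3
from abc import ABC, abstractmethod


class CacheBackend(ABC):
    """Hypothetical query API; callers depend on this, not on a database."""

    @abstractmethod
    def put_layer(self, diff_id, packages):
        """Store the package names found in a layer."""

    @abstractmethod
    def get_layer(self, diff_id):
        """Return the stored package names, or None if the layer is unknown."""


class DictBackend(CacheBackend):
    """In-memory key-value style backend."""

    def __init__(self):
        self._layers = {}

    def put_layer(self, diff_id, packages):
        self._layers[diff_id] = list(packages)

    def get_layer(self, diff_id):
        return self._layers.get(diff_id)


class SqliteBackend(CacheBackend):
    """SQLite backend behind the same interface."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS layers "
            "(diff_id TEXT PRIMARY KEY, packages TEXT NOT NULL)"
        )

    def put_layer(self, diff_id, packages):
        self._conn.execute(
            "INSERT OR REPLACE INTO layers VALUES (?, ?)",
            (diff_id, json.dumps(list(packages))),
        )
        self._conn.commit()

    def get_layer(self, diff_id):
        row = self._conn.execute(
            "SELECT packages FROM layers WHERE diff_id = ?", (diff_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None
```

Code written against `CacheBackend` would not need to change if the SQLite implementation were later replaced by another store.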

nishakm commented 4 years ago

My research shows that using a json file as a backend greatly improves performance:

yaml backend: 76 seconds
json backend: 0.47 seconds

We would still like a database backend so folks can set up a centralized repository which is queryable, but for now, switching the caching format from YAML to JSON is an easy improvement.
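A comparison like the one reported above could be reproduced with a small timing harness such as the sketch below. The helper names and the fake cache layout are hypothetical, and only the JSON side is exercised here because PyYAML is a third-party package; if it is installed, `yaml.safe_load` could be passed as the loader in the same way.

```python
import json
import time


def make_fake_cache(n_layers=200, n_packages=50):
    """Build a made-up cache structure; not Tern's real cache layout."""
    return {
        f"layer{i}": [
            {"name": f"pkg{j}", "version": "1.0"} for j in range(n_packages)
        ]
        for i in range(n_layers)
    }


def time_load(loader, text, repeat=3):
    """Return the best wall-clock time in seconds over `repeat` loads."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        loader(text)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    text = json.dumps(make_fake_cache())
    print(f"json load: {time_load(json.loads, text):.4f} seconds")
```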

nishakm commented 3 years ago

  1. Design CRUD API for different items in the database #792
  2. Implement the database #863
  3. Implement the sync mechanism #862

ashok-arora commented 2 years ago

What's the status of this proposal and can I work on it?

urmilkalaria commented 2 years ago

I don't know if it is possible, but since we are aiming to store the container image in a database, can't we convert the Docker image to JSON format and then store the JSON data in a Redis database? JSON greatly improves performance, and accessing the database through Redis is also faster.