scanoss / purl2cpe

PURL to CPE Relationship mapping project.
MIT License

Official purl2cpe release on pypi? #10

Closed: ffontaine closed this issue 1 year ago

ffontaine commented 1 year ago

Hello,

I found your project by googling and it looks great. I would like to use it in other open source projects such as cve-bin-tool. However, there seems to be no official release of your project on pypi. Do you plan to make one some day? If not, what is the best way to integrate your project? Should I add it as a git submodule, or perhaps just build and update purl2cpe.db regularly?

Best Regards and thanks for your work

scanossmining commented 1 year ago

Hi @ffontaine, thank you for the feedback! We want to help, so can you please share a bit more about your use case and how you will use the data, so that we can fully understand the best way to handle this? Would it help if we also added a purl2cpe.db sqlite database here, built from the data folder? Or do you need the sqlite_loader.py script, or something similar that just queries a given purl2cpe DB, to be published on pypi?

ffontaine commented 1 year ago

Basically, having purl2cpe.db in the official pypi release is the only thing needed. My idea is to open purl2cpe.db and retrieve the purl entries for a given CPE ID. You can find a first iteration of what I plan to do here: https://github.com/anthonyharrison/lib4sbom/pull/16. This is still a work in progress and @anthonyharrison will perhaps have additional comments.
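The lookup described above (purl entries for a given CPE ID) could be sketched roughly as follows. This is only an illustration: the table name `purl2cpe` and the columns `purl` and `cpe` are assumptions, not confirmed in this thread, and the helper `purls_for_cpe` and the sample rows are hypothetical. The demo uses an in-memory database so it runs without the real file.

```python
import sqlite3


def purls_for_cpe(conn, cpe):
    """Return the sorted, de-duplicated purls mapped to one CPE string.

    Assumes a table `purl2cpe(purl, cpe)`; check the real schema of
    purl2cpe.db before relying on these names.
    """
    rows = conn.execute(
        "SELECT DISTINCT purl FROM purl2cpe WHERE cpe = ? ORDER BY purl",
        (cpe,),
    ).fetchall()
    return [row[0] for row in rows]


# Demo data in an in-memory database (made-up rows, assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purl2cpe (purl TEXT, cpe TEXT)")
conn.executemany(
    "INSERT INTO purl2cpe VALUES (?, ?)",
    [
        ("pkg:pypi/widget", "cpe:2.3:a:example:widget:*:*:*:*:*:*:*:*"),
        ("pkg:github/example/widget", "cpe:2.3:a:example:widget:*:*:*:*:*:*:*:*"),
        ("pkg:pypi/other", "cpe:2.3:a:example:other:*:*:*:*:*:*:*:*"),
    ],
)

print(purls_for_cpe(conn, "cpe:2.3:a:example:widget:*:*:*:*:*:*:*:*"))
```

Against a real copy of the database, the same function would be called with `sqlite3.connect("purl2cpe.db")` instead of the in-memory connection, once the table and column names have been checked.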

scanossmining commented 1 year ago

Thanks for the details. So you basically want an easy way to query the data, right? It wouldn't be difficult for us to put the repo on pypi, but its main purpose is to provide the dataset, not to act as an actual Python package. Also, the dataset is automatically updated on a daily basis, so adding it to pypi would lead to daily releases of the project, and we would like to avoid that. Suppose we also automatically add and update the purl2cpe.db file in this repo, and you download it whenever you need it using the direct download link from the main branch and build your own queries. Would that work for you?

anthonyharrison commented 1 year ago

Having a pre-populated database on github would work so this can be downloaded and integrated with another application. Whilst I understand that the database may not be up to date, having a local copy of the data supports some critical use cases which I am trying to meet.

Whilst updates are being performed on a daily basis, do you provide a log of the changes which are being implemented?

scanossmining commented 1 year ago

The only change logs we have right now are in the commit history. If you clone the repo locally, you can check the git log and parse the output if you want to look for the dates of changes to a specific purl/cpe. We will start adding the prebuilt purl2cpe.db file, but as a .zip archive to save space, so you will have to download and decompress it before using it. Is this OK for your use case? Please also keep in mind that you can always clone and pull the repo and then build the database yourself using the sqlite_loader.py script.
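Parsing the git log of a local clone for the change dates of one file could look something like this sketch. The helper names `parse_git_log` and `history_for_path` are hypothetical, and the `|`-separated format string is just one convenient choice, not something the project prescribes.

```python
import subprocess

# One commit per line: full hash | author date (YYYY-MM-DD) | subject.
LOG_FORMAT = "--pretty=format:%H|%ad|%s"


def parse_git_log(text):
    """Turn `git log` output in LOG_FORMAT into a list of dicts."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split at most twice so a '|' inside the subject is preserved.
        sha, date, subject = line.split("|", 2)
        entries.append({"sha": sha, "date": date, "subject": subject})
    return entries


def history_for_path(repo_dir, path):
    """Return the commits that touched `path` inside a local clone."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "log", "--date=short", LOG_FORMAT, "--", path],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_git_log(result.stdout)
```

For example, `history_for_path("purl2cpe", "data")` would list the commits touching the data folder, assuming the repository was cloned into a directory named `purl2cpe`.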

scanossmining commented 1 year ago

@ffontaine @anthonyharrison the purl2cpe.db.zip file is now available in the root dir of this repository. You can start using its direct download link to get it. The file will be automatically updated with the same changes that go into the data dir.
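A minimal sketch of fetching and decompressing the archive with the Python standard library follows. The URL below is an assumption built from the usual GitHub raw-file pattern for a file in the root of the main branch, and the archive member name `purl2cpe.db` is also assumed. The helpers `extract_db` and `download_db` are hypothetical names.

```python
import io
import urllib.request
import zipfile

# Assumed direct download link (standard GitHub raw pattern, not confirmed here).
DB_ZIP_URL = "https://github.com/scanoss/purl2cpe/raw/main/purl2cpe.db.zip"


def extract_db(zip_bytes, dest_dir, member="purl2cpe.db"):
    """Extract `member` from an in-memory zip archive into `dest_dir`.

    Returns the path of the extracted file. The member name is an
    assumption; inspect the archive with ZipFile.namelist() if it differs.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return archive.extract(member, path=dest_dir)


def download_db(dest_dir, url=DB_ZIP_URL):
    """Download the zipped database and unpack it into `dest_dir`."""
    with urllib.request.urlopen(url) as response:
        return extract_db(response.read(), dest_dir)
```

Calling `download_db(".")` would place the database in the current directory. Since the file is refreshed daily, a consumer would probably call this on a schedule rather than on every run.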