jpeddicord / askalono

A tool & library to detect open source licenses from texts
Apache License 2.0

Have a separate repo with just the cache? #71

Open · hoijui opened this issue 2 years ago

hoijui commented 2 years ago

How I got here

(You may safely skip this. I include it because I imagine many people will arrive here the same way, and it is a source of confusion that could be improved.)

I was looking for something that does exactly what askalono does (so THANK YOU! already, here!). Trying to get started, I looked at the example and read the note that it takes 20s to run. By coincidence, at almost the same time as your library, I also found onefetch, which can do the same thing... and it turns out it uses your library. Looking at its code for inspiration, I see that it loads data from a file called cache.bin.zstd. Looking at the version history, I see that they simply commit this file as-is into their repo, with no information about where it comes from. Which brings me to the question:

Where is cache.bin.zstd, and how do I generate it?

I think it makes most sense to put only this file into a separate repo. That repo could then be used as a git submodule by anyone using your library, who could then include the whole cache in their binary, if they wish, or ship it alongside. This could be done with a scheduled CI job (e.g. running once a week) that fetches the SPDX repo and, if there has been a new release (or releases), rebuilds the cache and commits it to the cache-only repo. I did something very similar, also using the SPDX licenses repo as the source, here: https://github.com/hoijui/SPDX-identifiers-generator
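The weekly job described above boils down to one decision: has SPDX published a release newer than the one the current cache was built from? Below is a minimal Rust sketch of only that check. It assumes the SPDX license-list-data repo tags releases like `v3.22` and that the cache-only repo records the tag it was last built from; `parse_tag` and `needs_rebuild` are hypothetical names, and fetching the tags and rebuilding the cache are left out.

```rust
/// Parses a release tag such as "v3.22" into (major, minor).
/// Hypothetical helper: assumes tags of the form "v<major>.<minor>".
fn parse_tag(tag: &str) -> Option<(u32, u32)> {
    let rest = tag.strip_prefix('v')?;
    let (major, minor) = rest.split_once('.')?;
    Some((major.parse().ok()?, minor.parse().ok()?))
}

/// Decides whether the cache-only repo needs a rebuild.
/// `recorded` is the tag the current cache was built from (None if there is no cache yet);
/// `latest` is the newest SPDX release tag seen by the scheduled job.
fn needs_rebuild(recorded: Option<&str>, latest: &str) -> bool {
    match (recorded.and_then(parse_tag), parse_tag(latest)) {
        // No usable record of a previous build: rebuild.
        (None, _) => true,
        // The latest tag is not in the expected format: keep the existing cache.
        (Some(_), None) => false,
        // Rebuild only when a newer release has appeared.
        (Some(old), Some(new)) => new > old,
    }
}

fn main() {
    let cases = [(Some("v3.21"), "v3.22"), (Some("v3.22"), "v3.22"), (None, "v3.22")];
    for (recorded, latest) in cases {
        println!("{:?} -> {}: rebuild = {}", recorded, latest, needs_rebuild(recorded, latest));
    }
}
```

Comparing parsed (major, minor) tuples instead of raw strings keeps `v3.10` newer than `v3.9`, which a plain string comparison would get wrong.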

amznpurple commented 1 year ago

I don't think it's being used. See here: https://github.com/jpeddicord/askalono/issues/89

hoijui commented 2 months ago

I came back to exactly this request again, and it is still valid! In the meantime, I resorted to downloading the onefetch licenses cache (askalono format) from https://github.com/o2sh/onefetch/blob/main/resources/license.cache.zstd in my build.rs.

While that works, it requires internet access during the build process, which is not allowed in some scenarios (for example, when docs.rs builds the API docs for a crate), and it also wastes bandwidth by potentially downloading the file many times for different builds (debug, release, ...). Having it in a repo usable as a git submodule would still be the optimal solution, and most people would be able to find it if it were supplied by askalono.
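With the cache vendored as a git submodule, the build script no longer needs network access. Below is a minimal sketch of such a build.rs. It assumes a hypothetical submodule path `vendor/askalono-cache/license.cache.zstd` and a hypothetical `LICENSE_CACHE_PATH` override variable, and it only locates and copies the file; it does not create or validate the cache.

```rust
// build.rs sketch: use a vendored askalono cache (e.g. from a git submodule)
// instead of downloading it, so builds also work offline.
use std::env;
use std::fs;
use std::path::{Path, PathBuf};

/// Returns the first candidate path that exists as a file.
/// Order: an explicit override (hypothetical LICENSE_CACHE_PATH variable),
/// then a hypothetical submodule checkout inside the crate directory.
fn find_cache(manifest_dir: &Path, override_path: Option<PathBuf>) -> Option<PathBuf> {
    override_path
        .into_iter()
        .chain(std::iter::once(
            manifest_dir.join("vendor/askalono-cache/license.cache.zstd"),
        ))
        .find(|p| p.is_file())
}

fn main() {
    // Cargo sets these for build scripts; do nothing when run by hand.
    let (Some(manifest_dir), Some(out_dir)) =
        (env::var_os("CARGO_MANIFEST_DIR"), env::var_os("OUT_DIR"))
    else {
        eprintln!("not running under cargo; nothing to do");
        return;
    };
    let override_path = env::var_os("LICENSE_CACHE_PATH").map(PathBuf::from);
    let cache = find_cache(Path::new(&manifest_dir), override_path).expect(
        "license cache not found: run `git submodule update --init` or set LICENSE_CACHE_PATH",
    );
    // Copy into OUT_DIR so the crate can embed it at a fixed location.
    fs::copy(&cache, Path::new(&out_dir).join("license.cache.zstd"))
        .expect("failed to copy license cache");
    // Re-run the build script only when the cache file or the override changes.
    println!("cargo:rerun-if-changed={}", cache.display());
    println!("cargo:rerun-if-env-changed=LICENSE_CACHE_PATH");
}
```

The crate could then embed the file with `include_bytes!(concat!(env!("OUT_DIR"), "/license.cache.zstd"))` and pass the bytes to askalono's cache-loading API. Since the file is read from the checkout, repeated debug and release builds reuse the same local copy instead of fetching it again.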