jpeddicord closed this issue 6 years ago.
Would you be open to (possibly mentoring) an external contribution for this task?
@Phrohdoh Sure! Give me a little bit to flesh out this issue description (a lot of these issues I initially filed as "notes to self" and their descriptions kinda suck) -- I have a scrappy collection of license files I was originally testing with locally and have a rough idea of a way this could proceed.
@Phrohdoh I clarified the description a bit. If you're still interested in this, I'd recommend playing around with `Store` as a starting point: have it load a cache file (or load from the SPDX directory; see `cli/build.rs` for an example of that) and get it to identify a license. `examples/basic.rs` has some of that as well. The documentation sucks (sorry), but the types involved should be relatively easy to figure out.
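To get a feel for what "identify a license" involves, here is a toy, std-only stand-in. It is not askalono's `Store` API (`examples/basic.rs` shows real usage). The `ToyStore` type, its methods, and the bigram/Dice-coefficient scoring are assumptions chosen for illustration. Each known license is reduced to a set of lowercase word bigrams, and an unknown text is matched to whichever license shares the largest fraction of bigrams with it.

```rust
use std::collections::HashSet;

/// A set of adjacent, normalized word pairs.
type Bigrams = HashSet<(String, String)>;

/// Hypothetical stand-in for a license store; NOT askalono's `Store` API.
struct ToyStore {
    licenses: Vec<(String, Bigrams)>,
}

/// Lowercase each word, strip punctuation, and collect adjacent word pairs.
fn bigrams(text: &str) -> Bigrams {
    let words: Vec<String> = text
        .split_whitespace()
        .map(|w| {
            w.chars()
                .filter(|c| c.is_alphanumeric())
                .collect::<String>()
                .to_lowercase()
        })
        .filter(|w| !w.is_empty())
        .collect();
    words
        .windows(2)
        .map(|pair| (pair[0].clone(), pair[1].clone()))
        .collect()
}

/// Dice coefficient: 2|A ∩ B| / (|A| + |B|), in the range 0.0..=1.0.
fn dice(a: &Bigrams, b: &Bigrams) -> f32 {
    if a.is_empty() && b.is_empty() {
        return 0.0;
    }
    let shared = a.intersection(b).count() as f32;
    2.0 * shared / (a.len() + b.len()) as f32
}

impl ToyStore {
    fn new() -> Self {
        ToyStore { licenses: Vec::new() }
    }

    /// Registers a known license text under `name`.
    fn add_license(&mut self, name: &str, text: &str) {
        self.licenses.push((name.to_string(), bigrams(text)));
    }

    /// Returns the best-scoring license and its score, or None if the store is empty.
    fn identify(&self, text: &str) -> Option<(&str, f32)> {
        let query = bigrams(text);
        self.licenses
            .iter()
            .map(|(name, grams)| (name.as_str(), dice(&query, grams)))
            .max_by(|a, b| a.1.partial_cmp(&b.1).unwrap())
    }
}

fn main() {
    let mut store = ToyStore::new();
    store.add_license(
        "MIT",
        "Permission is hereby granted, free of charge, to any person obtaining a copy",
    );
    store.add_license(
        "Apache-2.0",
        "Licensed under the Apache License, Version 2.0 (the License)",
    );
    let (name, score) = store
        .identify("permission is hereby granted free of charge to any person")
        .unwrap();
    println!("{name} {score:.2}");
}
```

Running it prints `MIT 0.86`. A real identifier also needs a threshold below which it reports no match, which is exactly the behavior the regression tests discussed in this issue are meant to pin down.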
I opened up a Gitter room at https://gitter.im/amzn/askalono if you want to ping me for help; you may need to @ me as I'm not entirely sure I've configured notifications correctly 🙃
Great, thanks!
I will take a stab at this over the upcoming weekend.
@Phrohdoh no rush, but let me know if you'd like any extra help here. If you want, feel free to PR the code you showed on Gitter and we can iterate from there.
I've set up the remainder of this infrastructure:
https://github.com/amzn/askalono/blob/master/tests/real_world.rs
This builds on @Phrohdoh's initial work and crawls a directory, parsing out expected license names and thresholds to test licenses in a flexible manner. While I definitely want to add more licenses to this test dataset, I think we can consider this issue resolved. :)
I'd like to set up some tests that verify that askalono doesn't regress on its ability to identify licenses. Basically, a directory of license files named/sorted according to their actual license. askalono runs on them, identifies them, and verifies it got the answer right.
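A minimal sketch of that check, assuming a hypothetical layout with one directory per expected license identifier (for example `MIT/some-project-LICENSE.txt`). The `check_corpus` function and the `Failure` struct are illustrative names. The `identify` closure stands in for the actual askalono call, so the harness itself only depends on the standard library.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// One mismatch between the expected license (taken from the directory
/// name) and what the identifier returned.
#[derive(Debug, PartialEq)]
struct Failure {
    file: String,
    expected: String,
    actual: Option<String>,
}

/// Walks `root`, assuming a hypothetical `root/<license-id>/<file>` layout,
/// runs `identify` on every file, and collects the files it got wrong.
fn check_corpus<F>(root: &Path, identify: F) -> io::Result<Vec<Failure>>
where
    F: Fn(&str) -> Option<String>,
{
    let mut failures = Vec::new();
    for license_dir in fs::read_dir(root)? {
        let license_dir = license_dir?.path();
        if !license_dir.is_dir() {
            continue; // ignore stray files at the top level
        }
        // The directory name is the expected license identifier.
        let expected = license_dir
            .file_name()
            .and_then(|n| n.to_str())
            .unwrap_or_default()
            .to_string();
        for entry in fs::read_dir(&license_dir)? {
            let path = entry?.path();
            if !path.is_file() {
                continue;
            }
            let text = fs::read_to_string(&path)?;
            let actual = identify(&text);
            if actual.as_deref() != Some(expected.as_str()) {
                failures.push(Failure {
                    file: path.display().to_string(),
                    expected: expected.clone(),
                    actual,
                });
            }
        }
    }
    Ok(failures)
}

fn main() -> io::Result<()> {
    // Build a tiny throwaway corpus in the temp directory.
    let root = std::env::temp_dir().join(format!("license-corpus-demo-{}", std::process::id()));
    fs::create_dir_all(root.join("MIT"))?;
    fs::create_dir_all(root.join("ISC"))?;
    fs::write(root.join("MIT").join("mit.txt"), "Permission is hereby granted, free of charge")?;
    fs::write(root.join("ISC").join("isc.txt"), "Permission to use, copy, modify, and/or distribute")?;

    // Toy identifier standing in for askalono: it only recognizes MIT.
    let failures = check_corpus(&root, |text| {
        text.contains("hereby granted").then(|| "MIT".to_string())
    })?;
    for f in &failures {
        println!("{}: expected {}, got {:?}", f.file, f.expected, f.actual);
    }
    fs::remove_dir_all(&root)?;
    Ok(())
}
```

A real test would assert that the returned list is empty, so any regression fails with the offending file names attached.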
Possible implementation
Another layout for this is perfectly acceptable; this could also be set up with some metadata files describing what a file should be identified as, with what confidence, etc. That may be overkill for the time being.
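If the metadata route is ever taken, each license file could get a small sidecar file. The sketch below assumes a made-up `key = value` format with an `expected` license identifier and an optional `min_score` threshold. Neither the format, the 0.8 default, nor the `parse_expectation`/`meets` helpers come from askalono.

```rust
/// Expectations for one test file, read from a hypothetical sidecar
/// metadata file (the `key = value` format is an assumption).
#[derive(Debug, PartialEq)]
struct Expectation {
    license: String,
    min_score: f32,
}

/// Parses `expected = <id>` and optional `min_score = <f32>` lines;
/// blank lines and `#` comments are skipped.
fn parse_expectation(meta: &str) -> Result<Expectation, String> {
    let mut license = None;
    let mut min_score = 0.8; // hypothetical default when no threshold is given
    for line in meta.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        let (key, value) = line
            .split_once('=')
            .ok_or_else(|| format!("malformed line: {line}"))?;
        match key.trim() {
            "expected" => license = Some(value.trim().to_string()),
            "min_score" => {
                min_score = value
                    .trim()
                    .parse::<f32>()
                    .map_err(|e| format!("bad min_score: {e}"))?
            }
            other => return Err(format!("unknown key: {other}")),
        }
    }
    Ok(Expectation {
        license: license.ok_or("missing `expected` key")?,
        min_score,
    })
}

/// A match passes if the name agrees and the score clears the threshold.
fn meets(exp: &Expectation, name: &str, score: f32) -> bool {
    name == exp.license && score >= exp.min_score
}

fn main() {
    let exp = parse_expectation("# MIT with a modified header\nexpected = MIT\nmin_score = 0.9\n")
        .unwrap();
    println!("{:?}", exp);
    println!("{}", meets(&exp, "MIT", 0.93)); // true
    println!("{}", meets(&exp, "MIT", 0.85)); // false: below the threshold
}
```

Keeping the threshold per file lets an unusual variant (say, a license with a long project-specific preamble) pass at a lower score without loosening the check for every other file.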