Jake-Shadle closed 4 years ago
Yikes -- that's no good at all. Good digging. I'm going to:
Going to see how far I can get on all of this today. :)
Sounds good, thanks for the quick response, but no need to rush! :)
Oh gosh. I just realized that remove_common_tokens isn't even in 0.3.0, which explains why my cherry-pick didn't resolve cleanly. But it probably snuck into the askalono.linux-static build I manually added a few months later.
@Jake-Shadle were you using the static build off of GitHub releases, or was it via cargo install? I think that might have been the issue with 0.3. This is fixed in master for the next release, however.
I was using cargo install.
cargo install from crates.io (cargo install askalono-cli) or via this repository? I'm trying to track down where you experienced this bug so I can make sure it's eradicated everywhere.
Oh sorry, cargo install from crates.io. That's how I narrowed down the cause: I looked at the commits that had landed after the 0.3.0 version bump. Initially I had assumed some of the changes on my fork were the cause, but an unmodified HEAD exhibited the same behavior.
Ok, I think what might have happened is you did cargo install from crates.io originally, but at some point might have run it directly against this repository (which still has the 0.3.0 version number in source; I haven't yet bumped that). I just did a fresh cargo install --force askalono-cli to get the latest version published to crates.io and it's giving me expected results:
❯❯❯ cargo install --force askalono-cli
Updating crates.io index
Installing askalono-cli v0.3.0
[...]
Compiling env_logger v0.5.13
Compiling askalono v0.3.0
Compiling ignore v0.4.7
Compiling askalono-cli v0.3.0
Finished release [optimized] target(s) in 4m 26s
Replacing /Users/peddicor/.cargo/bin/askalono
❯❯❯ which askalono
/Users/peddicor/.cargo/bin/askalono
❯❯❯ askalono id ~/Desktop/testlicense
License: MIT (original text)
Score: 0.994
So there is definitely a bug in master (that's currently being worked around by disabling that text preprocessor) but 0.3.0 should be fine, in both library and executable form as published to crates.io.
If I've missed something, please let me know and try to get a reproducible test case and I'll dig into this more for that version.
I am using askalono as a library and was trying to figure out why I was getting extremely low confidence scores compared to the 0.3.0 CLI that I installed, which correctly identified one of the problematic license files, e.g. https://github.com/rust-random/rand/blob/master/rand_core/LICENSE-MIT
Here's an example run where I just print the result of each preprocess step. remove_common_tokens runs first and truncates basically all of the relevant license text, which leaves the analysis unable to do much of anything. This license is a bit odd, with two copyright headers at the beginning; indeed, removing one of them no longer triggers the truncation.
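For anyone wanting to repeat this kind of step-by-step check, here is a rough sketch of the debugging approach described above: run each preprocessing step in order, print how much text survives, and flag the first step that shrinks the input drastically. Everything in it is an assumption for illustration. The step functions (collapse_whitespace, lowercase, overeager_strip) are hypothetical stand-ins and are not askalono's real preprocessors, and overeager_strip only imitates the symptom (a second copyright header triggering truncation), not the actual logic of remove_common_tokens.

```rust
/// A named text-to-text preprocessing step (hypothetical; not askalono's API).
type Step = (&'static str, fn(&str) -> String);

/// Stand-in step: collapse all runs of whitespace into single spaces.
fn collapse_whitespace(text: &str) -> String {
    text.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Stand-in step: lowercase the text.
fn lowercase(text: &str) -> String {
    text.to_lowercase()
}

/// Deliberately buggy stand-in: if "copyright" occurs twice, drop everything
/// from the second occurrence onward. This only mimics the symptom from the
/// thread; it is NOT how remove_common_tokens is implemented.
fn overeager_strip(text: &str) -> String {
    let mut hits = text.match_indices("copyright");
    hits.next(); // skip the first occurrence
    match hits.next() {
        Some((idx, _)) => text[..idx].to_string(),
        None => text.to_string(),
    }
}

/// The illustrative pipeline, in the order the steps are applied.
fn demo_steps() -> Vec<Step> {
    vec![
        ("collapse_whitespace", collapse_whitespace as fn(&str) -> String),
        ("lowercase", lowercase as fn(&str) -> String),
        ("overeager_strip", overeager_strip as fn(&str) -> String),
    ]
}

/// Run every step in order, printing and recording (name, bytes before, bytes after).
fn trace_pipeline(input: &str, steps: &[Step]) -> Vec<(&'static str, usize, usize)> {
    let mut text = input.to_string();
    let mut trace = Vec::new();
    for &(name, step) in steps {
        let before = text.len();
        text = step(&text);
        println!("{}: {} -> {} bytes", name, before, text.len());
        trace.push((name, before, text.len()));
    }
    trace
}

/// Return the first step whose output is smaller than `min_ratio` of its input.
fn first_truncating_step(
    trace: &[(&'static str, usize, usize)],
    min_ratio: f64,
) -> Option<&'static str> {
    trace
        .iter()
        .find(|&&(_, before, after)| before > 0 && (after as f64) < (before as f64) * min_ratio)
        .map(|&(name, _, _)| name)
}
```

Calling trace_pipeline on a license text with demo_steps() and then first_truncating_step(&trace, 0.5) reports the first step that discards more than half of its input. Swapping in the real preprocessing functions would need whatever signatures the library actually exposes, which this sketch does not assume.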