Open workingjubilee opened 9 months ago
I can't find an option in the library API to massage newline differences like this one. I think it might be worth adding as an option, on top of the existing behavior where the return value is a ratio reflecting how well the text scored as a match.
That said, the original issue seems to be a problem in the underlying data used: SPDX has subtle differences between the HTML and JSON renderings in terms of how it emits spaces.
Deeply open-ended question, but the following file is a direct copy of https://spdx.org/licenses/AGPL-1.0.html "by hand" (right-click, copy, paste), but `askalono id` only scores 0.999 instead of the 1.0 that printing the extract from the JSON gets you: LICENSE-RIGHTCLICK.txt

It's not clear to me which is the canonical version and thus which is (arguably) a license violation. It's also not clear to me that askalono should fudge the line breaks here. It's also not clear to me that askalono should NOT fudge the line breaks here.
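To make "fudging the line breaks" concrete, here is a minimal sketch of what such a normalization could look like. Everything in it is an assumption for illustration: the two sample strings are made up (they are not the real AGPL-1.0 text), `collapse_whitespace` and `similarity` are hypothetical helpers, and the score comes from Python's `difflib` as a stand-in, not from askalono's own scoring.

```python
import difflib
import re


def collapse_whitespace(text: str) -> str:
    """Replace every run of whitespace (spaces, tabs, newlines) with one space."""
    return re.sub(r"\s+", " ", text).strip()


def similarity(a: str, b: str) -> float:
    """Stand-in similarity ratio in [0, 1]. This is NOT askalono's algorithm."""
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()


# Hypothetical excerpts: the same words, but one rendering wraps a line
# where the other emits a plain space.
html_copy = "This License applies to any program\nor other work.\n\nYou may copy it."
json_text = "This License applies to any program or other work.\n\nYou may copy it."

# Comparing the raw strings penalizes the single newline-vs-space difference.
raw_score = similarity(html_copy, json_text)

# Comparing after whitespace normalization treats the two renderings as equal.
normalized_score = similarity(
    collapse_whitespace(html_copy),
    collapse_whitespace(json_text),
)

print(f"raw: {raw_score:.4f}, normalized: {normalized_score:.4f}")
```

The trade-off is the one in question above: collapsing whitespace hides differences between the HTML and JSON renderings, which is convenient if they are meant to be the same license text and misleading if the line breaks are considered significant.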