spdx / spdx-license-matcher

A tool to match license text against the SPDX license list using an algorithm that finds close matches. It follows the SPDX Matching Guidelines, keeping the substantial text and ignoring the replaceable text for matching purposes.

Support a new schema of spdx.org/licenses #14

Closed m1kit closed 3 years ago

m1kit commented 3 years ago

It seems https://spdx.org/licenses/licenses.json now returns relative URLs, and the field holding the JSON URL is reference, not detailsUrl.

This is the error I got, and this PR fixes it.

Building SPDX License List. This may take a while...
Traceback (most recent call last):
  File "/Users/mikit/.pyenv/versions/3.7.6/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Users/mikit/.pyenv/versions/3.7.6/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/mikit/Downloads/spdx-license-matcher/spdx_license_matcher/matcher.py", line 59, in <module>
    matcher()
  File "/Users/mikit/Downloads/spdx-license-matcher/spdx-matcher/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/Users/mikit/Downloads/spdx-license-matcher/spdx-matcher/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/Users/mikit/Downloads/spdx-license-matcher/spdx-matcher/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/mikit/Downloads/spdx-license-matcher/spdx-matcher/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/Users/mikit/Downloads/spdx-license-matcher/spdx_license_matcher/matcher.py", line 32, in matcher
    build_spdx_licenses()
  File "/Users/mikit/Downloads/spdx-license-matcher/spdx_license_matcher/build_licenses.py", line 38, in build_spdx_licenses
    responses = list(pool.map(get_url, licensesUrl))
  File "/Users/mikit/.pyenv/versions/3.7.6/lib/python3.7/concurrent/futures/_base.py", line 598, in result_iterator
    yield fs.pop().result()
  File "/Users/mikit/.pyenv/versions/3.7.6/lib/python3.7/concurrent/futures/_base.py", line 435, in result
    return self.__get_result()
  File "/Users/mikit/.pyenv/versions/3.7.6/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/Users/mikit/.pyenv/versions/3.7.6/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Users/mikit/Downloads/spdx-license-matcher/spdx_license_matcher/build_licenses.py", line 20, in get_url
    res = requests.get(url, headers=headers)
  File "/Users/mikit/Downloads/spdx-license-matcher/spdx-matcher/lib/python3.7/site-packages/requests/api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "/Users/mikit/Downloads/spdx-license-matcher/spdx-matcher/lib/python3.7/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Users/mikit/Downloads/spdx-license-matcher/spdx-matcher/lib/python3.7/site-packages/requests/sessions.py", line 516, in request
    prep = self.prepare_request(req)
  File "/Users/mikit/Downloads/spdx-license-matcher/spdx-matcher/lib/python3.7/site-packages/requests/sessions.py", line 459, in prepare_request
    hooks=merge_hooks(request.hooks, self.hooks),
  File "/Users/mikit/Downloads/spdx-license-matcher/spdx-matcher/lib/python3.7/site-packages/requests/models.py", line 314, in prepare
    self.prepare_url(url, params)
  File "/Users/mikit/Downloads/spdx-license-matcher/spdx-matcher/lib/python3.7/site-packages/requests/models.py", line 388, in prepare_url
    raise MissingSchema(error)
requests.exceptions.MissingSchema: Invalid URL './AFL-2.0.html': No schema supplied. Perhaps you meant http://./AFL-2.0.html?
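
For illustration only, here is a minimal sketch of how a relative entry such as './AFL-2.0.html' could be resolved against the license list URL before it is passed to requests.get. It is not the actual change in this PR. The helper name resolve_details_url and the sample entry are hypothetical; the field values are assumed from the error message and the description above.

```python
from urllib.parse import urljoin

# Base document that the relative paths are resolved against.
LICENSE_LIST_URL = "https://spdx.org/licenses/licenses.json"


def resolve_details_url(entry, base_url=LICENSE_LIST_URL):
    """Return an absolute URL to the license's JSON details, or None.

    Hypothetical helper: it checks both field names, because either
    'detailsUrl' or 'reference' may hold the JSON URL depending on the
    schema version, and the value may be relative or absolute.
    """
    for field in ("detailsUrl", "reference"):
        value = entry.get(field)
        if value and value.endswith(".json"):
            # urljoin leaves absolute URLs untouched and resolves
            # './AFL-2.0.json' relative to the license list location.
            return urljoin(base_url, value)
    return None


# Assumed shape of an entry under the new schema (illustrative values).
sample_entry = {
    "licenseId": "AFL-2.0",
    "reference": "./AFL-2.0.json",
    "detailsUrl": "./AFL-2.0.html",
}
print(resolve_details_url(sample_entry))
```

Checking both fields keeps the sketch tolerant of either schema; a stricter version could fail loudly when neither field points at a .json file.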

rtgdk commented 3 years ago

@m1kit Thanks a lot for this fix! PR looks good to me. I'll wait for some comments from other maintainers.

@Ugtan How did detailsUrl work before? If it contains the HTML URL, the response won't be converted to JSON. Also, did it use to contain a relative URL or an absolute one?

@goneall Have we changed the license list schema recently?

goneall commented 3 years ago

@goneall Have we changed license list schema recently?

There was a significant change to the tool that generates the license list data - that may have accidentally changed the format.

m1kit commented 3 years ago

There was a significant change to the tool that generates the license list data - that may have accidentally changed the format.

@goneall Could you tell me which commit includes that change? I'm unfamiliar with this organization but interested...

goneall commented 3 years ago

Could you tell me which commit includes that change?

@m1kit The LicenseListPublisher is the tool that takes the LicenseList-XML files as input and produces the License-List-Data output. Version 2.2.0 uses a completely re-written library which may have impacted the schema.

m1kit commented 3 years ago

I read some code in spdx/tools, spdx/Spdx-Java-Library, and spdx/LicenseListPublisher.

In the old library,

In the new library,

@goneall If these changes are unintentional, I will close this PR unmerged and work on a new PR to spdx/Spdx-Java-Library.

goneall commented 3 years ago

@m1kit Thank you very much for the analysis.

These changes were unintentional, so please go ahead and update the Java library.

m1kit commented 3 years ago

I have already written a fix in the library and am now updating the related tests. I think I can create a new PR within a day 👍