package-url / packageurl-python

Python implementation of the Package URL spec. This project is sponsored by the NLnet project https://nlnet.nl/project/vulnerabilitydatabase/, the Google Summer of Code, nexB and other generous sponsors.

Add support for additional packages in purl2url #143

Open johnmhoran opened 8 months ago

johnmhoran commented 8 months ago

This is related to the PURL CLI tool/library described in https://github.com/nexB/purldb/issues/247.

johnmhoran commented 6 months ago

@TG1999 @keshav-space Yesterday I installed requests in my local fork of packageurl-python so I could explore getting download_url data from the pypi API (and I am able to do that now). If I run pip list on the command line for that local repo, I get

$ pip list
Package            Version
------------------ --------
certifi            2024.2.2
charset-normalizer 3.3.2
idna               3.6
pip                24.0
requests           2.31.0
setuptools         69.1.0
urllib3            2.2.1
wheel              0.42.0

(venv) Mon Mar 11, 2024 10:30 AM  /home/jmh/dev/nexb/packageurl-python jmh (143-add-purl2url-package-support)
$

However, when I run bin/py.test tests/contrib/test_purl2url.py -vvs I get the error ERROR tests/contrib/test_purl2url.py - ModuleNotFoundError: No module named 'requests'.

I am exploring the purl2url work from my local "sandbox" -- simply another repo from inside of which I've run

pip install -e /home/jmh/dev/nexb/packageurl-python 

so I can access my changes in purl2url.py from that sandbox. However, inside my forked packageurl-python repo, there is no requirements.txt, and its setup.cfg contains

[options]
python_requires = >=3.7
packages = find:
package_dir = =src
include_package_data = true
zip_safe = false
install_requires =

but nothing listed under install_requires.

I think I need to rerun make dev in this local fork, perhaps after adding the requests library to setup.cfg or creating a requirements.txt containing requests. I'm a bit reluctant to do so without confirming with you, though, because I'm concerned that I might mess up my local packageurl-python fork. Do you have any suggestions?

johnmhoran commented 6 months ago

In the packageurl-python fork setup.cfg I added:

install_requires =
    requests == 2.31.0

and in /home/jmh/dev/nexb/packageurl-python I ran pip install -e ., but when I reran bin/py.test tests/contrib/test_purl2url.py -vvs I again got ERROR tests/contrib/test_purl2url.py - ModuleNotFoundError: No module named 'requests'.

johnmhoran commented 6 months ago

This suggests to me that requests has been installed (and BTW so does my testing yesterday from my sandbox repo of this same packageurl-python repo/code):

(venv) Mon Mar 11, 2024 12:15 PM  /home/jmh/dev/nexb/packageurl-python jmh (143-add-purl2url-package-support)
$ python
Python 3.8.10 (default, Nov 22 2023, 10:22:35)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
>>> requests.get("https://pypi.org/pypi/fetchcode")
<Response [200]>
>>> requests.get("https://pypi.org/pypi/fetchcode/json")
<Response [200]>
>>> exit()

(venv) Mon Mar 11, 2024 12:15 PM  /home/jmh/dev/nexb/packageurl-python jmh (143-add-purl2url-package-support)
$

johnmhoran commented 6 months ago

Running pip install -e . after installing requests and updating setup.cfg did not fix the pytest no-module-found errors for requests in my forked packageurl-python repo -- but make clean followed by make dev did. Now there are a few failing tests, but that's OK. I do wonder why pip install -e . was not sufficient to fix the pytest no-module-found error....

For the record, this was the full error from pytest:

(venv) Mon Mar 11, 2024 01:19 PM  /home/jmh/dev/nexb/packageurl-python jmh (143-add-purl2url-package-support)
$ bin/py.test tests/contrib/test_purl2url.py -vvs
============================================================================================== test session starts ==============================================================================================
platform linux -- Python 3.8.10, pytest-7.4.4, pluggy-1.4.0 -- /home/jmh/dev/nexb/packageurl-python/bin/python
cachedir: .pytest_cache
rootdir: /home/jmh/dev/nexb/packageurl-python
configfile: setup.cfg
collected 0 items / 2 errors

==================================================================================================== ERRORS =====================================================================================================
________________________________________________________________________________ ERROR collecting tests/contrib/test_purl2url.py ________________________________________________________________________________
tests/contrib/test_purl2url.py:29: in <module>
    from packageurl.contrib import purl2url
lib/python3.8/site-packages/_pytest/assertion/rewrite.py:186: in exec_module
    exec(co, module.__dict__)
src/packageurl/contrib/purl2url.py:27: in <module>
    import requests
E   ModuleNotFoundError: No module named 'requests'
________________________________________________________________________________ ERROR collecting tests/contrib/test_purl2url.py ________________________________________________________________________________
ImportError while importing test module '/home/jmh/dev/nexb/packageurl-python/tests/contrib/test_purl2url.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
lib/python3.8/site-packages/_pytest/python.py:617: in _importtestmodule
    mod = import_path(self.path, mode=importmode, root=self.config.rootpath)
lib/python3.8/site-packages/_pytest/pathlib.py:567: in import_path
    importlib.import_module(module_name)
/usr/lib/python3.8/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1014: in _gcd_import
    ???
<frozen importlib._bootstrap>:991: in _find_and_load
    ???
<frozen importlib._bootstrap>:975: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:671: in _load_unlocked
    ???
lib/python3.8/site-packages/_pytest/assertion/rewrite.py:186: in exec_module
    exec(co, module.__dict__)
tests/contrib/test_purl2url.py:29: in <module>
    from packageurl.contrib import purl2url
lib/python3.8/site-packages/_pytest/assertion/rewrite.py:186: in exec_module
    exec(co, module.__dict__)
src/packageurl/contrib/purl2url.py:27: in <module>
    import requests
E   ModuleNotFoundError: No module named 'requests'
============================================================================================ short test summary info ============================================================================================
ERROR tests/contrib/test_purl2url.py - ModuleNotFoundError: No module named 'requests'
ERROR tests/contrib/test_purl2url.py
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 2 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
=============================================================================================== 2 errors in 0.22s ===============================================================================================

(venv) Mon Mar 11, 2024 01:19 PM  /home/jmh/dev/nexb/packageurl-python jmh (143-add-purl2url-package-support)
$

JonoYang commented 6 months ago

@johnmhoran Looking at your terminal prompt, this is what I think may be happening:

(venv) Mon Mar 11, 2024 01:19 PM  /home/jmh/dev/nexb/packageurl-python jmh (143-add-purl2url-package-support)
$ bin/py.test tests/contrib/test_purl2url.py -vvs

I think you have installed packageurl-python and requests to the venv virtual environment, but you are running bin/py.test from the packageurl-python directory, where bin/py.test is handled by a different virtual environment than the one you're in. Try running py.test tests/contrib/test_purl2url.py -vvs to use the py.test handled by venv.
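One way to check this hypothesis is to ask each interpreter where it lives and whether it can see requests. The following is a minimal sketch, not part of packageurl-python; the function name describe_environment and the file name check_env.py are made up for illustration.

```python
import importlib.util
import sys


def describe_environment(module_name="requests"):
    """Report which interpreter is running and whether it can find a module.

    Hypothetical helper for illustration; it only uses the standard library.
    """
    # find_spec returns None for a top-level module that is not installed.
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,  # path of the running interpreter
        "prefix": sys.prefix,          # root of the active environment
        "module_found": spec is not None,
        "module_location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Saving this as check_env.py and running it once with python check_env.py and once with bin/python check_env.py (the interpreter named in the pytest header above) should show whether the two commands resolve to different environments. If the prefix values differ and only one reports module_found: True, that would be consistent with the explanation above.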

johnmhoran commented 6 months ago

@JonoYang Running py.test tests/contrib/test_purl2url.py -vvs threw an error

ImportError: No module named packageurl.contrib

But I think my running make clean then make dev, as I noted above, was the fix to the requests-related ModuleNotFoundError. Running pip install -e . was not enough to get requests loaded, evidently.

The 2 failing tests I now get are OK -- that's because I added the ability to actually get the pypi download_url for tar.gz downloads. However I have several questions about what our goals are in these tests. One failing test looks to get a pypi .whl as a download -- it seems that's just to test in case pypi is not yet supported for downloads (as has been the case until now).

Do you have time to discuss?

johnmhoran commented 6 months ago

@TG1999 @keshav-space @tdruez I can now get a download_url for pypi PURLs (though the code is not quite ready for prime time). Looking at the pypi JSON structure/content I get from requests.get() and at our current tests, if the few JSON examples I've seen are representative, we can retrieve either a .whl (using "packagetype": "bdist_wheel") or a .tar.gz (using "packagetype": "sdist"). I have drafted the pypi download_url function for now to

I see a variety of test PURL inputs and expected outputs in our tests but our actual goals for the purl2url.py output are not 100% clear. Is the approach I described above what we want? If not, please let me know what changes you want me to make in the data we retrieve. (At the risk of creating clutter, I'll paste sample output in the next comment below so you have the actual output data to examine.)
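To make the sdist versus wheel choice concrete, here is a rough sketch of selecting a download URL by packagetype. It is not the drafted function from this issue. It assumes the JSON payload has a top-level "urls" list whose entries carry "packagetype" and "url" keys; the function name pick_download_url and the sample payload are invented for illustration.

```python
def pick_download_url(pypi_json, packagetype="sdist"):
    """Return the first URL whose packagetype matches, or None.

    Hypothetical helper; assumes entries under "urls" have
    "packagetype" and "url" keys.
    """
    for entry in pypi_json.get("urls", []):
        if entry.get("packagetype") == packagetype:
            return entry.get("url")
    return None


# Made-up payload in the assumed shape, so the sketch runs offline.
sample = {
    "urls": [
        {
            "packagetype": "bdist_wheel",
            "url": "https://example.invalid/demo-1.0-py3-none-any.whl",
        },
        {
            "packagetype": "sdist",
            "url": "https://example.invalid/demo-1.0.tar.gz",
        },
    ]
}

print(pick_download_url(sample))                 # the .tar.gz entry
print(pick_download_url(sample, "bdist_wheel"))  # the .whl entry

# Against the live API, the payload could be fetched with, for example:
#   import requests
#   payload = requests.get("https://pypi.org/pypi/fetchcode/json").json()
#   pick_download_url(payload)
```

Defaulting to sdist is only one possible policy. Whether the function should prefer the .tar.gz, prefer the .whl, or return both is exactly the open question raised here.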

johnmhoran commented 6 months ago

Rather than post the verbose output here, I pasted it into a .txt file that I'll upload:

packageurl-python-purl2url-pypi-sample-output-2024-03-11.txt

johnmhoran commented 6 months ago

@TG1999 Further to your (and other) comments in the recently-closed prior PR 151, I've removed most of my prior code, and this issue -- and the new PR I'll open shortly -- now focus on adding repo URL support and testing for cocoapods (pypi support is already there and fine) and additional pypi testing.

I'll turn next to fetchcode/package.py to add download URL (and other) support for cocoapods and pypi.

johnmhoran commented 6 months ago

@TG1999 Actually, I'd forgotten that fetchcode/package.py already handles pypi, including providing a single download URL entry (just one, as is the case for the other supported types as well, although there are often additional download files available).

I have a few questions for you and @pombredanne about the details (e.g., do we want to add the ability for additional download files as a list or otherwise) and will ask them in the related fetchcode issue I opened recently.

Re that question about multiple download files, I also raised it earlier in this issue (see this comment), and it is still a live question for you and @pombredanne. I understand that I cannot simply modify the current inferred URLs function because people rely on its current form. Do we want to add this capability and, if so, how? We might want the download URL value to be a list rather than a single URL, and we might want the inferred URLs list to include more than the current repo and download URL values, but all of that would most naturally involve modifying the existing functions, which we don't want to do.
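One possible shape for this, sketched under heavy assumptions, is a new function that sits next to the existing one and returns a list, so that callers of the current single-URL function are unaffected. Everything below is hypothetical: get_download_url is a stand-in with made-up behavior rather than the real purl2url function, and get_download_urls and extra_lookup are invented names.

```python
from typing import Callable, List, Optional


def get_download_url(purl: str) -> Optional[str]:
    """Stand-in for an existing single-URL function (made-up data)."""
    known = {"pkg:example/demo@1.0": "https://example.invalid/demo-1.0.tar.gz"}
    return known.get(purl)


def get_download_urls(
    purl: str,
    extra_lookup: Optional[Callable[[str], List[str]]] = None,
) -> List[str]:
    """New, additive function: return every known download URL as a list.

    The single URL from the existing function comes first; extra_lookup is
    a hypothetical hook that supplies additional files for the same purl.
    """
    urls: List[str] = []
    primary = get_download_url(purl)
    if primary:
        urls.append(primary)
    if extra_lookup is not None:
        for url in extra_lookup(purl):
            if url not in urls:  # keep order, skip duplicates
                urls.append(url)
    return urls


def wheel_lookup(purl: str) -> List[str]:
    """Made-up source of additional files for the demo purl."""
    return [
        "https://example.invalid/demo-1.0.tar.gz",
        "https://example.invalid/demo-1.0-py3-none-any.whl",
    ]


print(get_download_urls("pkg:example/demo@1.0", wheel_lookup))
```

With this arrangement the existing function keeps its signature and return type, and the list-returning variant is opt-in. Whether a second function is preferable to changing the existing return value is still the design question for this issue.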

Please let me know what you think.