microsoft / component-detection

Scans your project to determine what components you use
MIT License

Python requirements.txt parsing shows errors in SBOM-Tool #590

Closed anotherbridge closed 1 year ago

anotherbridge commented 1 year ago

Description

When running the SBOM-tool on a Python-based project whose requirements either contain square brackets or specify minimum/maximum version numbers, retrieving the package data produces the following warning:

[WARN] Received 404 Not Found from https://pypi.org/pypi/<package>/json

Since SBOM-tool is based on component-detection, I wanted to repost the corresponding issue here.

To reproduce

Examples of requirements to reproduce the errors:

beautifulsoup4>=4.9.1
molecule[podman]

Expected behavior

Receiving the data from PyPi and processing it.

Actual Behavior

Got the following result:

[INFO] Getting Python data from https://pypi.org/pypi/beautifulsoup4>=4.9.1/json 
[WARN] Received 404 Not Found from https://pypi.org/pypi/beautifulsoup4>=4.9.1/json 
[WARN] Dependency Package beautifulsoup4>=4.9.1 not found in Pypi. Skipping package
[INFO] Getting Python data from https://pypi.org/pypi/molecule[podman]/json 
[WARN] Received 404 Not Found from https://pypi.org/pypi/molecule[podman]/json 
[WARN] Root dependency molecule[podman] not found on pypi. Skipping package.

Suggestions

I am not exactly sure which information you are retrieving from the API, but if it's the latest package version (which, for the greater-than comparator, is the version installed by default), you could just strip away the version number and comparator, e.g. query https://pypi.org/pypi/beautifulsoup4/json.
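To illustrate the stripping idea, here is a minimal Python sketch. It is not how component-detection does it; the function name `pypi_json_url` and the regular expression are my own assumptions, and it only handles simple requirement lines like the two in this issue.

```python
import re

# Leading package name of a requirement line: letters, digits, '.', '_' and '-',
# starting and ending with a letter or digit. Everything after it (extras,
# comparators, version numbers) is ignored.
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?)")


def pypi_json_url(requirement: str) -> str:
    """Build the PyPI JSON API URL from the bare package name of a requirement."""
    match = _NAME_RE.match(requirement)
    if match is None:
        raise ValueError(f"no package name found in {requirement!r}")
    return f"https://pypi.org/pypi/{match.group(1)}/json"


print(pypi_json_url("beautifulsoup4>=4.9.1"))  # https://pypi.org/pypi/beautifulsoup4/json
print(pypi_json_url("molecule[podman]"))       # https://pypi.org/pypi/molecule/json
```

Comment lines, option lines such as `-r other.txt`, and direct URL requirements are not covered by this sketch.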

For requirements with brackets, one could query the base package API and then, in the JSON response, look up the additional requirements pulled in by that extra under .info.requires_dist. Yet this won't work for the above example, in which it is possible to query https://pypi.org/pypi/molecule-podman/json
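A rough Python sketch of the bracket idea, under some assumptions: the helper names `split_extras` and `dependencies_for_extras` are made up for illustration, and `sample_requires_dist` is invented data in the general shape of a `requires_dist` list, not the actual PyPI response for molecule.

```python
import re

# Package name, optionally followed by a bracketed, comma-separated list of extras.
_REQ_RE = re.compile(
    r"^\s*(?P<name>[A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:\[(?P<extras>[^\]]*)\])?"
)
# Environment marker of the form: extra == "podman"
_EXTRA_MARKER_RE = re.compile(r"""extra\s*==\s*['"]([^'"]+)['"]""")


def split_extras(requirement):
    """Return (package name, list of extras) for a simple requirement line."""
    match = _REQ_RE.match(requirement)
    if match is None:
        raise ValueError(f"no package name found in {requirement!r}")
    raw = match.group("extras") or ""
    extras = [part.strip() for part in raw.split(",") if part.strip()]
    return match.group("name"), extras


def dependencies_for_extras(requires_dist, extras):
    """Pick the requires_dist entries that are conditional on one of the extras."""
    wanted = set(extras)
    selected = []
    for entry in requires_dist or []:  # the field may be null in the JSON response
        marker = _EXTRA_MARKER_RE.search(entry)
        if marker and marker.group(1) in wanted:
            # Keep only the requirement part in front of the marker.
            selected.append(entry.split(";", 1)[0].strip())
    return selected


# Invented example data, NOT the real response for molecule.
sample_requires_dist = [
    "click>=8.0",
    'molecule-podman; extra == "podman"',
    'pytest; extra == "test"',
]

name, extras = split_extras("molecule[podman]")
print(name, extras)
print(dependencies_for_extras(sample_requires_dist, extras))
```

If the base package does not list the extra under `requires_dist`, the second function simply returns an empty list, so a fallback would still be needed for that case.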

melotic commented 1 year ago

Tested your provided requirements.txt file. This works properly in the current version of CD. SBOM-Tool is using an outdated version of CD.

CD Logs

```
[13:46:06 INF] Getting Python data from https://pypi.org/pypi/beautifulsoup4/json
[13:46:07 INF] Getting Python data from https://pypi.org/pypi/molecule/json
[13:46:08 INF] Getting Python data from https://files.pythonhosted.org/packages/57/f4/a69c20ee4f660081a7dedb1ac57f29be9378e04edfcb90c526b923d4bebc/beautifulsoup4-4.12.2-py3-none-any.whl
[13:46:08 INF] Getting Python data from https://pypi.org/pypi/soupsieve/json
[13:46:08 INF] Getting Python data from https://files.pythonhosted.org/packages/2f/d0/35a704e135f6da7f824894bc08005acb6846fc1705d7870368ddd51be475/molecule-5.0.1-py3-none-any.whl
[13:46:08 INF] Getting Python data from https://pypi.org/pypi/ansible-compat/json
[13:46:09 INF] Getting Python data from https://pypi.org/pypi/ansible-core/json
[13:46:09 INF] Getting Python data from https://pypi.org/pypi/click/json
[13:46:09 INF] Getting Python data from https://pypi.org/pypi/click-help-colors/json
[13:46:09 INF] Getting Python data from https://pypi.org/pypi/cookiecutter/json
[13:46:10 INF] Getting Python data from https://pypi.org/pypi/enrich/json
[13:46:10 INF] Getting Python data from https://pypi.org/pypi/jsonschema/json
[13:46:11 INF] Getting Python data from https://pypi.org/pypi/Jinja2/json
[13:46:11 INF] Getting Python data from https://pypi.org/pypi/packaging/json
[13:46:11 INF] Getting Python data from https://pypi.org/pypi/pluggy/json
[13:46:11 INF] Getting Python data from https://pypi.org/pypi/PyYAML/json
[13:46:12 INF] Getting Python data from https://pypi.org/pypi/rich/json
[13:46:14 INF] Getting Python data from https://files.pythonhosted.org/packages/49/37/673d6490efc51ec46d198c75903d99de59baffdd47aea3d071b80a9e4e89/soupsieve-2.4.1-py3-none-any.whl
[13:46:14 INF] Getting Python data from https://files.pythonhosted.org/packages/8d/0c/b76d53cdd2fb75099705bd5b7ea7ce2dbab5de0e8d7bb59649738ac865c4/ansible_compat-3.0.2-py3-none-any.whl
[13:46:14 INF] Getting Python data from https://pypi.org/pypi/subprocess-tee/json
[13:46:14 INF] Getting Python data from https://files.pythonhosted.org/packages/3b/48/9612767699d2e45efadb6c00dd61a02f2db0d8bd1642345bd0ee9472fe6a/ansible_core-2.15.0-py3-none-any.whl
[13:46:14 INF] Getting Python data from https://pypi.org/pypi/cryptography/json
[13:46:19 INF] Getting Python data from https://pypi.org/pypi/resolvelib/json
[13:46:19 INF] Getting Python data from https://pypi.org/pypi/importlib-resources/json
[13:46:19 INF] Getting Python data from https://files.pythonhosted.org/packages/c2/f1/df59e28c642d583f7dacffb1e0965d0e00b218e0186d7858ac5233dce840/click-8.1.3-py3-none-any.whl
[13:46:20 INF] Getting Python data from https://pypi.org/pypi/colorama/json
[13:46:20 INF] Getting Python data from https://pypi.org/pypi/importlib-metadata/json
[13:46:20 INF] Getting Python data from https://files.pythonhosted.org/packages/cb/cb/607ad1fcfea897b4b0bd5642dd4bd158f66db1c83712f1afddf026f7de14/click_help_colors-0.9.1-py3-none-any.whl
[13:46:20 INF] Getting Python data from https://files.pythonhosted.org/packages/64/4f/66a92457a729104db896321135e05b0cf94a9034fd5345f30d4d8386b957/cookiecutter-2.1.1-py2.py3-none-any.whl
[13:46:20 INF] Getting Python data from https://pypi.org/pypi/binaryornot/json
[13:46:20 INF] Getting Python data from https://pypi.org/pypi/jinja2-time/json
[13:46:20 INF] Getting Python data from https://pypi.org/pypi/python-slugify/json
[13:46:20 INF] Getting Python data from https://pypi.org/pypi/requests/json
[13:46:21 INF] Getting Python data from https://files.pythonhosted.org/packages/76/67/aecd1d435dbbdcea21a197d708e9ff0bcc7306c2847c6c87cc1a91e2cca4/enrich-1.2.7-py3-none-any.whl
[13:46:21 INF] Getting Python data from https://files.pythonhosted.org/packages/c1/97/c698bd9350f307daad79dd740806e1a59becd693bd11443a0f531e3229b3/jsonschema-4.17.3-py3-none-any.whl
[13:46:21 INF] Getting Python data from https://pypi.org/pypi/attrs/json
[13:46:21 INF] Getting Python data from https://pypi.org/pypi/pkgutil-resolve-name/json
[13:46:21 INF] Getting Python data from https://pypi.org/pypi/pyrsistent/json
[13:46:21 INF] Getting Python data from https://pypi.org/pypi/typing-extensions/json
[13:46:21 INF] Getting Python data from https://files.pythonhosted.org/packages/bc/c3/f068337a370801f372f2f8f6bad74a5c140f6fda3d9de154052708dd3c65/Jinja2-3.1.2-py3-none-any.whl
[13:46:21 INF] Getting Python data from https://pypi.org/pypi/MarkupSafe/json
[13:46:22 INF] Getting Python data from https://files.pythonhosted.org/packages/ab/c3/57f0601a2d4fe15de7a553c00adbc901425661bf048f2a22dfc500caf121/packaging-23.1-py3-none-any.whl
[13:46:22 INF] Getting Python data from https://files.pythonhosted.org/packages/9e/01/f38e2ff29715251cf25532b9082a1589ab7e4f571ced434f98d0139336dc/pluggy-1.0.0-py2.py3-none-any.whl
[13:46:22 INF] Getting Python data from https://files.pythonhosted.org/packages/44/e5/4fea13230bcebf24b28c0efd774a2dd65a0937a2d39e94a4503438b078ed/PyYAML-6.0-cp310-cp310-macosx_10_9_x86_64.whl
[13:46:22 INF] Getting Python data from https://files.pythonhosted.org/packages/ea/93/c68645c689d10a035010e3ae314b6b2855d040ce0d11fdfdfbb8be416581/rich-13.4.1-py3-none-any.whl
[13:46:22 INF] Getting Python data from https://pypi.org/pypi/markdown-it-py/json
[13:46:22 INF] Getting Python data from https://pypi.org/pypi/pygments/json
[13:46:22 INF] Getting Python data from https://files.pythonhosted.org/packages/df/05/2ab0eca1a36a1a3d14657ccd069660636de53d7ce58050568f3bcd2e0f12/subprocess_tee-0.4.1-py3-none-any.whl
[13:46:22 INF] Getting Python data from https://files.pythonhosted.org/packages/d8/80/e32f30266381f6ca05ee4aa92ce5f305aa1acbef4117a9a8d94d9b60bb67/cryptography-41.0.1-cp37-abi3-macosx_10_12_universal2.whl
[13:46:22 INF] Getting Python data from https://pypi.org/pypi/cffi/json
[13:46:22 INF] Getting Python data from https://files.pythonhosted.org/packages/d2/fc/e9ccf0521607bcd244aa0b3fbd574f71b65e9ce6a112c83af988bbbe2e23/resolvelib-1.0.1-py2.py3-none-any.whl
[13:46:22 INF] Getting Python data from https://files.pythonhosted.org/packages/46/10/7cc167fe072037c3cd2a15a92bb963b86f2bab8ac0995fab95fb7a152b80/importlib_resources-5.0.7-py3-none-any.whl
[13:46:22 INF] Getting Python data from https://pypi.org/pypi/zipp/json
[13:46:22 INF] Getting Python data from https://files.pythonhosted.org/packages/d1/d6/3965ed04c63042e047cb6a3e6ed1a63a35087b6a609aa3a15ed8ac56c221/colorama-0.4.6-py2.py3-none-any.whl
[13:46:22 INF] Getting Python data from https://files.pythonhosted.org/packages/30/bb/bf2944b8b88c65b797acc2c6a2cb0fb817f7364debf0675792e034013858/importlib_metadata-6.6.0-py3-none-any.whl
[13:46:22 INF] Getting Python data from https://files.pythonhosted.org/packages/24/7e/f7b6f453e6481d1e233540262ccbfcf89adcd43606f44a028d7f5fae5eb2/binaryornot-0.4.4-py2.py3-none-any.whl
[13:46:22 INF] Getting Python data from https://pypi.org/pypi/chardet/json
[13:46:22 INF] Getting Python data from https://files.pythonhosted.org/packages/6a/a1/d44fa38306ffa34a7e1af09632b158e13ec89670ce491f8a15af3ebcb4e4/jinja2_time-0.2.0-py2.py3-none-any.whl
[13:46:23 INF] Getting Python data from https://pypi.org/pypi/arrow/json
[13:46:23 INF] Getting Python data from https://files.pythonhosted.org/packages/b4/85/6aa722a11307ec572682023b76cad4c52cda708dfc25fcb4b4a6051da7ab/python_slugify-8.0.1-py2.py3-none-any.whl
[13:46:23 INF] Getting Python data from https://pypi.org/pypi/text-unidecode/json
[13:46:23 INF] Getting Python data from https://files.pythonhosted.org/packages/70/8e/0e2d847013cb52cd35b38c009bb167a1a26b2ce6cd6965bf26b47bc0bf44/requests-2.31.0-py3-none-any.whl
[13:46:23 INF] Getting Python data from https://pypi.org/pypi/charset-normalizer/json
[13:46:23 INF] Getting Python data from https://pypi.org/pypi/idna/json
[13:46:23 INF] Getting Python data from https://pypi.org/pypi/urllib3/json
[13:46:23 INF] Getting Python data from https://pypi.org/pypi/certifi/json
[13:46:23 INF] Getting Python data from https://files.pythonhosted.org/packages/f0/eb/fcb708c7bf5056045e9e98f62b93bd7467eb718b0202e7698eb11d66416c/attrs-23.1.0-py3-none-any.whl
[13:46:23 INF] Getting Python data from https://files.pythonhosted.org/packages/c9/5c/3d4882ba113fd55bdba9326c1e4c62a15e674a2501de4869e6bd6301f87e/pkgutil_resolve_name-1.3.10-py3-none-any.whl
[13:46:23 INF] Getting Python data from https://files.pythonhosted.org/packages/ed/7b/7d032130a6838b179b46dff1ee88909c11d518a10ec9bc70c4b72c7c2f80/pyrsistent-0.19.3-cp310-cp310-macosx_10_9_universal2.whl
[13:46:23 INF] Getting Python data from https://files.pythonhosted.org/packages/5f/86/d9b1518d8e75b346a33eb59fa31bdbbee11459a7e2cc5be502fa779e96c5/typing_extensions-4.6.3-py3-none-any.whl
[13:46:23 INF] Getting Python data from https://files.pythonhosted.org/packages/20/1d/713d443799d935f4d26a4f1510c9e61b1d288592fb869845e5cc92a1e055/MarkupSafe-2.1.3-cp310-cp310-macosx_10_9_universal2.whl
[13:46:23 INF] Getting Python data from https://files.pythonhosted.org/packages/bf/25/2d88e8feee8e055d015343f9b86e370a1ccbec546f2865c98397aaef24af/markdown_it_py-2.2.0-py3-none-any.whl
[13:46:23 INF] Getting Python data from https://pypi.org/pypi/mdurl/json
[13:46:23 INF] Getting Python data from https://pypi.org/pypi/typing_extensions/json
[13:46:23 INF] Getting Python data from https://files.pythonhosted.org/packages/34/a7/37c8d68532ba71549db4212cb036dbd6161b40e463aba336770e80c72f84/Pygments-2.15.1-py3-none-any.whl
[13:46:23 INF] Getting Python data from https://files.pythonhosted.org/packages/3f/fa/dfc242febbff049509e5a35a065bdc10f90d8c8585361c2c66b9c2f97a01/cffi-1.15.1-cp27-cp27m-macosx_10_9_x86_64.whl
[13:46:23 INF] Getting Python data from https://pypi.org/pypi/pycparser/json
[13:46:23 INF] Getting Python data from https://files.pythonhosted.org/packages/5b/fa/c9e82bbe1af6266adf08afb563905eb87cab83fde00a0a08963510621047/zipp-3.15.0-py3-none-any.whl
[13:46:23 INF] Getting Python data from https://files.pythonhosted.org/packages/74/8f/8fc49109009e8d2169d94d72e6b1f4cd45c13d147ba7d6170fb41f22b08f/chardet-5.1.0-py3-none-any.whl
[13:46:23 INF] Getting Python data from https://files.pythonhosted.org/packages/67/67/4bca5a595e2f89bff271724ddb1098e6c9e16f7f3d018d120255e3c30313/arrow-1.2.3-py3-none-any.whl
[13:46:24 INF] Getting Python data from https://pypi.org/pypi/python-dateutil/json
[13:46:24 INF] Getting Python data from https://files.pythonhosted.org/packages/a6/a5/c0b6468d3824fe3fde30dbb5e1f687b291608f9473681bbf7dabbf5a87d7/text_unidecode-1.3-py2.py3-none-any.whl
[13:46:24 INF] Getting Python data from https://files.pythonhosted.org/packages/4f/a2/9031ba4a008e11a21d7b7aa41751290d2f2035a2f14ecb6e589771a17c47/charset_normalizer-3.1.0-cp310-cp310-macosx_10_9_universal2.whl
[13:46:24 INF] Getting Python data from https://files.pythonhosted.org/packages/fc/34/3030de6f1370931b9dbb4dad48f6ab1015ab1d32447850b9fc94e60097be/idna-3.4-py3-none-any.whl
[13:46:24 INF] Getting Python data from https://files.pythonhosted.org/packages/4b/1d/f8383ef593114755429c307449e7717b87044b3bcd5f7860b89b1f759e34/urllib3-2.0.2-py3-none-any.whl
[13:46:24 INF] Getting Python data from https://files.pythonhosted.org/packages/9d/19/59961b522e6757f0c9097e4493fa906031b95b3ebe9360b2c3083561a6b4/certifi-2023.5.7-py3-none-any.whl
[13:46:24 INF] Getting Python data from https://files.pythonhosted.org/packages/b3/38/89ba8ad64ae25be8de66a6d463314cf1eb366222074cfda9ee839c56a4b4/mdurl-0.1.2-py3-none-any.whl
[13:46:24 DBG] Retrieved cached Python data from https://files.pythonhosted.org/packages/5f/86/d9b1518d8e75b346a33eb59fa31bdbbee11459a7e2cc5be502fa779e96c5/typing_extensions-4.6.3-py3-none-any.whl
[13:46:24 INF] Getting Python data from https://files.pythonhosted.org/packages/62/d5/5f610ebe421e85889f2e55e33b7f9a6795bd982198517d912eb1c76e1a53/pycparser-2.21-py2.py3-none-any.whl
[13:46:24 INF] Getting Python data from https://files.pythonhosted.org/packages/36/7a/87837f39d0296e723bb9b62bbb257d0355c7f6128853c78955f57342a56d/python_dateutil-2.8.2-py2.py3-none-any.whl
[13:46:24 INF] Getting Python data from https://pypi.org/pypi/six/json
[13:46:24 INF] Getting Python data from https://files.pythonhosted.org/packages/d9/5a/e7c31adbe875f2abbb91bd84cf2dc52d792b5a01506781dbcf25c91daf11/six-1.16.0-py2.py3-none-any.whl
```