nexB / purldb

Tools to create and expose a database of purls (Package URLs). This project is sponsored by NLnet project https://nlnet.nl/project/vulnerabilitydatabase/ and nexB for https://www.aboutcode.org/ Chat is at https://gitter.im/aboutcode-org/discuss
https://purldb.readthedocs.io/
29 stars 21 forks source link

Update purlcli documentation, data structure and logging, add cocoapods support #436

Open johnmhoran opened 1 month ago

johnmhoran commented 1 month ago

Reference: https://github.com/nexB/purldb/issues/365

johnmhoran commented 1 month ago

@pombredanne This set of changes includes a portion of your detailed feedback from last week -- will address your remaining comments next.

@TG1999 @keshav-space I will also address your remaining comments on the cocoapods support I've added to packageurl-python/purl2url.py and fetchcode/package.py once I've finished updating the purlclci.py and related code and tests.

johnmhoran commented 1 month ago

@pombredanne @AyanSinhaMahapatra I've just committed and pushed my latest PURLCLI-related changes, including adding details to the RTD re the use of the four current PURLCLI tools. I stuck with the existing RTD structure and thus made all my RTD changes in /purldb/purldb-toolkit/README.rst.

This latest push also includes numerous detailed changes to purlcli.py and related test code, including what I think are all of Philippe's recent detailed comments (thanks again for these). One point re the structure and content of the output for the urls command (like the other three commands, the output is illustrated in some detail in my updated RTD entry):

I look forward to your comments.

johnmhoran commented 1 month ago

For easy reference the RTD update is here (although the new TOC doesn't show up in this simple .rst file).

https://github.com/nexB/purldb/blob/f49eff87ae9a4226584096498836de7d5015f6a2/purldb-toolkit/README.rst

johnmhoran commented 1 month ago

@pombredanne Now that I'm removing all logging/messaging, how do you want to handle a command run on a PURL whose type is not supported? Supported types are different for each command (some overlap some not) and changes over time as we add more support. I used to provide a warning like "'pkg:nginx/nginx@0.8.9' not supported with 'metadata' command",, but this now shows up as

TypeError: 'NoneType' object is not iterable

with all of the details included in the console. I can avoid this by first testing whether the particular query returns a value, e.g., for metadata that could be if info(purl):, and that simply returns 0 packages for that PURL.

johnmhoran commented 1 month ago

@pombredanne Given our recent discussions as well as the feedback you just provided re the fetchcode cocoapods work (thank you!), I think the answer to my question is: let the exception be raised and displayed in the console -- do not use if info(purl): to avoid raising such an exception. Put another way, if the queried PURL type is not supported, let the existing underlying code raise an exception if that's how it's been written. Correct? ;-)

pombredanne commented 1 month ago

@pombredanne Given our recent discussions as well as the feedback you just provided re the fetchcode cocoapods work (thank you!), I think the answer to my question is: let the exception be raised and displayed in the console -- do not use if info(purl): to avoid raising such an exception. Put another way, if the queried PURL type is not supported, let the existing underlying code raise an exception if that's how it's been written. Correct? ;-)

Not exactly:

later in a call:

raise UnsupportedPackageType(purl_type


- In the CLI, which is an applicationa nd not a library, we could catch these exceptions (possibly design a better set of Exception subclasses), and raise proper error messages and other constructs for proper display and feedback to the user in the CLI "UI"
johnmhoran commented 1 month ago

Thanks @pombredanne . Adding a new exception is an interesting option. For testing, in purlcli.py, I've added

class UnsupportedPackageType(Exception):
    pass

and for the metadata command, inside collect_metadata(), I've revised to read:

    collected_metadata = []
    try:
        for release in list(info(purl)):
            if release is None:
                continue
            release_detail = release.to_dict()
            release_detail.move_to_end("purl", last=False)
            collected_metadata.append(release_detail)
    except:
        raise UnsupportedPackageType(purl)
    return collected_metadata

metadata calls the info() function in fetchcode/package.py, which returns None when the NoRouteAvailable exception is triggered -- I interpret this as meaning NoRouteAvailable == unsupported PURL type. NoRouteAvailable is also used in package_versions.py, which is called by our new versions command. Perhaps NoRouteAvailable could instead be updated to behave like the test UnsupportedPackageType exception, or we could add the new UnsupportedPackageType exception to fetchcode and use it directly in info().

For testing right now, when I run an unsupported PURL type with metadata, the new UnsupportedPackageType exception is raised and identifies the problematic PURL:

. . .

  File "/home/jmh/dev/nexb/purldb/purldb-toolkit/src/purldb_toolkit/purlcli.py", line 160, in collect_metadata
    raise UnsupportedPackageType(purl)
purldb_toolkit.purlcli.UnsupportedPackageType: pkg:nginx/nginx@0.8.9

I also added a simple test that seems to work:

    def test_collect_metadata_unsupported_type(self):
        purl = "pkg:nginx/nginx@0.8.9"
        with pytest.raises(purlcli.UnsupportedPackageType, match=purl):
            metadata_collection = purlcli.collect_metadata(purl)
johnmhoran commented 1 month ago

@pombredanne I'm cleaning up purlcli.py, and then the 2 test files, and will commit and push when done.

I realize you've had no time to read and reply to my question yesterday about the implementation of your example class UnsupportedPackageType(Exception). I'd implemented one approach of that in the metadata command to handle PURL types that are not supported by an underlying command (e.g., metadata and versions, which will encounter a NoneType error for unsupported PURL types if not handled).

However, since this might not be what you have in mind, I am going to remove that for metadata (and the test I added as well), and also remove my own way of handling that for versions -- for now this will allow the NoneType error to bubble up for unsupported PURL types for the metadata and versions commands.

As discussed in this morning's huddle, purl2url.py has no such exception handling and thus for the urls command returns empty values for PURLs like pkg:/johnmhoran/hello rather than raising an exception. metadata calls info() in fetchcode's package.py, and versions calls versions() in fetchcode's package_versions.py. Each handles non-supported types indirectly in the same way, responding to our command queries with a NoneType exception when the PURL type is not supported.

def info(url):
    """
    Return data according to the `url` string
    `url` string can be purl too
    """
    if url:
        try:
            return router.process(url)
        except NoRouteAvailable:
            return

and

def versions(purl):
    """Return all version for a PURL."""
    if purl:
        try:
            return router.process(purl)
        except NoRouteAvailable:
            return
pombredanne commented 1 month ago

@johnmhoran re: https://github.com/nexB/purldb/pull/436#issuecomment-2153323420

I realize you've had no time to read and reply to my question yesterday about the implementation of your example class UnsupportedPackageType(Exception). I'd implemented one approach of that in the metadata command to handle PURL types that are not supported by an underlying command (e.g., metadata and versions, which will encounter a NoneType error for unsupported PURL types if not handled).

This works, but a simpler and better option in the CLI is to implement a callback instead that will parse the purl and raise an exception IFF the type is not in a supported list of types. See https://github.com/nexB/purldb/blob/4a9ba34901ac494d063d786daf3a5e002abcf9a9/purldb-toolkit/src/purldb_toolkit/purlcli.py#L1045 for an example of such a callback.

However, since this might not be what you have in mind, I am going to remove that for metadata (and the test I added as well), and also remove my own way of handling that for versions -- for now this will allow the NoneType error to bubble up for unsupported PURL types for the metadata and versions commands.

As discussed in this morning's huddle, purl2url.py has no such exception handling and thus for the urls command returns empty values for PURLs like pkg:/johnmhoran/hello rather than raising an exception. metadata calls info() in fetchcode's package.py, and versions calls versions() in fetchcode's package_versions.py. Each handles non-supported types indirectly in the same way, responding to our command queries with a NoneType exception when the PURL type is not supported.

Using a callback enable an early validation and is a better option for the command line.

def info(url):
    """
    Return data according to the `url` string
    `url` string can be purl too
    """
    if url:
        try:
            return router.process(url)
        except NoRouteAvailable:
            return

and

def versions(purl):
    """Return all version for a PURL."""
    if purl:
        try:
            return router.process(purl)
        except NoRouteAvailable:
            return

So the behavior is to return None in these cases. This looks fine to me. You could test if you have a value if needed. But for now, keep things simple, let exceptions bubble up.

johnmhoran commented 1 month ago

@pombredanne I did not follow all that you wrote but for now I am not making any more changes along these lines until we can discuss and I can ask questions real time. At this point my goal is to clean up the logging/messaging structure and related tests I added in purldb-toolkit and then fix the fecthcode cocoapods logging/messaging -- then we can discuss improvements going forward and I'll be eager to learn and implement.

johnmhoran commented 1 month ago

@pombredanne I've finished removing the PURL CLI logging/messaging structure I'd added, merged the most recent main as well as your latest commit to this PR, resolved conflicts, and otherwise updated/fixed the tests. Please take a look when you have the chance.