juledwar / soufi

Source finder CLI and API
Apache License 2.0
0 stars 0 forks source link

Refactor Yum finders to use a common base class, add RHEL finder #12

Closed 0xDEC0DE closed 2 years ago

0xDEC0DE commented 2 years ago

Oh boy, this one is a MASSIVE amount of changes, and will likely merit much discussion, as many of the decisions that went into this were made unilaterally and with zero input from anyone.

Create a new common YumFinder class

This creates a new, common finder for all Yum-based distros. The "fast lookup" patterns have been completely discarded in favour of "fuzzy" repomd lookups, with verification via HTTP HEAD requests.

These finders must be initialized with a canonical search list of repository URLs containing repodata/repomd.xml files. The intention is for sub-classes of this finder to provide helper methods to fill out sensible default search directories, while also allowing for users to provide their own lists (e.g., for an internal-only mirror of packages)

Refactor Yum-based finders as a subclasses of YumFinder

All of the previously-custom XPath searching code is now relegated to building search lists to feed the initializer in a default configuration. This makes such things far more palatable, since by design SouFi will only try to make sense out of HTML tag soup when talking to well-known providers (i.e., not local mirrors with god-knows-what index modes enabled)

Adds functional tests

A very (VERY) small subset of functionality is under test from this, mostly to demonstrate real-world improvements where these finders can locate sources the previous versions cannot. They are disabled by default.

Fixes Issue #9 Fixes Issue #10 Fixes Issue #11

0xDEC0DE commented 2 years ago

Please do not squash+merge.

juledwar commented 2 years ago

Holy massive PR batman. I'll need some time to digest and will probably not merge until you're around again.

juledwar commented 2 years ago

Looks ok so far but there's a coverage error.

0xDEC0DE commented 2 years ago

Looks ok so far but there's a coverage error.

When I flopped the trailing-slash logic in the Photon finder, it became necessary to also flop the tests, and I didn't. Whoops.

But looking at it again, I went with str.endswithand slicing in Photon, whereas elsewhere I just used str.rstrip. I'll push an update to "harmonize" that.

0xDEC0DE commented 2 years ago

Not exactly rigorous, since I only ran each once, but it's still a pretty stark difference:

CentOS Discovery as List:

real    3m35.277s
user    0m40.642s
sys 0m8.056s

CentOS Discovery as Generator:

real    1m55.055s
user    0m39.000s
sys 0m8.508s
juledwar commented 2 years ago

I've run out of steam looking at this, there's enough feedback for now.