jozu-ai / kitops

Tools for easing the handoff between AI/ML and App/SRE teams.
https://KitOps.ml
Apache License 2.0

Advanced filtering for unpack #391

Open gorkem opened 3 days ago

gorkem commented 3 days ago

Describe the problem you're trying to solve

It should be possible to unpack only named artifacts from an artifact group. For instance, it should be possible to unpack only README.md from docs.

Describe the solution you'd like

Add a new flag to the unpack command, such as --filter. The value of the filter should indicate the artifact type and a name/path to match. Any names/paths that partially or fully match the filter should be extracted.

An example would be kit unpack --filter=code:*config.* which would extract all the files that have "config." in their path names.
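To make the proposal concrete, here is a minimal sketch of how a TYPE:PATTERN filter value could be split and applied to file paths. It is an illustration only, not Kit's implementation: the function names parse_filter and select_paths, the artifacts dictionary, and the choice of Python's fnmatch-style globbing (where * also matches path separators) are all assumptions.

```python
from fnmatch import fnmatchcase


def parse_filter(spec):
    """Split a hypothetical 'TYPE:PATTERN' filter value into its two parts."""
    artifact_type, sep, pattern = spec.partition(":")
    if not sep or not artifact_type or not pattern:
        raise ValueError(f"expected TYPE:PATTERN, got {spec!r}")
    return artifact_type, pattern


def select_paths(spec, artifacts):
    """Return paths of the requested artifact type that match the glob.

    fnmatchcase lets '*' match '/' as well, so '*config.*' matches
    'config.' anywhere in the path, as the proposal describes.
    """
    artifact_type, pattern = parse_filter(spec)
    return [p for p in artifacts.get(artifact_type, []) if fnmatchcase(p, pattern)]


# Hypothetical modelkit contents, grouped by artifact type.
artifacts = {
    "code": ["src/config.yaml", "src/app.py", "deploy/config.prod.json"],
    "docs": ["docs/README.md"],
}

selected = select_paths("code:*config.*", artifacts)
print(selected)
```

With these sample paths, only the two code files whose paths contain "config." are selected; the docs group is never consulted.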

amisevsk commented 3 days ago

Do we want the filter to work per-layer, or per-file? Using the yet to be merged docs layers as an example, if we have a kitfile

docs:
  - name: model-documentation
    path: docs/

can my glob (--filter=docs:README*) extract all readme files? How do we handle deeply-nested files?

My concern in this case is that we're writing a fairly complicated filter spec to handle files that the user is not necessarily familiar with (what are the filenames of files that are of interest to me? Is it named README.md or readme.md?). If I hand you a modelkit, will you be able to meaningfully use filters like this to get what you want?
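The ambiguity raised here can be shown with a small sketch. It assumes fnmatch-style globbing and three made-up file paths; none of this reflects a decided Kit behavior. The same pattern, README*, selects different files depending on whether it is matched against the full path, the base name, or the base name ignoring case.

```python
from fnmatch import fnmatchcase
from posixpath import basename

# Hypothetical files inside a docs layer.
paths = ["README.md", "docs/README.md", "docs/guide/readme.md"]
pattern = "README*"

# Option 1: match against the full path (the pattern must match from the start).
full_path_matches = [p for p in paths if fnmatchcase(p, pattern)]

# Option 2: match against the file name only, so nested files are found.
basename_matches = [p for p in paths if fnmatchcase(basename(p), pattern)]

# Option 3: file name only, ignoring case, so readme.md is found too.
case_insensitive_matches = [
    p for p in paths if fnmatchcase(basename(p).lower(), pattern.lower())
]

print(full_path_matches)
print(basename_matches)
print(case_insensitive_matches)
```

Each option is defensible, which is the point: a file-level filter spec has to pick one, and the person unpacking needs to know both the rule and the file names in advance.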

My initial conception of this sort of feature would be that it works more on layers: you have to include more context in the kitfile, but if you have something like

docs:
  - name: main-documentation
    path: docs/
  - name: readme
    path: README.md
  - name: changelog
    path: CHANGELOG.md

you could use filters as follows:

This is simpler but has some benefits:

  1. Kit doesn't need to unpack all layers in order to do the unpack (with filepath globs, we'd have to look in the big main-documentation layer even if you just want the readme)
  2. It's arguably more useful for sharing modelkits, as you can at-a-glance see what's relevant for extracting
  3. It's simpler to understand and doesn't require understanding how unix-style file globbing works.
  4. It avoids some strange edge cases that are otherwise tricky (e.g. unpacking some/deeply/nested/README.md that references other files and is broken if you only have it)
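A rough sketch of the layer-based alternative, under stated assumptions: the filter syntax TYPE:NAME[,NAME...] used below is hypothetical (the thread does not settle on one), as are the function name select_layers and the dictionary standing in for a parsed Kitfile. The point it illustrates is benefit 1: selection needs only the Kitfile metadata, not the contents of any layer.

```python
# Stand-in for a parsed Kitfile; mirrors the docs example in the comment above.
kitfile = {
    "docs": [
        {"name": "main-documentation", "path": "docs/"},
        {"name": "readme", "path": "README.md"},
        {"name": "changelog", "path": "CHANGELOG.md"},
    ],
}


def select_layers(kitfile, spec):
    """Pick layers by artifact type and, optionally, by layer name.

    'docs' selects every docs layer; 'docs:readme,changelog' (hypothetical
    syntax) selects only the named ones. No layer contents are read.
    """
    artifact_type, _, names = spec.partition(":")
    wanted = set(names.split(",")) if names else None
    return [
        layer
        for layer in kitfile.get(artifact_type, [])
        if wanted is None or layer["name"] in wanted
    ]


readme_only = select_layers(kitfile, "docs:readme")
all_docs = select_layers(kitfile, "docs")
print([layer["path"] for layer in readme_only])
print(len(all_docs))
```

Because matching is exact on the layer name, the large main-documentation layer is skipped entirely when only the readme is requested.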