whazor / k8s-at-home-search

Your Gateway Drug to Kubernetes at Home!
https://kubesearch.dev/

Argo CD based repos support #18

Open pando85 opened 1 year ago

pando85 commented 1 year ago

Is it planned to support Argo CD repositories like mine: https://github.com/pando85/homelab?

It would be great, as there is a part of the community that bases their repos on Argo CD.

Thank you for this great tool. I use it a lot! :rocket:

whazor commented 9 months ago

Yeah, I would also like to see Argo repos.

For people interested in contributing: if you write Python scanners argo_helm_application and argo_helm_values and put them into the scanners/ directory, then I can take on the frontend mapping.

mitchross commented 5 months ago

I want to work on this, but I'm not entirely sure there is an easy way to parse the bjw-s and Argo versions, because of how many ways you can set up deployments with Argo.

whazor commented 5 months ago

I think it is okay to assume a certain file structure and filter out anything that doesn't match.

samip5 commented 3 months ago

Hey @whazor! Could you please explain a bit how this is structured? I'm a little confused about how the scanners would be used, so if I were to add support, I need to understand how to test it.

whazor commented 3 months ago

What does a scanner do?

A scanner reads all the YAML files, parses them, and in the end populates SQL tables. This results in two SQLite databases (repos.db and repos-extended.db).

Inside the frontend I use the SQLite databases to create kubesearch. So if you have something like flux_helm_release.py, or even multiple scanners, then I can add the Argo support in the frontend.

Combining data from multiple files

Sometimes you need data from two separate files to show a page correctly. For example, with Flux we want to know more about the Helm repo related to the release.

The scanners scan independently, and in the end the results are combined via SQL joins in the frontend. So if you create argo_helm_application and argo_helm_values, then as long as there are columns to join on in a SQL query, we are good.
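To illustrate the join, here is a minimal sketch using sqlite3. The table and column names (argo_helm_application, argo_helm_values, release_name) are hypothetical placeholders for this example, not the actual kubesearch schema:

```python
import sqlite3

# Hypothetical schema: two scanners each fill their own table, and the
# frontend combines them with a join. All names here are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE argo_helm_application (release_name TEXT, chart TEXT, repo_url TEXT);
    CREATE TABLE argo_helm_values (release_name TEXT, values_yaml TEXT);
    INSERT INTO argo_helm_application VALUES ('tor', 'tor', 'https://example.com/charts');
    INSERT INTO argo_helm_values VALUES ('tor', 'replicas: 1');
""")

# The frontend joins on the shared column to assemble one result row.
rows = conn.execute("""
    SELECT a.chart, a.repo_url, v.values_yaml
    FROM argo_helm_application a
    JOIN argo_helm_values v ON a.release_name = v.release_name
""").fetchall()
```

As long as both scanners emit a common key such as a release name, the frontend never needs to know which files the data came from.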

Scanner structure

Scanners go through multiple stages, because otherwise the entire process takes too long.

Per file:

  1. pre_check -> The file is not yet parsed as YAML. Here we perform a string check to verify whether the file is a Flux Helm release file. We do this by checking apiVersion and kind. Return true if the file matches. If you want, you could also add the file name as a parameter here. The idea is to check as fast as possible whether the file is interesting or not.
  2. check -> The file is parsed as YAML; here you can use the YAML document to check whether the file is interesting. The walk method is a convenient utility to query through a YAML file.
  3. parse -> Here we use walk again to get data from the file, and then we create an object with all the properties. I use pydantic for creating the object; with pydantic we can add some checks to ignore broken files. There is also some metadata provided in the rest: InfoModel. The result will be an object that can be used in the next step.
  4. insert -> Here we insert the data into the tables.
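The four stages above can be sketched roughly like this. This is an illustrative skeleton under assumed names, not the actual kubesearch interface (the real scanners use the walk utility and pydantic models, omitted here to keep the sketch self-contained):

```python
import sqlite3
from typing import Any, Optional


class ArgoHelmApplicationScanner:
    """Illustrative skeleton only; class, method, and column names are
    hypothetical stand-ins for the real scanner interface."""

    def pre_check(self, text: str) -> bool:
        # Stage 1: file not yet parsed as YAML -- cheap string check.
        return "argoproj.io" in text and "kind: Application" in text

    def check(self, doc: dict) -> bool:
        # Stage 2: doc is parsed YAML -- is it an Argo Application using Helm?
        source = (doc.get("spec") or {}).get("source") or {}
        return doc.get("kind") == "Application" and "helm" in source

    def parse(self, doc: dict) -> Optional[dict[str, Any]]:
        # Stage 3: extract the properties worth storing.
        source = doc["spec"]["source"]
        return {
            "name": doc["metadata"]["name"],
            "chart": source.get("chart"),
            "repo_url": source.get("repoURL"),
            "values": source.get("helm", {}).get("values"),
        }

    def insert(self, conn: sqlite3.Connection, row: dict[str, Any]) -> None:
        # Stage 4: write the parsed row into the SQL table.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS argo_helm_application"
            " (name TEXT, chart TEXT, repo_url TEXT, values_yaml TEXT)"
        )
        conn.execute(
            "INSERT INTO argo_helm_application VALUES (?, ?, ?, ?)",
            (row["name"], row["chart"], row["repo_url"], row["values"]),
        )
```

The key property is that pre_check only touches the raw string, so uninteresting files are discarded before any YAML parsing happens.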

Finally:

Improvements

One thing that Argo might need is the ability to parse some files based on other, previously seen files. Currently we don't do this, as it is more complicated and has not yet been needed.

Another option is having more information about the file name; this should be easier to add to the relevant methods.

samip5 commented 3 months ago

There are at least two ways Argo apps with Helm can be done. One of them is similar to Flux in a way; the other not so much, as it's basically just Kustomize with Helm support.

How well does it currently handle multiple YAML documents in one file?

whazor commented 3 months ago

The Kustomize route would indeed be difficult currently, as it needs the ability to parse files based upon other files.

Each YAML document in a file gets processed separately, but the pre_check is per file. So make sure pre_check returns true for anything interesting in the file.
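The per-file versus per-document split can be sketched like this (a sketch only, assuming documents are separated by standalone `---` lines, as yaml.safe_load_all would see them; the real code lets the YAML library do the splitting):

```python
def split_yaml_documents(text: str) -> list[str]:
    """Naive multi-document split on standalone '---' lines (illustrative;
    a real implementation would use the YAML library for this)."""
    docs, current = [], []
    for line in text.splitlines():
        if line.strip() == "---":
            docs.append("\n".join(current))
            current = []
        else:
            current.append(line)
    docs.append("\n".join(current))
    return [d for d in docs if d.strip()]


def scan_file(text: str, pre_check, process_doc) -> list:
    # pre_check runs once on the whole file; if it matches, every document
    # in the file is then processed separately.
    if not pre_check(text):
        return []
    return [process_doc(d) for d in split_yaml_documents(text)]
```

Note the consequence described above: if pre_check returns false for the file as a whole, even documents that would individually be interesting are never seen.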

samip5 commented 3 months ago

parse files based upon other files.

That's also needed for #24

whazor commented 3 months ago

#28 got merged, which brought a handful of Argo repos into kubesearch.

One of the issues is that the YAML parser fails on Go templating constructs. I guess those would need to be filtered out before parsing the YAML file.

samip5 commented 3 months ago

I did also say this: https://github.com/whazor/k8s-at-home-search/pull/28#discussion_r1703729931

And no, it was not ready yet, as it was a draft.

whazor commented 3 months ago

Actually, Argo results are getting inserted into the DB. So great job on getting the first Argo results into kubesearch 🎉. In particular, PixelJonas has a lot of parsed Argo applications.

The parsing happens in search.py, and every file has a big try/except around the YAML parsing, so incorrect YAML is ignored. If you run search.py with only the Argo scanner, you can see a lot of errors related to Go template strings.

For example:

    metadata:
      name: '{{.cluster}}-guestbook'
    spec:
      project: my-project
      source:
        repoURL: https://github.com/infra-team/cluster-deployments.git
        targetRevision: HEAD
        path: guestbook/{{.cluster}}
      destination:
        server: '{{.url}}'
        namespace: guestbook

In YAML, guestbook/{{.cluster}} is not a valid string.

samip5 commented 3 months ago

I still think we need a way to handle multiple files, as there are definite differences in how Argo handles the ones that are inside the application definition, which is why some people (like me) don't use this approach.

I'm not entirely sure what would be the best way to do that, as the Kustomize way could also appear in non-Argo repos.

pando85 commented 3 months ago

I don't know if this makes sense to say here, or if it is what you are trying to achieve right now: the repository has to have at least one file with the apiVersion of Argo CD.

Then, we would find multiple ways of setting up the apps, like Helm, Kustomize, or raw k8s objects.

My repository can be used as an example: https://github.com/pando85/homelab

samip5 commented 3 months ago

I don't know if this makes sense to say here, or if it is what you are trying to achieve right now: the repository has to have at least one file with the apiVersion of Argo CD.

It checks all files for apiVersion, kind, and that the definition includes Helm; others get ignored. Your repo doesn't seem to have any files that match, so it will not become part of the search currently, despite there being some Argo repos already.

Raw k8s objects are a pain to parse, so that's probably not happening. Kustomize and Helm are probably the ones it should index/parse.

pando85 commented 3 months ago

It checks all files for apiVersion, kind, and that the definition includes Helm; others get ignored. Your repo doesn't seem to have any files that match, so it will not become part of the search currently, despite there being some Argo repos already.

Then you are doing something wrong, because my repository, like others, uses that kind of structure. It is a common pattern in Argo CD definitions.

This is common to Argo projects, but if it contains kind Application or ApplicationSet, it is certainly referring to Argo CD definitions.

Raw k8s objects are a pain to parse, so that's probably not happening. Kustomize and Helm are probably the ones it should index/parse.

Personally, I always use Kustomize or Helm. I think that it is the most common pattern and it can be the first thing to add. We could parse k8s objects in the future.

whazor commented 3 months ago

Currently the scanners are simple and based on a single YAML document. I think it would be a quick win to remove Go templating code via regex from a file before doing the YAML parse.
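A minimal sketch of such a regex pre-filter (the assumption here is that replacing each {{ ... }} expression with a harmless placeholder is enough to make the document parse; the function and placeholder names are hypothetical):

```python
import re

# Replace Go template expressions like {{.cluster}} with a placeholder
# before YAML parsing. Non-greedy so that multiple expressions on one
# line are substituted separately.
GO_TEMPLATE_RE = re.compile(r"\{\{.*?\}\}")


def strip_go_templates(text: str) -> str:
    return GO_TEMPLATE_RE.sub("GOTMPL", text)
```

For example, the path from the Application manifest above would become `guestbook/GOTMPL`, which a YAML parser accepts without complaint.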

I would say that multiple files via Kustomize is still doable, but already more difficult to implement, as you need to do the bookkeeping on which files need to be parsed and when. Ideally it doesn't slow down the search script too much.

But actually parsing Go/Helm templates is way too complicated, especially as a lot of variables are not present or knowable.

samip5 commented 3 months ago

This is common to Argo projects, but if it contains kind Application or ApplicationSet, it is certainly referring to Argo CD definitions.

That may be true, but since we are interested in Helm values, it becomes less relevant. The parts that matter are the Application kind and that it includes Helm, e.g. https://github.com/samip5/k8s-dev-cluster/blob/90458ea939e174146b3d5f41483d6d84a9b1e7e4/kubernetes/infra/controllers/tor/application.yaml#L15, at least for the moment. An ApplicationSet wouldn't help if the same file doesn't have anything else of interest, which is how this currently works.