pando85 opened 1 year ago
Yeah, I would also like to see Argo repos.
For people interested in contributing: if you write Python scanners `argo_helm_application` and `argo_helm_values` and put them into the `scanners/` directory, then I can take on the frontend mapping.
I want to work on this, but I'm not entirely sure there is an easy way to parse the bjw-s and Argo versions, because of how many ways you can set up deployments with Argo.
I think it is okay to assume a certain file structure and filter out anything that doesn't match.
Hey @whazor! Could you please explain a bit how this is structured? I'm a little confused about how the scanners would be used, so if I were to add support, I need to understand how to test it.
A scanner reads all the YAML files, parses them, and in the end populates SQL tables. This results in two SQLite databases (`repos.db` and `repos-extended.db`).
Inside the frontend I use the SQLite database to create kubesearch. So if you have something like `flux_helm_release.py`, or even multiple scanners, then I can add the Argo support in the frontend.
Sometimes you need data from two separate files to show the page correctly. For example, with Flux we want to know more about the Helm repo related to the release.
The scanners scan independently and in the end the results are combined via SQL joins in the frontend. So if you create `argo_helm_application` and `argo_helm_values`, then as long as there are columns to join on for a SQL query, we are good.
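To make the join idea concrete, here is a minimal sketch using Python's built-in `sqlite3`. All table and column names here are assumptions for illustration, not the project's actual schema:

```python
# Hypothetical illustration of joining two scanner tables in the frontend.
# The schema (repo_name/release_name/chart/yaml_values) is an assumption.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE argo_helm_application (repo_name TEXT, release_name TEXT, chart TEXT);
    CREATE TABLE argo_helm_values (repo_name TEXT, release_name TEXT, yaml_values TEXT);
    INSERT INTO argo_helm_application VALUES ('user/repo', 'guestbook', 'guestbook-chart');
    INSERT INTO argo_helm_values VALUES ('user/repo', 'guestbook', 'replicaCount: 2');
    """
)
rows = conn.execute(
    """
    SELECT a.release_name, a.chart, v.yaml_values
    FROM argo_helm_application a
    JOIN argo_helm_values v
      ON a.repo_name = v.repo_name AND a.release_name = v.release_name
    """
).fetchall()
# rows == [('guestbook', 'guestbook-chart', 'replicaCount: 2')]
```

As long as both scanners emit a shared key (here `repo_name` plus `release_name`), the frontend can stitch the pieces together.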
Scanners go through multiple stages, as otherwise the entire process takes too long.

- `create_table` -> here you get to create the tables in both `repos.db` and `repos-extended.db`.

Per file:

- `pre_check` -> the file is not yet parsed as YAML. Here we perform a string check to verify whether the file is a Flux Helm release file. We do this by checking `apiVersion` and `kind`. Return true if the file matches. If you want, you could also add the file name as a parameter here. The idea is to check as fast as possible whether the file is interesting or not.
- `check` -> the file is parsed as YAML; here you can use the parsed YAML to check whether the file is interesting. The `walk` method is a convenient utility to query through a YAML file.
- `parse` -> here we use `walk` again to get data from the file, and then we create an object with all the properties. I use `pydantic` for creating the object; with pydantic we can add some checks to ignore broken files. There is also some metadata provided in the rest: `InfoModel`. The result is an object that can be used in the next step.
- `insert` -> here we insert the data into the tables.

Finally:

- `test` -> a simple SQL query to verify there are no data parsing issues.

One thing that Argo might need is the ability to parse other files based on previous files. Currently we don't do this, as it is more complicated and was not yet needed. Or having more information about the file name; this should be easier to add to the relevant methods.
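The stages above could be sketched roughly as follows. This is a hypothetical outline, not the repo's actual base class: the class name, method signatures, table schema, and the `apiVersion`/`kind` strings checked are all assumptions for illustration.

```python
# Hypothetical sketch of an argo_helm_application scanner following the
# stages described above; the real base class and signatures may differ.
import sqlite3

class ArgoHelmApplicationScanner:
    def create_table(self, conn: sqlite3.Connection) -> None:
        # Stage 1: create the destination table (schema is an assumption).
        conn.execute(
            """CREATE TABLE IF NOT EXISTS argo_helm_application (
                   repo_name TEXT, name TEXT, chart TEXT, helm_repo_url TEXT
               )"""
        )

    def pre_check(self, raw: str) -> bool:
        # Fast string check before any YAML parsing is done.
        return "argoproj.io" in raw and "kind: Application" in raw

    def check(self, doc: dict) -> bool:
        # The document is now parsed YAML; keep only Helm-based Applications.
        return doc.get("kind") == "Application" and "helm" in doc.get(
            "spec", {}
        ).get("source", {})

    def parse(self, doc: dict, repo_name: str) -> dict:
        # Pull out the properties we want to index.
        source = doc["spec"]["source"]
        return {
            "repo_name": repo_name,
            "name": doc["metadata"]["name"],
            "chart": source.get("chart"),
            "helm_repo_url": source.get("repoURL"),
        }

    def insert(self, conn: sqlite3.Connection, row: dict) -> None:
        conn.execute(
            "INSERT INTO argo_helm_application VALUES (?, ?, ?, ?)",
            (row["repo_name"], row["name"], row["chart"], row["helm_repo_url"]),
        )

    def test(self, conn: sqlite3.Connection) -> bool:
        # Final sanity check: at least one row made it into the table.
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM argo_helm_application"
        ).fetchone()
        return count > 0
```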
There are at least two ways Argo apps with Helm can be done. One of them is similar to Flux in a way; the other not so much, as it's basically just Kustomize with Helm support.
How well does it currently handle multiple YAML documents in one file?
The Kustomize route would indeed be difficult currently, as it needs the ability to parse files based upon other files.
Each YAML doc in a file gets processed separately, but the `pre_check` is per file. So make sure `pre_check` returns true for anything interesting in the file.
> parse files based upon other files.

That's also needed for #24
One of the issues is that the YAML parser fails on Go templating constructs. I guess those would need to be filtered out before parsing the YAML file.
I did also say this: https://github.com/whazor/k8s-at-home-search/pull/28#discussion_r1703729931
And no, it was not ready yet, as it was a draft.
Actually, Argo results are getting inserted into the DB. So great job on getting the first Argo results into kubesearch 🎉. In particular, PixelJonas has a lot of parsed Argo applications.
The parsing happens in `search.py`, and every file has a big try/catch around the YAML parsing, so incorrect YAML is ignored. If you run `search.py` with only the Argo scanner, you can see a lot of errors related to Go template strings.
For example:

```yaml
metadata:
  name: '{{.cluster}}-guestbook'
spec:
  project: my-project
  source:
    repoURL: https://github.com/infra-team/cluster-deployments.git
    targetRevision: HEAD
    path: guestbook/{{.cluster}}
  destination:
    server: '{{.url}}'
    namespace: guestbook
```
In YAML, `guestbook/{{.cluster}}` is not a correct string.
I still think we need a way to do multiple files, as there are definitely differences in how Argo handles the ones that are inside the application definition, which is why some people (like me) don't use that approach.
I'm not entirely sure what the best way to do that would be, as the Kustomize layout could also appear in non-Argo repos.
I don't know if this makes sense to say here, or if it is what you are trying to achieve right now: the repository has to have at least one file with the ArgoCD apiVersion.
Then we would find multiple ways of setting up the apps, like Helm, Kustomize, or raw k8s objects.
My repository can be used as an example: https://github.com/pando85/homelab
> I don't know if this makes sense to say here or it is what you are trying to achieve right now, the repository has to have at least one file with the apiVersion of ArgoCD.
It checks all files for `apiVersion`, `kind`, and that the definition includes Helm. Others get ignored. Your repo doesn't seem to have any files which match that, so it will not become part of the search currently, despite there already being some Argo repos.
Raw k8s objects are a pain to parse, so probably not happening. Kustomize and Helm are probably the ones it should index/parse.
> It checks all files for apiversion, kind and that the definition includes helm. Others will get ignored. Your repo doesn't seem to have any which match that so it will not come apart of the search currently despite their being some argo repos already.
Then you are doing something wrong, because my repository, the same as others, uses that kind of structure. It is a common pattern in Argo CD definitions.
This is common to Argo projects, but if it contains kind `Application` or `ApplicationSet`, it is for sure referring to Argo CD definitions.
> Raw k8s objects are pain to parse, so probably not happening. Kustomize and helm are probably the ones it should index/parse.
Personally, I always use Kustomize or Helm. I think that is the most common pattern, and it can be the first thing to add. We could parse raw k8s objects in the future.
Currently the scanners are simple and based on a single YAML document. I think it would be a quick win to remove Go templating code via a regex from a file before doing the YAML parse.
I would say that multiple files via Kustomize is still doable, but already more difficult to implement, as you need to do the bookkeeping on which files need to be parsed and when. Ideally it doesn't slow down the search script too much.
But actually evaluating Go/Helm templates is way too complicated, especially as a lot of the variables are not present or knowable.
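The regex quick win could look something like this minimal sketch. The pattern, the placeholder value, and the function name are all assumptions for illustration, not the project's actual implementation:

```python
import re

# Matches Go template actions like {{ .cluster }} (non-greedy, no nesting;
# a simplification that is good enough to make the text parse as plain YAML).
GO_TEMPLATE = re.compile(r"\{\{.*?\}\}")

def strip_go_templates(raw: str, placeholder: str = "TEMPLATED") -> str:
    """Replace Go template expressions so the text can be fed to a YAML parser."""
    return GO_TEMPLATE.sub(placeholder, raw)

# Example: the problematic line from the Argo guestbook snippet becomes
# ordinary YAML after stripping.
cleaned = strip_go_templates("path: guestbook/{{.cluster}}")
# cleaned == "path: guestbook/TEMPLATED"
```

This deliberately does not evaluate the templates; it only neutralizes them so the rest of the document can still be indexed.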
> This is common to Argo projects but if it contains kind `Application` or `ApplicationSet` for sure it is referring to Argo CD definitions.
That may be true, but as we are interested in Helm values, it is not relevant. The parts that matter are the kind `Application` and that it includes Helm, e.g. https://github.com/samip5/k8s-dev-cluster/blob/90458ea939e174146b3d5f41483d6d84a9b1e7e4/kubernetes/infra/controllers/tor/application.yaml#L15, at least for the moment. `ApplicationSet` wouldn't help if the same file doesn't have anything else of interest, which is how this currently works.
Supporting Argo CD repositories like mine is planned: https://github.com/pando85/homelab
That would be great, as part of the community bases their repos on Argo CD.
Thank you for this great tool. I use it a lot! :rocket: