leveryd-asm / asm

Scanner platform based on Kubernetes and Argo-Workflow
https://leveryd-asm.github.io/asm-document
MIT License

Artifacts are deleted while a task is running #50

Closed leveryd closed 1 year ago

leveryd commented 1 year ago

Background

[image]

Artifacts were deleted while the task was still running, so the assets were never scanned.

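Why were they deleted?

The workflow-controller logs show the time of the first failure.

The minio console shows the time the bucket was created.

[image]

The two times match, which suggests that the data was lost when the minio container restarted.

The minio deployment manifest confirms that no persistence is configured. Restarting the minio container reproduces the data loss, which verifies the conclusion.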
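
One way to add the missing persistence could look like the sketch below: a PersistentVolumeClaim mounted at the directory minio serves from. This is not the project's actual manifest. The resource names, the `asm` namespace for minio, the storage size, the image tag, and the `/data` path are all assumptions for illustration, and credentials are left out.

```yaml
# Sketch only: names, size, and paths are assumptions, not taken from the asm deployment files.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: minio-data            # hypothetical name
  namespace: asm
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi           # assumed size
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: minio                 # hypothetical name
  namespace: asm
spec:
  replicas: 1
  selector:
    matchLabels:
      app: minio
  template:
    metadata:
      labels:
        app: minio
    spec:
      containers:
        - name: minio
          image: minio/minio
          args: ["server", "/data"]
          volumeMounts:
            - name: data
              mountPath: /data   # must match the directory passed to "server"
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: minio-data   # data now survives container restarts
```

With a claim like this, the bucket lives on the volume rather than in the container's writable layer, so a restart should no longer recreate an empty bucket.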

leveryd commented 1 year ago

argo artifactGC strategy

https://sourcegraph.com/github.com/argoproj/argo-workflows/-/blob/examples/artifact-gc-workflow.yaml
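
For reference, an artifactGC strategy is set in the Workflow spec and can be overridden per artifact. The sketch below is modeled loosely on the linked example rather than copied from it; the template name, image, and file path are placeholders.

```yaml
# Sketch only: template name, image, and paths are placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-gc-
spec:
  entrypoint: main
  artifactGC:
    strategy: OnWorkflowDeletion    # workflow-level default for all artifacts
  templates:
    - name: main
      container:
        image: busybox
        command: [sh, -c]
        args: ["echo scan-result > /tmp/result.txt"]
      outputs:
        artifacts:
          - name: result
            path: /tmp/result.txt
            artifactGC:
              strategy: Never       # per-artifact override: keep this one
```

The strategy values used here (`OnWorkflowDeletion`, `Never`) are ones Argo Workflows documents, alongside `OnWorkflowCompletion`.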

minio object lifecycle

https://min.io/docs/minio/linux/administration/object-management/create-lifecycle-management-expiration-rule.html#expire-objects-after-number-of-days

leveryd commented 1 year ago

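After changing the artifactGC strategy, another problem appeared

The error is as follows:

[image]

The workflow cannot be deleted, which leaves every subsequent workflow stuck in the pending state.

The symptom is similar to https://github.com/argoproj/argo-workflows/issues/10192

Temporary workaround: how to delete the workflow?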

argo delete --force

https://argoproj.github.io/argo-workflows/walk-through/artifacts/#what-happens-if-garbage-collection-fails
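
As far as I can tell, the page linked above also describes a declarative counterpart to `argo delete --force`: a `forceFinalizerRemoval` field under `artifactGC`. Whether it is available depends on the Argo Workflows version in use, so treat the fragment below as an assumption to check against your release.

```yaml
# Sketch only: assumes an Argo Workflows version that supports forceFinalizerRemoval.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-gc-force-
spec:
  entrypoint: main
  artifactGC:
    strategy: OnWorkflowDeletion
    forceFinalizerRemoval: true   # remove the finalizer even if artifact GC fails
  templates:
    - name: main
      container:
        image: busybox
        command: [sh, -c]
        args: ["echo scan-result > /tmp/result.txt"]
      outputs:
        artifacts:
          - name: result
            path: /tmp/result.txt
```

The trade-off is that a failed garbage collection no longer blocks deletion, so artifacts may be left behind in the bucket.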

Why does this problem occur?

The ClusterRole bound to the service account (sa) has no permission to patch workflowartifactgctasks/status.

➜  /tmp kubectl get clusterrolebinding -n asm -o wide|grep asm/default
admin-asm                                                          ClusterRole/admin                                                                  123d                                                                                                          asm/default
...
➜  /tmp kubectl describe clusterrole admin
Name:         admin
Labels:       kubernetes.io/bootstrapping=rbac-defaults
Annotations:  rbac.authorization.kubernetes.io/autoupdate: true
PolicyRule:
  Resources                                        Non-Resource URLs  Resource Names  Verbs
  ---------                                        -----------------  --------------  -----
  clusterworkflowtemplates.argoproj.io/finalizers  []                 []              [create delete deletecollection get list patch update watch]
  clusterworkflowtemplates.argoproj.io             []                 []              [create delete deletecollection get list patch update watch]
  cronworkflows.argoproj.io/finalizers             []                 []              [create delete deletecollection get list patch update watch]

Although the role's name contains "admin", it does not grant permission to patch workflowartifactgctasks/status.
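
The missing permission could also be granted on its own instead of widening an existing role. The sketch below follows the permissions the Argo Workflows documentation lists for artifact GC (list/watch on the tasks, patch on their status). The Role and RoleBinding names are hypothetical; the `asm/default` service account is the one from this issue.

```yaml
# Sketch only: Role and RoleBinding names are hypothetical.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: artifactgc
  namespace: asm
rules:
  - apiGroups: ["argoproj.io"]
    resources: ["workflowartifactgctasks"]
    verbs: ["list", "watch"]
  - apiGroups: ["argoproj.io"]
    resources: ["workflowartifactgctasks/status"]
    verbs: ["patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: artifactgc-default
  namespace: asm
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: artifactgc
subjects:
  - kind: ServiceAccount
    name: default          # the asm/default service account
    namespace: asm
```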

Final solution: grant the admin role all permissions

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: admin
rules:
  - apiGroups:
    - '*'
    resources:
    - '*'
    verbs:
    - '*'
  - nonResourceURLs:
    - '*'
    verbs:
    - '*'

Bind the cluster-admin ClusterRole to the asm/default service account
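
Such a binding might look like the following sketch. The binding name is hypothetical; `cluster-admin` is the built-in Kubernetes ClusterRole.

```yaml
# Sketch only: the binding name is hypothetical.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: asm-default-cluster-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: default
    namespace: asm
```

Note that this gives every pod running as `asm/default` full control of the cluster, which is broader than the artifact GC error strictly requires.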