Closed Zhihui-Ellen-Jiang closed 1 month ago
I'm sorry, can you help me understand this a little more? Are you saying that it was all working fine, and then you went away from it, came back, and it wasn't working anymore?
Did you look at the logs for gitsync?
If you run 'ls -l' in the gitsync root, it will show you the hash that it has checked out.
Can you also say which version of gitsync you're using? If you can post a full set of logs, preferably with -v 6, it would really help.
Hello, it was syncing fine and then it stopped. I guess I can't say it "stopped"; rather, it is syncing to the wrong destination. I noticed because I was in the middle of making changes to the Airflow DAGs in my GitHub repo. I had synced successfully a couple of times in a single sitting, and then the changes stopped showing up correctly in the Airflow UI. I checked, and the files were synced into the "repo" directory at /opt/airflow/dags/repo, but they are supposed to be synced to /opt/airflow/dags.
jiang@zuij-ltmxfwn Airflow-on-KinD % kubectl logs airflow-scheduler-c4897b579-2hrb4 -n airflow -c git-sync
INFO: detected pid 1, running init handler
{"logger":"","ts":"2024-07-26 02:39:07.121457","caller":{"file":"main.go","line":361},"level":0,"msg":"setting --ref from deprecated --branch"}
{"logger":"","ts":"2024-07-26 02:39:07.121631","caller":{"file":"main.go","line":393},"level":0,"msg":"setting --link from deprecated --dest"}
{"logger":"","ts":"2024-07-26 02:39:07.121703","caller":{"file":"main.go","line":523},"level":0,"msg":"starting up","pid":12,"uid":65533,"gid":65533,"home":"/tmp","flags":["--add-user=true","--branch=main","--change-permissions=0","--cookie-file=false","--credential=[]","--depth=1","--dest=repo","--exechook-backoff=3s","--exechook-timeout=30s","--git=git","--git-gc=always","--group-write=false","--help=false","--http-metrics=false","--http-pprof=false","--link=repo","--man=false","--max-failures=0","--max-sync-failures=0","--one-time=false","--period=10s","--ref=main","--repo=https://github.com/xxx/airflow-dags.git","--rev=HEAD","--root=/git","--ssh=false","--ssh-key-file=[/etc/git-secret/ssh]","--ssh-known-hosts=false","--ssh-known-hosts-file=/etc/git-secret/known_hosts","--stale-worktree-timeout=0s","--submodules=recursive","--sync-timeout=2m0s","--timeout=0","--v=-1","--verbose=0","--version=false","--wait=0","--webhook-backoff=3s","--webhook-method=POST","--webhook-success-status=200","--webhook-timeout=1s"]}
{"logger":"","ts":"2024-07-26 02:39:07.358613","caller":{"file":"main.go","line":1639},"level":0,"msg":"update required","ref":"main","local":"5d24c4f15062910380fce812c0aeb4567e71e519","remote":"5d24c4f15062910380fce812c0aeb4567e71e519","syncCount":0}
{"logger":"","ts":"2024-07-26 02:39:07.684549","caller":{"file":"main.go","line":1690},"level":0,"msg":"updated successfully","ref":"main","remote":"5d24c4f15062910380fce812c0aeb4567e71e519","syncCount":1}
`ihui.jiang@zhihuij-ltmxfwn Airflow-on-KinD % kubectl describe pod airflow-scheduler-c4897b579-2hrb4 -n airflow | grep -A 5 "git-sync"
git-sync-init:
Container ID: containerd://2f4a8e4bac3fe7371c76dcc2ff7f99b63444a5f31d97f1d36298ea36b385fae7
Image: registry.k8s.io/git-sync/git-sync:v4.1.0
Image ID: registry.k8s.io/git-sync/git-sync@sha256:fd9722fd02e3a559fd6bb4427417c53892068f588fc8372aa553fbf2f05f9902
Port: <none>
Host Port: <none>
State: Terminated
Reason: Completed
Exit Code: 0
--
/etc/git-secret/ssh from git-sync-ssh-key (ro,path="gitSshKey")
/git from dags (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-ngkzw (ro)
Containers:
scheduler:
Container ID: containerd://1ae2434d875776a45c0ef04defa16f33c8cc7d5313da8193b0b6c398cd00014e
--
git-sync:
Container ID: containerd://52db865fe762253a99ec89ad28f8937a5433dcde5a690115d6b325c5fee2bb09
Image: registry.k8s.io/git-sync/git-sync:v4.1.0
Image ID: registry.k8s.io/git-sync/git-sync@sha256:fd9722fd02e3a559fd6bb4427417c53892068f588fc8372aa553fbf2f05f9902
Port: <none>
Host Port: <none>
State: Running
Started: Thu, 25 Jul 2024 19:39:07 -0700
Ready: True
--
/etc/git-secret/ssh from git-sync-ssh-key (ro,path="gitSshKey")
/git from dags (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-ngkzw (ro)
scheduler-log-groomer:
Container ID: containerd://6d91b3ac1e0ab559da361d5bfda506f5b5ee8bc992d0761d9ea5cee8452aad44
Image: apache/airflow:2.9.2
--
git-sync-ssh-key:
Type: Secret (a volume populated by a Secret)
SecretName: airflow-ssh-git-secret
Optional: false
logs:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
--
Normal Pulled 8m43s kubelet Container image "registry.k8s.io/git-sync/git-sync:v4.1.0" already present on machine
Normal Created 8m43s kubelet Created container git-sync-init
Normal Started 8m43s kubelet Started container git-sync-init
Normal Pulled 8m42s kubelet Container image "apache/airflow:2.9.2" already present on machine
Normal Created 8m42s kubelet Created container scheduler
Normal Started 8m41s kubelet Started container scheduler
Normal Pulled 8m41s kubelet Container image "registry.k8s.io/git-sync/git-sync:v4.1.0" already present on machine
Normal Created 8m41s kubelet Created container git-sync
Normal Started 8m41s kubelet Started container git-sync
Normal Pulled 8m41s kubelet Container image "apache/airflow:2.9.2" already present on machine
Normal Created 8m41s kubelet Created container scheduler-log-groomer
Normal Started 8m41s kubelet Started container scheduler-log-groomer
Warning Unhealthy 8m32s kubelet Startup probe failed: /home/airflow/.local/lib/python3.12/site-packages/airflow/metrics/statsd_logger.py:184 RemovedInAirflow3Warning: The basic metric validator will be deprecated in the future in favor of pattern-matching. You can try this now by setting config option metrics_use_pattern_match to True.
No alive jobs found.
`
That log shows these as the important flags:
git-sync "--depth=1" "--link=repo" "--ref=main" "--root=/git"
That is going to sync the repo into /git, and publish the worktree at /git/repo (which will be a symlink to a local directory named after the git SHA).
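To make the layout described above concrete, here is a minimal Python sketch that builds a similar directory structure in a temporary folder. It is only a simulation, not git-sync's own code: the `.worktrees` directory name is taken from the paths that appear later in this thread, and the function name, the placeholder file, and the use of a relative symlink are assumptions made for illustration.

```python
import os
import tempfile


def build_fake_sync_root(root: str, sha: str, link_name: str = "repo") -> str:
    """Create <root>/.worktrees/<sha> plus a symlink <root>/<link_name> to it.

    Hypothetical helper: it only mimics the layout described in the thread.
    """
    worktree = os.path.join(root, ".worktrees", sha)
    os.makedirs(worktree)
    # Placeholder file standing in for a synced DAG.
    with open(os.path.join(worktree, "fibo.py"), "w") as f:
        f.write("# placeholder DAG file\n")
    link = os.path.join(root, link_name)
    # Assumption: a relative link, so it still resolves if the volume is
    # mounted at a different path in another container.
    os.symlink(os.path.join(".worktrees", sha), link)
    return link


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        link = build_fake_sync_root(tmp, "5d24c4f15062910380fce812c0aeb4567e71e519")
        print("link target:", os.readlink(link))
        print("files seen through the link:", os.listdir(link))
```

With `--root=/git` and `--link=repo`, whatever path the `/git` volume is mounted at in the scheduler container would then contain both the `repo` link and the directory it points to.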
I don't know what /opt/airflow/dags is or where it comes from. You should run ls -l
on all of those intermediate directories - I think there's symlink shenanigans going on.
Also 4.1.0 is pretty old - a lot of bugs have been fixed since then. Current is 4.2.4
Thank you for the suggestions; I have followed the steps. The DAGs are still not showing up in my Airflow UI. I have upgraded to 4.2.4, and now Airflow fails with a runtime error when walking the DAG directory:
RuntimeError: Detected recursive loop when walking DAG directory /opt/airflow/dags: /opt/airflow/dags/repo/.worktrees/5d24c4f15062910380fce812c0aeb4567e71e519 has appeared more than once.
Logs are here:
`[2024-07-26T14:51:43.459+0000] {manager.py:272} WARNING - DagFileProcessorManager (PID=128) exited with exit code 1 - re-launching
[2024-07-26T14:51:43.462+0000] {manager.py:170} INFO - Launched DagFileProcessorManager with pid: 129
[2024-07-26T14:51:43.468+0000] {settings.py:60} INFO - Configured default timezone UTC
[2024-07-26T14:51:43.498+0000] {settings.py:518} INFO - Loaded airflow_local_settings from /opt/airflow/config/airflow_local_settings.py .
Process ForkProcess-73:
Traceback (most recent call last):
File "/usr/local/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/local/lib/python3.12/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/airflow/.local/lib/python3.12/site-packages/airflow/dag_processing/manager.py", line 241, in _run_processor_manager
processor_manager.start()
File "/home/airflow/.local/lib/python3.12/site-packages/airflow/dag_processing/manager.py", line 476, in start
return self._run_parsing_loop()
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/airflow/dag_processing/manager.py", line 549, in _run_parsing_loop
self._refresh_dag_dir()
File "/home/airflow/.local/lib/python3.12/site-packages/airflow/dag_processing/manager.py", line 738, in _refresh_dag_dir
self._file_paths = list_py_file_paths(self._dag_directory)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/airflow/utils/file.py", line 298, in list_py_file_paths
file_paths.extend(find_dag_file_paths(directory, safe_mode))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/airflow/utils/file.py", line 311, in find_dag_file_paths
for file_path in find_path_from_directory(directory, ".airflowignore"):
File "/home/airflow/.local/lib/python3.12/site-packages/airflow/utils/file.py", line 241, in _find_path_from_directory
raise RuntimeError(
RuntimeError: Detected recursive loop when walking DAG directory /opt/airflow/dags: /opt/airflow/dags/repo/.worktrees/5d24c4f15062910380fce812c0aeb4567e71e519 has appeared more than once.
`
And my DAG files are in /opt/airflow/dags/repo/.worktrees/5d24c4f15062910380fce812c0aeb4567e71e519
as shown below:
`hui.jiang@zhihuij-ltmxfwn Airflow-on-KinD % kubectl exec -it airflow-scheduler-654996d476-w9b6q -n airflow -- ls -l /opt/airflow/dags/repo/.worktrees/5d24c4f15062910380fce812c0aeb4567e71e519
Defaulted container "scheduler" out of: scheduler, git-sync, scheduler-log-groomer, wait-for-airflow-migrations (init), git-sync-init (init)
total 8
-rw-r--r-- 1 65533 root 973 Jul 26 14:50 100.py
-rw-r--r-- 1 65533 root 1566 Jul 26 14:50 fibo.py`
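The "recursive loop" error above is what one would expect when a directory walk follows symlinks and reaches the same real directory by two routes: once through the worktree directory and once through the link that points to it. The sketch below reproduces that situation in a temporary folder. It is not Airflow's implementation; the function names and the tiny layout are hypothetical, and only the general idea (remember each resolved path, fail on a repeat) is assumed to match.

```python
import os
import tempfile


def make_layout(root: str, sha: str) -> str:
    """Build <root>/.worktrees/<sha>/fibo.py and a link <root>/repo to it."""
    worktree = os.path.join(root, ".worktrees", sha)
    os.makedirs(worktree)
    with open(os.path.join(worktree, "fibo.py"), "w") as f:
        f.write("# placeholder DAG file\n")
    link = os.path.join(root, "repo")
    os.symlink(os.path.join(".worktrees", sha), link)
    return link


def find_files_detecting_loops(base_dir: str):
    """Yield file paths under base_dir, following symlinks.

    Raises RuntimeError if the same resolved directory is visited twice.
    """
    seen = set()
    for dirpath, _dirnames, filenames in os.walk(base_dir, followlinks=True):
        real = os.path.realpath(dirpath)
        if real in seen:
            raise RuntimeError(f"{real} has appeared more than once.")
        seen.add(real)
        for name in filenames:
            yield os.path.join(dirpath, name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        link = make_layout(tmp, "5d24c4f15062910380fce812c0aeb4567e71e519")
        try:
            list(find_files_detecting_loops(tmp))  # walks worktree AND link
        except RuntimeError as exc:
            print("walking the sync root fails:", exc)
        # Walking only the published link visits the worktree exactly once.
        print("walking the link works:", list(find_files_detecting_loops(link)))
```

Under these assumptions, pointing the walker at the whole sync root trips the check, while pointing it at the published link alone does not.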
Can you run ls -ld
on each of /opt, /opt/airflow, /opt/airflow/dags and /opt/airflow/dags/repo? Maybe also on /git.
Also, if you can, run git-sync with -v 6
- the short logs you posted don't line up with what you are saying.
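If running `ls -ld` by hand on every directory is awkward, the same check can be sketched in Python. The helper below is hypothetical (its name and output format are made up for illustration); it walks from the filesystem root down to the given path and reports, for each component, whether it is a symlink, a directory, a file, or missing, without following links.

```python
import os
import stat


def describe_path_chain(path: str) -> list[str]:
    """Return one line per ancestor of `path`, root first, flagging symlinks."""
    current = os.path.abspath(path)  # normalizes, but does not resolve links
    chain = []
    while True:
        chain.append(current)
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent

    lines = []
    for p in reversed(chain):
        try:
            st = os.lstat(p)  # lstat inspects the link itself, like ls -ld
        except FileNotFoundError:
            lines.append(f"{p} (missing)")
            continue
        except OSError:
            lines.append(f"{p} (inaccessible)")
            continue
        if stat.S_ISLNK(st.st_mode):
            lines.append(f"{p} -> {os.readlink(p)} (symlink)")
        elif stat.S_ISDIR(st.st_mode):
            lines.append(f"{p} (directory)")
        else:
            lines.append(f"{p} (file)")
    return lines


if __name__ == "__main__":
    # Paths taken from the thread; they only exist inside the pod.
    for target in ("/opt/airflow/dags/repo", "/git"):
        print("\n".join(describe_path_chain(target)))
```

Any line marked as a symlink shows where the path diverges from the directory it appears to be.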
Hello, thank you for helping me. I have solved this issue. I changed the setting to dags_folder = /opt/airflow/dags/repo
and it mounted correctly.
These are my Airflow DAGs synced with git-sync; they are in
**/opt/airflow/dags/repo**
. However, I have not made any changes to my destination. It was working fine this morning, but in the afternoon the files started landing in the repo subdirectory. It is really weird. I have also tried changing my
mountPath: /opt/airflow/dags/repo
, but it got worse. Now my Airflow DAGs are in /opt/airflow/dags/repo/repo
. Somehow it is creating a nested repo directory for my Airflow DAG files, and I am unable to sync my files to Airflow. The following is the git-sync section of my values.yaml file for Airflow. I have not changed anything in my config; it is now set back to what was working before syncing stopped: