Both https://github.com/OWNER/REPO.git and https://github.com/OWNER/REPO are valid Git URLs. When the 'includes' variable is specified for the daily sync and its origin differs from the origin stored in OpenSearch (for example, the OpenSearch document's origin ends with '.git' while the 'includes' entry does not), there will be 2 different (owner, repo, origin) tuples for the same repository.
If we then run a full daily repo sync, the 2 tuples are treated as 2 separate code bases and synced separately, which introduces redundant data into OpenSearch and then into ClickHouse.
The solution is to strip the '.git' suffix from the origin before initializing or syncing data.
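As a rough illustration of that normalization, here is a minimal sketch. The names `normalize_origin` and `unique_repos` are hypothetical and are not taken from the project's code; the sketch only assumes that each repository is represented as an (owner, repo, origin) tuple, as described above.

```python
GIT_SUFFIX = ".git"


def normalize_origin(origin):
    """Return the origin URL without a trailing '.git' suffix (hypothetical helper)."""
    if origin.endswith(GIT_SUFFIX):
        return origin[: -len(GIT_SUFFIX)]
    return origin


def unique_repos(repos):
    """Normalize each (owner, repo, origin) tuple and drop duplicates, keeping order."""
    seen = set()
    result = []
    for owner, repo, origin in repos:
        key = (owner, repo, normalize_origin(origin))
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


if __name__ == "__main__":
    repos = [
        ("OWNER", "REPO", "https://github.com/OWNER/REPO.git"),  # e.g. origin stored in OpenSearch
        ("OWNER", "REPO", "https://github.com/OWNER/REPO"),      # e.g. origin from 'includes'
    ]
    print(unique_repos(repos))
```

Applying the same normalization both to the 'includes' entries and to the origins read from OpenSearch would make the two tuples compare equal, so the repository is synced only once.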