Open KOLANICH opened 4 years ago
At which point exactly are you losing the git history? Normally you never do.
@KaKi87, when other projects copy dirs from other git repos. If you check out some repo and then copy a dir or a file from it that is not a submodule, you lose history.
"merging unrelated histories" is just git-filter-branch(1). The first point is also probably git-filter-branch(1).
@nabijaczleweli, how can just git-filter-branch reconstruct the history?
Project A: a -> some changes in the targeted dir -> unrelated changes in other dirs -> b -> targeted dir moved -> c -> ... -> d
Project B: copies a dir from commit a and does some changes (let's call them x) -> unrelated changes in other dirs -> some changes in the targeted dir (let's call them y) -> copies code from c and solves conflicts -> z
And for the second project the commits of the original project from which the code was copied are unknown.
The result we need:
a ---> b ---> c ---> d
|             |
v             v
x ---> y ---> merge ---> z
So the software should itself identify the commits and reconstruct the history; a human should only have to state the task in general terms and maybe do some preprocessing. As you see, the software should also track moves of files.
"when other projects copy dirs from other git repos"
The algorithm you mentioned is very time-consuming and labour-intensive without the right tools. git filter-branch --tree-filter is pretty universal but damn slow. BFG Repo-Cleaner is faster, since it works directly with the index, but it is very limited and is by now dead, in the sense that it has not been updated since 2018. Sequences of patches are the fastest and most versatile approach (versatile in the sense that they are compatible with multiple VCSs, though they don't address merges); very surprisingly, creating a sequence of patch files, editing them and applying them is much faster than filter-branch. Creating the right tools is also in scope of this idea. So almost no one uses them. Instead, people just copy the files at some point. And that is only half of the issue. Even if they have created patches and applied them (I had to do that a lot when reconstructing), git has no means to match patches in unrelated histories to each other and to reconstruct histories with all the merges (which are needed for proper 3-way merges in the end) with fully automatic conflict resolution based on the content of the side-history.
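To illustrate what matching patches across unrelated histories could look like, here is a minimal Python sketch. It is not an existing tool: `patch_fingerprint` and `match_patches` are hypothetical names, and the approach (hashing only the added and removed lines of a unified diff, ignoring paths, hunk positions and whitespace) is an assumption about one way to do it. It takes diff text as input, for example as produced by git format-patch.

```python
import hashlib


def patch_fingerprint(diff_text: str) -> str:
    """Hash only the added/removed lines of a unified diff.

    File headers and hunk positions are skipped, so the same change applied
    under a different directory or at a different offset hashes identically.
    Caveat of this sketch: a removed line whose content starts with "-- "
    looks like a "--- " header and is skipped too.
    """
    digest = hashlib.sha1()
    for line in diff_text.splitlines():
        # Headers carry paths and line numbers, which differ between histories.
        if line.startswith(("diff ", "index ", "--- ", "+++ ", "@@")):
            continue
        if line.startswith(("+", "-")):
            # Keep the +/- sign, drop all whitespace from the content.
            normalized = line[0] + "".join(line[1:].split())
            digest.update(normalized.encode("utf-8") + b"\n")
    return digest.hexdigest()


def match_patches(upstream: dict, side: dict) -> dict:
    """Map each side-history commit to an upstream commit with the same fingerprint.

    Both arguments map a commit id to the text of its diff.
    Side-history commits without a counterpart are left out of the result.
    """
    by_fingerprint = {}
    for commit, diff in upstream.items():
        # Keep the first upstream commit seen for each fingerprint.
        by_fingerprint.setdefault(patch_fingerprint(diff), commit)
    matches = {}
    for commit, diff in side.items():
        fingerprint = patch_fingerprint(diff)
        if fingerprint in by_fingerprint:
            matches[commit] = by_fingerprint[fingerprint]
    return matches
```

This only finds commits whose changes were copied verbatim; commits where conflicts were resolved by hand would need a fuzzier comparison.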
Project description
Sometimes one needs to extract a part of another project, which unfortunately is a monorepo, into its own repo.
Often there are multiple forks of the same project, made by just copying the code and then applying their own patches.
Often they don't preserve the info about exactly which version they took.
So we need the following set of tools:
Identifying the commit most similar to a snapshot given in the form of a dir. We pass a repo and a dir with the snapshot, and the tool gives a list of the n most similar commits.
Merging unrelated histories. One history is upstream; the other (let's call it a side-history) is a sequence of commits where the code is periodically copied from upstream and then patches are applied. The tool should identify the original split point, make a branch on it, add the content of the first commit there, and commit. Then for each commit on the side-history it must identify whether it was present in upstream. If it is present, it merges from upstream at the relevant commit, but the content and metadata must come from the side-history. So finally we get a branch for the side-history within the upstream repo with all the metadata recovered.
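The first tool in the list could be sketched roughly as below, in Python. This is only an illustration under stated assumptions: all function names are hypothetical, the repo is assumed to use SHA-1 object ids, and similarity is taken to be the fraction of snapshot files whose exact content (git blob hash) appears anywhere in a commit's tree. Files modified after copying do not count, so a real tool would need content-level comparison on top of this.

```python
import hashlib
import subprocess
from pathlib import Path


def git_blob_sha1(data: bytes) -> str:
    """Compute the id git assigns to a blob with this content (SHA-1 repos)."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def snapshot_hashes(snapshot_dir: str) -> set:
    """Blob ids of every regular file under the snapshot dir."""
    root = Path(snapshot_dir)
    return {git_blob_sha1(p.read_bytes()) for p in root.rglob("*") if p.is_file()}


def commit_hashes(repo: str, commit: str) -> set:
    """Blob ids of every file in the tree of `commit`, read via `git ls-tree -r`."""
    out = subprocess.run(
        ["git", "-C", repo, "ls-tree", "-r", commit],
        check=True, capture_output=True, text=True,
    ).stdout
    hashes = set()
    for line in out.splitlines():
        # Each line is "<mode> <type> <object>\t<path>".
        meta, _, _path = line.partition("\t")
        _mode, obj_type, sha = meta.split()
        if obj_type == "blob":
            hashes.add(sha)
    return hashes


def rank_commits(snapshot: set, commits: dict, n: int = 5) -> list:
    """Return the n commits containing the largest share of snapshot blobs."""
    scored = []
    for commit, blobs in commits.items():
        score = len(snapshot & blobs) / len(snapshot) if snapshot else 0.0
        scored.append((commit, score))
    # Highest score first; ties are broken by commit id for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:n]


def most_similar_commits(repo: str, snapshot_dir: str, n: int = 5) -> list:
    """Scan every commit reachable from any ref; slow on large repos."""
    revs = subprocess.run(
        ["git", "-C", repo, "rev-list", "--all"],
        check=True, capture_output=True, text=True,
    ).stdout.split()
    snapshot = snapshot_hashes(snapshot_dir)
    return rank_commits(snapshot, {rev: commit_hashes(repo, rev) for rev in revs}, n)
```

Usage would be something like `most_similar_commits("path/to/upstream", "path/to/copied_dir", n=3)`, where both paths are placeholders. Because only blob ids are compared, moved or renamed files still match, which fits the requirement of tracking file moves.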
Relevant Technology
Complexity and required time
Complexity
Required time (ETA)
Categories