Closed JulienPalard closed 1 year ago
Amusing fact: as the stackoverflow.com answer does, using msgcat --no-wrap
before using diff3 helps a lot as there's no longer the msgstr ""
fence line between the two edits.
I wrote a small script to try it, but in case of conflict it's no longer possible to wrap the file until the conflicts are resolved as due to the conflict markers it's no longer a valid gettext file. Feel like 90% of the job is done and 90% is remaining.
I've never tried this: https://github.com/beck/git-po-merge
It's seems to be the kind of tool you need, or inspiration maybe.
I've never tried this: https://github.com/beck/git-po-merge
It's seems to be the kind of tool you need, or inspiration maybe.
No, they use a dumb msgcat local.po base.po remote.po
, it's not a 3-way merge, won't detect any conflict properly, just overwriting things. For our examples it yields:
#: library/random.rst:260
msgid ""
"In the future, the *population* must be a sequence. Instances of :class:"
"`set` are no longer supported. The set must first be converted to a :class:"
"`list` or :class:`tuple`, preferably in a deterministic order so that the "
"sample is reproducible."
msgstr ""
"À l’avenir, la *population* devra être une séquence. Les instances de :"
"class:`set` ne sont plus prises en charge. Le *set* doit d’abord être "
"converti en une :class:`list` ou :class:`tuple`, de préférence dans un ordre "
"déterministe de telle sorte que l’échantillon soit reproductible."
Which is the wrong msgid
!
I tried an implementation:
#!/usr/bin/env python3
import argparse
from pathlib import Path
from tempfile import TemporaryDirectory
from subprocess import run, PIPE
import sys
def parse_args():
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument("base", type=Path)
parser.add_argument("local", type=Path)
parser.add_argument("remote", type=Path)
parser.add_argument("merged", type=Path)
return parser.parse_args()
def unwrap(text):
lines = text.split("\n")
output = []
for line in lines:
if not line:
output.append("")
continue
if line[0] == line[-1] == '"':
output[-1] += "\x1e" + line
else:
output.append(line)
return "\n".join(output)
def rewrap(text):
return text.replace("\x1e", "\n")
def main():
args = parse_args()
with TemporaryDirectory() as tmpdir_name:
tmpdir = Path(tmpdir_name)
(tmpdir / "local.po").write_text(unwrap(args.local.read_text()))
(tmpdir / "base.po").write_text(unwrap(args.base.read_text()))
(tmpdir / "remote.po").write_text(unwrap(args.remote.read_text()))
(tmpdir / "merged.po").write_text(
run(
[
"git",
"merge-file",
"-p",
str(tmpdir / "local.po"),
str(tmpdir / "base.po"),
str(tmpdir / "remote.po"),
],
encoding="UTF-8",
stdout=PIPE,
).stdout
)
args.merged.write_text(rewrap((tmpdir / "merged.po").read_text()))
if __name__ == "__main__":
main()
It's more a trick that anything, it works by unwrapping msgid and msgstr strings (àla msgcat --no-wrap
) then rewrapping them without using msgcat
(as msgcat
would not be able to unwrap the file containing the conflict markers).
For my tricky example it yields exactly what I want:
<<<<<<< /tmp/local.po
#: library/random.rst:259
msgid ""
"The *population* must be a sequence. Automatic conversion of sets to lists "
"is no longer supported."
msgstr ""
||||||| /tmp/base.po
#: library/random.rst:260
msgid ""
"In the future, the *population* must be a sequence. Instances of :class:"
"`set` are no longer supported. The set must first be converted to a :class:"
"`list` or :class:`tuple`, preferably in a deterministic order so that the "
"sample is reproducible."
msgstr ""
=======
#: library/random.rst:260
msgid ""
"In the future, the *population* must be a sequence. Instances of :class:"
"`set` are no longer supported. The set must first be converted to a :class:"
"`list` or :class:`tuple`, preferably in a deterministic order so that the "
"sample is reproducible."
msgstr ""
"À l’avenir, la *population* devra être une séquence. Les instances de :"
"class:`set` ne sont plus prises en charge. Le *set* doit d’abord être "
"converti en une :class:`list` ou :class:`tuple`, de préférence dans un ordre "
"déterministe de telle sorte que l’échantillon soit reproductible."
>>>>>>> /tmp/remote.po
It clearly shows that one branch changed the msgid
while the other changed the msgstr
. I'll try some cherry-pick from 3.10 to 3.11 using this driver to see how it goes.
Playing with my script, it yields very readable conflicts, like:
where in one branch someone removed a fuzzy by fixing the msgstr, and in the other the msgid changed.
I'm happy.
Do you think we may have missed such a behavior in the previous merges?
Yes, but not that much:
I'm currently diffing python-docs-fr/3.11 vs a locally generated 3.11 branch using my merge strategy to see if errors were introduced.
For the moment I don't see huge differences, but yesterday it felt more like debugging my merge strategy than searching for merge issues in the 3.11 branch...
Good thing is: my merge strategy properly spotted the initial random.po case, that's one of the differences between the two!
For the curious, I'll release my merge strategy on PyPI soon™, and I'll push a single commit on top of 3.11 to fix all merge issues I found so anyone can inspect if. No hard push, I promise.
OK, if I did it correctly here's the various fixes:
https://github.com/python/python-docs-fr/commit/43c0e9d57c4c53ca0080d07ff1a0454ad8e79b3b
it's on top of 3.11.
Released po3way: https://pypi.org/project/po3way/ / https://github.com/JulienPalard/po3way if someone want to test it / use it. Please don't close the issue until it's properly documented in the forward porting section of CONTRIBUTING-CORE.
I just got an issue:
Given a commit on 3.10 (c1999471) containing:
And a commit on 3.11 (fecaae68) containing:
the two¹ get silently merged to:
Which is not a git issue! git is plain right here: those are two non-overlaping modifications, no need for a conflict. But it's very bad for us as it give a very wrong result.
1) Yes I know git does not merge diffs, it merges the result of them: actual plain files. But showing diffs was usefull to give context.
So in other words, as git sees it, I have a base looking like:
In one branch it gets to:
While in the other it gets to:
Git reolves it as:
I think this may be cleanly resolved using a proper "git merge driver" to implement 3-way merge for po files specifically, but was unable to find one.
I bet we could spawn one based on polib, or difflib (which only does 2-way merges), or both, I don't know, but at the moment I'm just surprised it does not already exists.
Found a few leads but nothing that actually works: