python / python-docs-fr

Mirroir de https://git.afpy.org/AFPy/python-docs-fr
https://docs.python.org/fr/
Other
353 stars 268 forks source link

Need a git merge driver for po files? #1908

Closed JulienPalard closed 1 year ago

JulienPalard commented 2 years ago

I just got an issue:

Given a commit on 3.10 (c1999471) containing:

@@ -507,10 +520,14 @@ msgid ""
 "`list` or :class:`tuple`, preferably in a deterministic order so that the "
 "sample is reproducible."
 msgstr ""
+"À l’avenir, la *population* devra être une séquence.  Les instances de :"
+"class:`set` ne sont plus prises en charge.  Le *set* doit d’abord être "
+"converti en une :class:`list` ou :class:`tuple`, de préférence dans un ordre "
+"déterministe de telle sorte que l’échantillon soit reproductible."

And a commit on 3.11 (fecaae68) containing:

-#: library/random.rst:260
+#: library/random.rst:259
 msgid ""
-"In the future, the *population* must be a sequence.  Instances of :class:"
-"`set` are no longer supported.  The set must first be converted to a :class:"
-"`list` or :class:`tuple`, preferably in a deterministic order so that the "
-"sample is reproducible."
+"The *population* must be a sequence.  Automatic conversion of sets to lists "
+"is longer supported."
 msgstr ""

the two¹ get silently merged to:

#: library/random.rst:259
msgid ""
"The *population* must be a sequence.  Automatic conversion of sets to lists "
"is no longer supported."
msgstr ""
"À l’avenir, la *population* devra être une séquence.  Les instances de :"
"class:`set` ne sont plus prises en charge.  Le *set* doit d’abord être "
"converti en une :class:`list` ou :class:`tuple`, de préférence dans un ordre "
"déterministe de telle sorte que l’échantillon soit reproductible."

Which is not a git issue! git is plain right here: those are two non-overlaping modifications, no need for a conflict. But it's very bad for us as it give a very wrong result.

1) Yes I know git does not merge diffs, it merges the result of them: actual plain files. But showing diffs was usefull to give context.

So in other words, as git sees it, I have a base looking like:

#: library/random.rst:260
msgid ""
"In the future, the *population* must be a sequence.  Instances of :class:"
"`set` are no longer supported.  The set must first be converted to a :class:"
"`list` or :class:`tuple`, preferably in a deterministic order so that the "
"sample is reproducible."
msgstr ""

In one branch it gets to:

#: library/random.rst:259
msgid ""
"The *population* must be a sequence.  Automatic conversion of sets to lists "
"is no longer supported."
msgstr ""

While in the other it gets to:

#: library/random.rst:260
msgid ""
"In the future, the *population* must be a sequence.  Instances of :class:"
"`set` are no longer supported.  The set must first be converted to a :class:"
"`list` or :class:`tuple`, preferably in a deterministic order so that the "
"sample is reproducible."
msgstr ""
"À l’avenir, la *population* devra être une séquence.  Les instances de :"
"class:`set` ne sont plus prises en charge.  Le *set* doit d’abord être "
"converti en une :class:`list` ou :class:`tuple`, de préférence dans un ordre "
"déterministe de telle sorte que l’échantillon soit reproductible."

Git reolves it as:

#: library/random.rst:259
msgid ""
"The *population* must be a sequence.  Automatic conversion of sets to lists "
"is no longer supported."
msgstr ""
"À l’avenir, la *population* devra être une séquence.  Les instances de :"
"class:`set` ne sont plus prises en charge.  Le *set* doit d’abord être "
"converti en une :class:`list` ou :class:`tuple`, de préférence dans un ordre "
"déterministe de telle sorte que l’échantillon soit reproductible."

I think this may be cleanly resolved using a proper "git merge driver" to implement 3-way merge for po files specifically, but was unable to find one.

I bet we could spawn one based on polib, or difflib (which only does 2-way merges), or both, I don't know, but at the moment I'm just surprised it does not already exists.

Found a few leads but nothing that actually works:

JulienPalard commented 2 years ago

Amusing fact: as the stackoverflow.com answer does, using msgcat --no-wrap before using diff3 helps a lot as there's no longer the msgstr "" fence line between the two edits.

I wrote a small script to try it, but in case of conflict it's no longer possible to wrap the file until the conflicts are resolved as due to the conflict markers it's no longer a valid gettext file. Feel like 90% of the job is done and 90% is remaining.

vpoulailleau commented 2 years ago

I've never tried this: https://github.com/beck/git-po-merge

It's seems to be the kind of tool you need, or inspiration maybe.

JulienPalard commented 2 years ago

I've never tried this: https://github.com/beck/git-po-merge

It's seems to be the kind of tool you need, or inspiration maybe.

No, they use a dumb msgcat local.po base.po remote.po, it's not a 3-way merge, won't detect any conflict properly, just overwriting things. For our examples it yields:

#: library/random.rst:260
msgid ""
"In the future, the *population* must be a sequence.  Instances of :class:"
"`set` are no longer supported.  The set must first be converted to a :class:"
"`list` or :class:`tuple`, preferably in a deterministic order so that the "
"sample is reproducible."
msgstr ""
"À l’avenir, la *population* devra être une séquence.  Les instances de :"
"class:`set` ne sont plus prises en charge.  Le *set* doit d’abord être "
"converti en une :class:`list` ou :class:`tuple`, de préférence dans un ordre "
"déterministe de telle sorte que l’échantillon soit reproductible."

Which is the wrong msgid!

I tried an implementation:

#!/usr/bin/env python3

import argparse
from pathlib import Path
from tempfile import TemporaryDirectory
from subprocess import run, PIPE
import sys

def parse_args():
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("base", type=Path)
    parser.add_argument("local", type=Path)
    parser.add_argument("remote", type=Path)
    parser.add_argument("merged", type=Path)
    return parser.parse_args()

def unwrap(text):
    lines = text.split("\n")
    output = []
    for line in lines:
        if not line:
            output.append("")
            continue
        if line[0] == line[-1] == '"':
            output[-1] += "\x1e" + line
        else:
            output.append(line)
    return "\n".join(output)

def rewrap(text):
    return text.replace("\x1e", "\n")

def main():
     args = parse_args()
     with TemporaryDirectory() as tmpdir_name:
         tmpdir = Path(tmpdir_name)

         (tmpdir / "local.po").write_text(unwrap(args.local.read_text()))
         (tmpdir / "base.po").write_text(unwrap(args.base.read_text()))
         (tmpdir / "remote.po").write_text(unwrap(args.remote.read_text()))

         (tmpdir / "merged.po").write_text(
             run(
                 [
                     "git",
                     "merge-file",
                     "-p",
                     str(tmpdir / "local.po"),
                     str(tmpdir / "base.po"),
                     str(tmpdir / "remote.po"),
                 ],
                 encoding="UTF-8",
                 stdout=PIPE,
             ).stdout
         )
         args.merged.write_text(rewrap((tmpdir / "merged.po").read_text()))

if __name__ == "__main__":
    main()

It's more a trick that anything, it works by unwrapping msgid and msgstr strings (àla msgcat --no-wrap) then rewrapping them without using msgcat (as msgcat would not be able to unwrap the file containing the conflict markers).

For my tricky example it yields exactly what I want:


<<<<<<< /tmp/local.po
#: library/random.rst:259
msgid ""
"The *population* must be a sequence.  Automatic conversion of sets to lists "
"is no longer supported."
msgstr ""
||||||| /tmp/base.po
#: library/random.rst:260
msgid ""
"In the future, the *population* must be a sequence.  Instances of :class:"
"`set` are no longer supported.  The set must first be converted to a :class:"
"`list` or :class:`tuple`, preferably in a deterministic order so that the "
"sample is reproducible."
msgstr ""
=======
#: library/random.rst:260
msgid ""
"In the future, the *population* must be a sequence.  Instances of :class:"
"`set` are no longer supported.  The set must first be converted to a :class:"
"`list` or :class:`tuple`, preferably in a deterministic order so that the "
"sample is reproducible."
msgstr ""
"À l’avenir, la *population* devra être une séquence.  Les instances de :"
"class:`set` ne sont plus prises en charge.  Le *set* doit d’abord être "
"converti en une :class:`list` ou :class:`tuple`, de préférence dans un ordre "
"déterministe de telle sorte que l’échantillon soit reproductible."
>>>>>>> /tmp/remote.po

It clearly shows that one branch changed the msgid while the other changed the msgstr. I'll try some cherry-pick from 3.10 to 3.11 using this driver to see how it goes.

JulienPalard commented 2 years ago

Playing with my script, it yields very readable conflicts, like:

Capture d’écran du 2022-09-14 00-35-29

where in one branch someone removed a fuzzy by fixing the msgstr, and in the other the msgid changed.

I'm happy.

christopheNan commented 2 years ago

Do you think we may have missed such a behavior in the previous merges?

JulienPalard commented 2 years ago

Yes, but not that much:

I'm currently diffing python-docs-fr/3.11 vs a locally generated 3.11 branch using my merge strategy to see if errors were introduced.

For the moment I don't see huge differences, but yesterday it felt more like debugging my merge strategy than searching for merge issues in the 3.11 branch...

Good thing is: my merge strategy properly spotted the initial random.po case, that's one of the differences between the two!

JulienPalard commented 2 years ago

For the curious, I'll release my merge strategy on PyPI soon™, and I'll push a single commit on top of 3.11 to fix all merge issues I found so anyone can inspect if. No hard push, I promise.

JulienPalard commented 2 years ago

OK, if I did it correctly here's the various fixes:

https://github.com/python/python-docs-fr/commit/43c0e9d57c4c53ca0080d07ff1a0454ad8e79b3b

it's on top of 3.11.

JulienPalard commented 2 years ago

Released po3way: https://pypi.org/project/po3way/ / https://github.com/JulienPalard/po3way if someone want to test it / use it. Please don't close the issue until it's properly documented in the forward porting section of CONTRIBUTING-CORE.