tinganho / l10ns

Internationalization workflow and formatting
http://l10ns.org
Apache License 2.0
234 stars 24 forks source link

Add padding to JSON to avoid git merge conflicts #133

Open stipsan opened 7 years ago

stipsan commented 7 years ago

Hey,

We've noticed that the more people we have working on our JS SPA app views, all of which are translated, the more painful git merge conflicts we get.

A simple solution to avoid them can be seen here: http://stackoverflow.com/a/33262179

In other words, have a linebreak between each translation in the JS.

Great work on l10ns btw! We reviewed Polyglot.js and many other solutions but we found l10ns to be superior 👍

tinganho commented 7 years ago

Can you give an example of one merge conflict?

stipsan commented 7 years ago

Bob the dev starts working on feature A while Kyle the engineer starts working on feature B. Both of them add translations and the en_US.json file (and others) gets new entries added.

Now Bob submits a PR, all is well and it's merged without trouble. But now Kyle will get a merge conflict when he pulls in master now that Bobs changes to the json files is added. Kyle have to resolve them manually and it's quite the pain.

The repo we're experience this on isn't public, but I can set one up for you if you want to investigate further and see the problem in action.

tinganho commented 7 years ago

I was meaning something specific to our l10ns's json files? What you are describing is just a general merge conflict.

L10ns currently order both entries and properties to make it more git friendly. If you are changing the same line of course a manual resolve is needed. Note, adding line padding will not resolve that.

stipsan commented 7 years ago

Ah I see, no we're not changing the same lines. We simply add new strings to the language files at the same time. In our project we're working on a React app. We use feature branching and require all strings in the views to be translated. Therefore we quite often add new strings to language files at the same time and that's the kind of merge conflict that would be avoided with the extra padding. The padding would prevent git from being unsure if the changes are related or not.

If we edit the same lines there's still going to be a conflict with the padding, that's not the issue we're trying to address.

tinganho commented 7 years ago

I think I need a good isolated reproducible example. With the localization files from L10ns.

stipsan commented 7 years ago

Ok sure, I'll make one and post the link in this issue 👍

stipsan commented 7 years ago

I managed to (eventually) reproduce it here: https://github.com/stipsan/l10ns-merge-conflict/pull/7 But I went ahead and tried the proposed padding solution from stack overflow and it actually made no difference: https://github.com/stipsan/l10ns-merge-conflict/pull/16/files

It's clear that this is something that likely cannot be solved in l10ns itself, it's a problem with the default merge driver in git. This module looks promising: https://www.npmjs.com/package/git-json-merge

I'll report back if it git-json-merge works out. If it does it may be worth adding to the l10ns readme as a protip 😃

tinganho commented 7 years ago

Thanks for doing the research.

cadesalaberry commented 7 years ago

I was on the same path,

Sadly https://github.com/jonatanpedersen/git-json-merge/ makes no difference as it is hard for a diff engine to resolve conflicts in arrays.

Beware, when used with l10ns, the current git-json-merge plugin can lead to data destruction...

We are trying to come up with a solution https://github.com/jonatanpedersen/git-json-merge/pull/3

stipsan commented 7 years ago

@cadesalaberry thanks a lot for taking the time to share that! I hadn't gotten around to test it out yet and your feedback helped me dodge that bullet 👌

tinganho commented 7 years ago

To summarize the problem:

Two user adds a localization entry in en.json:

  {
    "files": [
>>>>>> HEAD
      "Source/Content/Offline/InternalServerError.tsx"
    ],
    "id": "p7Le4KgRFed",
    "key": "INTERNAL_SERVER_ERROR--DESCRIPTION",
    "new": true,
    "timestamp": "2017-05-03T15:23:53.782Z",
    "value": "owijeoiwejfifwe.",
=======
      "Source/Content/Offline/InternalServerError.tsx"
    ],
    "id": "p7Le4KgRFed",
    "key": "INTERNAL_SERVER_ERROR--DESCRIPTION",
    "new": true,
    "timestamp": "2017-05-03T15:23:53.782Z",
    "value": "wefefwfwefwewfe.",
<<<<<<< 2i3hr98f2h
    "variables": []
  },

Because git is just chunking the unique code between the two commits, i.e. leaving out files and variables line. It makes it hard for us if we want to choose both of the chunks. Which happens quite often.

So we want git to be a bit smarter and maybe resolve the conflict automatically or just provide chunks that includes files and variables.

>>>>>> HEAD
  {
    "files": [
      "Source/Content/Offline/InternalServerError.tsx"
    ],
    "id": "p7Le4KgRFed",
    "key": "INTERNAL_SERVER_ERROR--DESCRIPTION",
    "new": true,
    "timestamp": "2017-05-03T15:23:53.782Z",
    "value": "owijeoiwejfifwe.",
    "variables": []
  },
=======
  {
    "files": [
      "Source/Content/Offline/InternalServerError.tsx"
    ],
    "id": "p7Le4KgRFed",
    "key": "INTERNAL_SERVER_ERROR--DESCRIPTION",
    "new": true,
    "timestamp": "2017-05-03T15:23:53.782Z",
    "value": "wefefwfwefwewfe.",
    "variables": []
  },
<<<<<<< 2i3hr98f2h
qborreda commented 7 years ago

Similar to this, the json generator writes new entries on top of each other, and git comes up with this mess when merging:

  {
    "files": [
      "src/localizations/dictionary.js"
    ],
<<<<<<< HEAD
    "id": "zeEXaBEzSX",
    "key": "GENERIC->NO_PERMISSION",
    "new": false,
    "timestamp": "2017-03-16T16:45:48.446Z",
    "value": "You don't have permissions to do this action.",
=======
    "id": "Royz94kns6",
    "key": "GENERIC->FILTER_BY",
    "new": false,
    "timestamp": "2017-05-04T09:38:05.490Z",
    "value": "Filter by",
>>>>>>> develop
    "variables": []
  },

and the process of resolving these conflicts is so painful when two different branches use translations heavily .. multiply for as many languages as you have (5 in our case) ..

tinganho commented 7 years ago

One solution is to create one file per localization string. Unless, we want every user of l10ns to add a proper merge driver for git to resolve these conflicts.

qborreda commented 7 years ago

Well, we have over 240 translation strings .. translated into 5 languages. I'm thinking we might as well work on a db and some interface that outputs the json format, and exclude the json files from the git repo ..

tinganho commented 7 years ago

Anyone running into this problem might mitigate it by applying diff3 style to your git config, to show common ancestor: https://github.com/Microsoft/TypeScript/issues/16129.

batiste commented 7 years ago

Wouldn't changing the format to use a map of keys solve this problem? It seems that internally the code is using a map anyway.

batiste commented 7 years ago

Here is a proposed PR https://github.com/tinganho/l10ns/pull/165

tinganho commented 7 years ago

I think it depends. If everything is one line yes(though, I don't think we want that).

{
    "UNIQUE_KEY_1": {
          "files": "same",
          "value": "unique-1",
          "variables": "same"
    },
    "UNIQUE_KEY_2": {
          "files": "same",
          "value": "unique-2",
          "variables": "same"
    },
    // ...
}

But, I think in the above case Git wants to collapse the "variables" line and the "}," line. Anyone that wants to just add both "UNIQUE_KEY_1" and "UNIQUE_KEY_2" to each store needs to re-add those lines to resolve the conflict.

We could actually have our own file format instead of using JSON and have unique lines in the start and end of each entry:

start "UNIQUE_KEY_1"
          "files": "same",
          "value": "unique-1",
          "variables": "same"
end "UNIQUE_KEY_1"
start "UNIQUE_KEY_2"
          "files": "same",
          "value": "unique-1",
          "variables": "same"
end "UNIQUE_KEY_2"

I think this will resolve the conflicts. But I need a POC first, to be sure.

batiste commented 7 years ago

@tinganho you might be right the map doesn't really solves the problem. I an unsure about a new file format.. That seems like an over engineered solution. I my experience those conflicts happen a lot because all the new stuff is inserted at the top of the file so this a recipe for conflicts. If the insertion order in the array would be random that would reduce the chance for conflicts enormously.

I there any documentation out there on how to split the translations in different files?

batiste commented 7 years ago

I believe a solution as simple as this would probably avoid the majority of conflicts:

https://github.com/batiste/l10ns/commit/5c201e840cba99d26d822fea98565ca7672a498f

If the key is random or pseudo random, then when the number of key increase the risk of conflict would decrease? Sorting by date might be the source of all those conflicts...

cadesalaberry commented 7 years ago

This might work pretty well with a custom git merger now that it is not and array anymore.

https://www.npmjs.com/package/git-json-merge

EDIT: NVM, I did not see I already mentionned it in this thread...

tinganho commented 7 years ago

@batiste I think you are right, sorting by date is the root cause of all the conflicts here. And if we sort by key, it would significantly reduce the conflicts. Though, there is still a slight chance of a conflict when it sorts each entry to be of the same row. Let me just think a bit about the implications. And I might add your commit.

@cadesalaberry I think I commented on custom git mergers before, but I think it would be a to daunting task to let every user of L10ns install it and configure git to use it. And L10ns should also be agnostic to which SCM you use. I haven't really used git-json-merge myself, but how does it deal with multiple type of files. Let say JS files and JSON files?

batiste commented 6 years ago

No news on this? We could really benefit from a change in our team...

tinganho commented 6 years ago

I will have closer look later today or tomorrow, and probably merge it in if I cannot find any issue.

cadesalaberry commented 6 years ago

@tinganho It supports reading js files, but it will always write the file as JSON. I understand your worry, It's true it would only work for JSON.

@SPARTED We ended up running this script after each translation and l10ns update. This is only temporary, but is so much better when it comes to merge conflicts.

./scripts/l10ns-inline.js


var path = require('path');
var fs = require('fs');

var l10nsConfig = require(path.resolve(__dirname, '..', 'l10ns')).projects.core;
var localizationsDir = path.resolve(__dirname, '..', l10nsConfig.store);

Object.keys(l10nsConfig.languages).forEach(processFile);

function processFile(lang) {
  var filepath = path.join(localizationsDir, lang + '.json');
  var json = JSON.parse(fs.readFileSync(filepath));

  try {
    fs.writeFileSync(filepath, processJSON(json));
  } catch (e) {
    console.error('Failed to process localization file ' + filepath);
    console.error(e.stack);
    process.exit(1);
  }
}

function processJSON(json) {
  var lines = json
    .sort(function(a, b) {
      var diff = new Date(b.timestamp) - new Date(a.timestamp);

      return diff || b.key.localeCompare(a.key);
    })
    .map(function(entry, i) {
      return [
        '  ',
        JSON.stringify(entry),
        (i !== json.length - 1 ? ',' : '')
      ].join('');
    });

  return ['[', lines.join('\n'), ']'].join('\n');
}
tinganho commented 6 years ago

I now published a new version of l10ns that sorts by key. Let me know if there is any problem. Thanks @cadesalaberry and @batiste to your prior research on this.

I'm still keeping this issue open. I'm a bit interested in introducing a new fileformat in the future to make this problem disappear for good.

tinganho commented 6 years ago

I've just tried to add unique lines on start and end of each entry. And it doesn't auto resolve the conflicts.

screen shot 2017-10-05 at 01 14 07

I think we have to resolve to @cadesalaberry's solution with a custom merge tool to git https://git-scm.com/docs/gitattributes#_defining_a_custom_merge_driver.

tinganho commented 6 years ago

Maybe the merge strategy union is a solution for this problem: https://git-scm.com/docs/gitattributes#gitattributes-union

cadesalaberry commented 6 years ago

Yes, I had a look at it as well, but it is not always consistent when merging.

When you have duplicate keys, l10ns update will only keep the first occurrence (if I remember well), which is not always right.

We are looking at a way to interact with git to get a smoother merge experience. I will get back to you once we get results.