rtyley / bfg-repo-cleaner

Removes large or troublesome blobs like git-filter-branch does, but faster. And written in Scala
https://rtyley.github.io/bfg-repo-cleaner/
GNU General Public License v3.0

Regex: Did not work with newlines as expected #77

Open jkstrauss opened 9 years ago

jkstrauss commented 9 years ago

I tried cleaning with

regex:\r\n==>\n

However, instead of replacing the CRLF with LF, I got a literal n and no newline at all.

rtyley commented 9 years ago

See this Stack Overflow answer for a way to get a literal newline in: http://stackoverflow.com/a/15730571/438886

jkstrauss commented 9 years ago

There is no other way?

rtyley commented 9 years ago

None that spring to mind, other than writing custom code. Is there a reason why you don't want to use that solution?

jkstrauss commented 9 years ago

I seem to recall that there were some places where there were only \r without the \n. Also, doesn't Java allow for \n in the replacement also? Why doesn't it work?

jkstrauss commented 9 years ago

Actually, I do not seem to have anything that has this problem. I was just wondering why it would work this way.

rtyley commented 9 years ago

The replacement text in regex:\r\n==>\n isn't being treated as a Java language string literal (i.e. as though it were "\n" in a .java file), much as we'd like it to be. It's literally just the string: \n - a backslash followed by an 'n'.

Some degree of interpretation is done, but only to the extent specified by Java's java.util.regex.Matcher.appendReplacement() method:

Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string. Dollar signs may be treated as references to captured subsequences as described above, and backslashes are used to escape literal characters in the replacement string.
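
The behaviour quoted above can be reproduced with plain java.util.regex, outside BFG. This is a minimal sketch, not BFG's code: the class name ReplacementDemo and the helper replace are hypothetical, and it assumes the text after ==> reaches the matcher unchanged, as described in this comment.

```java
import java.util.regex.Pattern;

class ReplacementDemo {
    // Hypothetical helper: applies a regex replacement using Matcher.replaceAll,
    // which follows the same replacement-string rules as appendReplacement.
    static String replace(String input, String regex, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String input = "a\r\nb";

        // Replacement is the two characters backslash + 'n'.
        // The backslash only escapes the next character, so the result is a plain 'n'.
        String escaped = replace(input, "\\r\\n", "\\n");

        // Replacement is a real line-feed character, so CRLF becomes LF.
        String real = replace(input, "\\r\\n", "\n");

        System.out.println(escaped.equals("anb"));
        System.out.println(real.equals("a\nb"));
    }
}
```

The first call matches the original report: the CRLF disappears and a literal n is left in its place.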

jkstrauss commented 9 years ago

I do have an issue with certain places that have only \r without the \n at all. I guess I will have to resort to having no newlines at all in those locations. (It is not that big a deal because most of those places were ancient, and it did not occur too many times.)
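
For reference, a pattern such as \r\n? matches both a CRLF pair and a lone \r, so both can be normalized in one pass when the replacement contains a real line feed. The sketch below is standalone Java rather than a BFG feature; the class LineEndingNormalizer and the method toLf are hypothetical names.

```java
import java.util.regex.Pattern;

class LineEndingNormalizer {
    // Matches CR followed by an optional LF: a CRLF pair is consumed as one unit,
    // and a lone CR is matched on its own.
    private static final Pattern CR_OR_CRLF = Pattern.compile("\\r\\n?");

    // Hypothetical helper: converts CRLF and lone CR to LF; existing LF is untouched.
    static String toLf(String text) {
        return CR_OR_CRLF.matcher(text).replaceAll("\n");
    }

    public static void main(String[] args) {
        String mixed = "a\r\nb\rc\nd";
        System.out.println(toLf(mixed).equals("a\nb\nc\nd"));
    }
}
```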

tskarman commented 9 years ago

Without this I cannot convert certain file types from LF to CRLF. I'd like to do something like:

regex:(?<!\r)\n==>\r\n             # Replace Unix newlines with Windows newlines

(Intention: convert \n to \r\n, but do not break existing CRLFs (e.g. \r\n to \r\r\n))

Any tip on how to achieve that?
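
The negative lookbehind itself behaves as intended in plain Java when the replacement holds real CR and LF characters. The sketch below only illustrates that regex; it does not show a way to pass such a replacement through BFG's regex: syntax, and the class LfToCrlf and the method toCrlf are hypothetical names.

```java
class LfToCrlf {
    // Hypothetical helper: replaces every LF that is not already preceded by CR.
    // The replacement string holds real CR and LF characters, not backslash escapes.
    static String toCrlf(String text) {
        return text.replaceAll("(?<!\\r)\\n", "\r\n");
    }

    public static void main(String[] args) {
        // The first newline is a bare LF; the second is already CRLF and must stay intact.
        String mixed = "a\nb\r\nc";
        System.out.println(toCrlf(mixed).equals("a\r\nb\r\nc"));
    }
}
```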

vlsi commented 5 years ago

For those who struggle with CRLF / LF processing, you might want to use my build of BFG: https://github.com/vlsi/bfg-repo-cleaner/releases/tag/v1.14.0-vlsi

I have implemented CRLF / LF normalization which works for me and does not require a single encoding across all files in the repository.

ymartin59 commented 4 years ago

@vlsi Thanks for this proposal. Would it also be possible to "insert" a ".gitattributes" file in all commits in history, so that a checkout of any existing branch or tag results in the expected line endings?

vlsi commented 4 years ago

@ymartin59, I guess it is an interesting idea; however, there might be existing .gitattributes files in the repository, so there's a question of how the file should be "merged".

Of course, it should not be hard to add .gitattributes when absent.