arnim closed 8 years ago
Hi Arnim,
the file that you mentioned contains only plain ASCII characters. Since the encoding of a text file or Java file cannot be stored in the file itself, I suspect that the file utility simply guesses the file encoding based on what it finds in the file. The encoding of the standard ASCII characters is the same in us-ascii and UTF-8. So, the file utility cannot distinguish between the two encodings if a file contains only plain ASCII.
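To illustrate the point, here is a minimal Python sketch (not taken from the project; the helper name `same_bytes_in_both` and the sample strings are made up for illustration). It shows that text made of plain ASCII characters produces exactly the same bytes in us-ascii and UTF-8, so nothing in the file could tell the two apart:

```python
def same_bytes_in_both(text: str) -> bool:
    """Return True if text encodes to the same bytes in us-ascii and UTF-8."""
    try:
        return text.encode("ascii") == text.encode("utf-8")
    except UnicodeEncodeError:
        # The text contains characters outside ASCII, so it is not valid us-ascii.
        return False


# Hypothetical file contents, used only as examples.
ascii_source = "public final class StreamToTriples {}"
non_ascii_source = "// Größe"

print(same_bytes_in_both(ascii_source))      # True
print(same_bytes_in_both(non_ascii_source))  # False
```

Only once a file contains a character outside the ASCII range (such as an umlaut in a comment) do the bytes give a hint about the encoding.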
Do you see encoding errors in the files or is it only the file utility that reports the wrong encoding?
Cheers, Christoph
Hi @cboehme
thanks for your fast reply ;) It sounds really trivial ... indeed ... but
The thing is that we run into a similar issue on various machines ..
Description of the actual problem: clean checkout > doing some very simple things > a lot of files occurring in the git diff
At first I was suspecting that something with the encoding setting of an IDE or the like might be funny. Yet, after we had this behavior on different machines and with different developers I got suspicious.
Why is such a minor thing even important? If git thinks that a lot of files have changed, we have a hard time merging in the new stuff you do.
@cboehme I'll ping back if I know more (hopefully next week;)
That could be a problem with the format of the line endings. We had problems with this, too. Git thought some files had changed but the diff showed no actual changes and there was also no way to revert the changes. Checking the files out again or resetting them had no effect.
I am not quite sure how we solved this in the end. I think, in the end we simply configured all tools to work with unix file endings.
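If you want to check which line endings a file actually has, a small Python sketch like the following might help. The function name `count_line_endings` and the sample bytes are made up for illustration:

```python
def count_line_endings(data: bytes) -> dict:
    """Count Windows (CRLF) and Unix (bare LF) line endings in raw file bytes."""
    crlf = data.count(b"\r\n")
    # Every CRLF also contains one LF, so subtract those to get the bare LFs.
    lf = data.count(b"\n") - crlf
    return {"crlf": crlf, "lf": lf}


# Hypothetical file contents with mixed line endings.
sample = b"line one\r\nline two\nline three\r\n"
print(count_line_endings(sample))  # {'crlf': 2, 'lf': 1}
```

For a real file, read it in binary mode, for example `open(path, "rb").read()`, so that Python does not translate the line endings before they are counted.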
There is a git setting, core.autocrlf, which controls whether git performs any CRLF conversion. This can be either false, true or input. I set this to true on Windows and to false on my Linux machine. I think if this is set to input, git attempts to correct files which have an incorrect line ending and then these files appear as changed. Since this is happening automatically, these files appear to be modified directly after checkout.
There is also an alternative to core.autocrlf. It relies on a file called .gitattributes. We have added such a file to the repository a while ago which solved our problems on command line git. However, jgit/egit do not yet process the file afaik. So there might still be problems in Eclipse.
I hope this helps.
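As a rough model of why such files show up as modified right after checkout, consider the Python sketch below. It assumes the conversion boils down to replacing CRLF with LF; this is not git's actual implementation (real git also does things such as skipping files it considers binary), and both function names are made up for illustration:

```python
def normalize_to_lf(data: bytes) -> bytes:
    """Simplified stand-in for a CRLF-to-LF conversion (assumption, not git's code)."""
    return data.replace(b"\r\n", b"\n")


def appears_modified(committed: bytes) -> bool:
    """A file looks changed if its normalized form differs from what was committed."""
    return normalize_to_lf(committed) != committed


# Hypothetical file contents as stored in the repository.
committed_with_crlf = b"class A {\r\n}\r\n"
committed_with_lf = b"class A {\n}\n"

print(appears_modified(committed_with_crlf))  # True
print(appears_modified(committed_with_lf))    # False
```

In this model a file that was committed with CRLF endings differs from its normalized form even though nobody touched it, which would match the "changed but no visible diff" behavior described above.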
Sounds a lot like the thing we have ;)
"I think, in the end we simply configured all tools to work with unix file endings" Was suspecting something similar since we are mostly on windows.
I'll let you know if we get along with this ;) @cboehme awesome -> THX ;)
Hi there,
this might sound a little bit picky but ...
the pom.xml says this is a UTF-8 source encoded project https://github.com/culturegraph/metafacture-core/blob/master/pom.xml#L62
yet a lot of files seem to be us-ascii
file --mime src/main/java/org/culturegraph/mf/stream/converter/StreamToTriples.java
What am I doing wrong?