metafacture / metafacture-core

Core package of the Metafacture tool suite for metadata processing.
https://metafacture.org
Apache License 2.0

src encoding #206

Closed arnim closed 8 years ago

arnim commented 9 years ago

Hi there,

this might sound a little bit picky but ...

the pom.xml says this is a UTF-8 source encoded project https://github.com/culturegraph/metafacture-core/blob/master/pom.xml#L62

yet a lot of the files seem to be us-ascii

file --mime src/main/java/org/culturegraph/mf/stream/converter/StreamToTriples.java

What am I doing wrong?

cboehme commented 9 years ago

Hi Arnim,

the file that you mentioned contains only plain ASCII characters. Since the encoding of a text file or Java file is not stored in the file itself, I suspect that the file utility simply guesses the encoding based on what it finds in the file. The standard ASCII characters are encoded identically in us-ascii and UTF-8. So the file utility cannot distinguish the two encodings if a file contains only plain ASCII.
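To illustrate the point about identical encodings, here is a minimal Python sketch. The sample strings and the helper name `same_bytes_in_both` are made up for illustration and are not taken from the repository. It only shows that ASCII-only text produces the same bytes under both encodings, which is why a guessing tool has nothing to tell them apart.

```python
# Illustrative only: both sample strings are invented for this example.
ascii_only = "public final class StreamToTriples {}"
with_umlaut = "// Gruß aus Köln"


def same_bytes_in_both(text: str) -> bool:
    """Return True if text encodes to identical bytes in US-ASCII and UTF-8."""
    try:
        return text.encode("ascii") == text.encode("utf-8")
    except UnicodeEncodeError:
        # The text contains a non-ASCII character, so it has no US-ASCII form.
        return False


print(same_bytes_in_both(ascii_only))
print(same_bytes_in_both(with_umlaut))
```

A file like the first string is valid under either label, so a report of us-ascii does not contradict the UTF-8 setting in the pom.xml.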

Do you see encoding errors in the files or is it only the file utility that reports the wrong encoding?

Cheers, Christoph

arnim commented 9 years ago

Hi @cboehme

thanks for your fast reply ;) It sounds really trivial ... indeed ... but

The thing is that we run into a similar issue on various machines ..

Description of the actual problem: clean checkout > doing some very simple things > a lot of files occurring in the git diff

At first I suspected that the encoding setting of an IDE or the like might be off. Yet, after we saw this behavior on different machines and with different developers, I got suspicious.

Why is such a minor thing even important? If git thinks that there are a lot of files changed we have a hard time in merging in new stuff you do.

@cboehme I'll ping back if I know more (hopefully next week;)

cboehme commented 9 years ago

That could be a problem with the format of the line endings. We had problems with this, too. Git thought some files had changed but the diff showed no actual changes, and there was also no way to revert the changes. Checking the files out again or resetting them had no effect.

I am not quite sure how we solved this in the end. I think, in the end we simply configured all tools to work with unix file endings.
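As a rough way to check whether line endings are involved, the following Python sketch counts the line-ending styles in a byte string. The helper name `count_line_endings` and the two sample byte strings are assumptions made for illustration. It does not reproduce the actual repository state, it only shows how two copies with identical visible text can still differ byte for byte.

```python
def count_line_endings(data: bytes) -> dict:
    """Count Windows (CRLF), Unix (LF) and bare CR line endings in data."""
    crlf = data.count(b"\r\n")
    # Every CRLF also contains one LF and one CR, so subtract those.
    lf = data.count(b"\n") - crlf
    cr = data.count(b"\r") - crlf
    return {"crlf": crlf, "lf": lf, "cr": cr}


# Hypothetical content of the same file as saved by two differently configured tools.
unix_copy = b"line one\nline two\n"
windows_copy = b"line one\r\nline two\r\n"

print(count_line_endings(unix_copy))
print(count_line_endings(windows_copy))

# The visible text is the same once the line endings are normalized.
print(unix_copy == windows_copy.replace(b"\r\n", b"\n"))
```

To try this on a real file, read it in binary mode (for example `open(path, "rb").read()`) so that Python does not translate the line endings before they are counted.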

I hope, this helps.

arnim commented 9 years ago

Sounds a lot like the thing we have ;)

"I think, in the end we simply configured all tools to work with unix file endings" Was suspecting something similar since we are mostly on windows.

I'll let you know if we get along with this ;) @cboehme awesome -> THX ;)