uTasker / uTasker-Kinetis

uTasker V1.4.11 based open source version for Kinetis and STM32 parts

Inconsistent character encodings in source files #10

Closed alexhenrie closed 2 years ago

alexhenrie commented 2 years ago

The C and C++ files in this repository use 9 different character encodings, which breaks the flawfinder tool. (flawfinder expects all source files to have the same character encoding, preferably UTF-8.)

$ git ls-files | grep -E '\.(c|cpp|h)$' | xargs -n 1 uchardet | sort | uniq -c | sort -rn
    186 ISO-8859-2
     71 ASCII
      8 ISO-8859-1
      4 WINDOWS-1252
      4 UTF-8
      4 ISO-8859-3
      2 WINDOWS-1250
      2 IBM852
      1 ISO-8859-9

The files can be converted en masse to UTF-8 with the following commands:

sudo pip install cvt2utf
git ls-files | grep -E '\.(c|cpp|h)$' | xargs -n 1 sed -i $'s/\xA3/\\\\xA3/g'
git ls-files | grep -E '\.(c|cpp|h)$' | xargs -n 1 cvt2utf convert
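
When the source encoding of a file is already known (for example from the uchardet output above), a single file can also be converted with plain iconv. The following is only a minimal sketch, not part of the commands above: the helper name `to_utf8` and its temporary variable are hypothetical, and it assumes an iconv that accepts the `-f` and `-t` options.

```shell
# Hypothetical helper: convert one file to UTF-8 in place.
#   $1 = source encoding, using a name iconv understands (e.g. ISO-8859-2)
#   $2 = path of the file to convert
to_utf8() {
    to_utf8_tmp=$(mktemp) || return 1
    if iconv -f "$1" -t UTF-8 "$2" > "$to_utf8_tmp"; then
        # Overwrite the contents rather than moving the temp file,
        # so the original file keeps its permissions.
        cat "$to_utf8_tmp" > "$2"
        rm -f "$to_utf8_tmp"
    else
        # Conversion failed: leave the original file untouched.
        rm -f "$to_utf8_tmp"
        return 1
    fi
}
```

A call such as `to_utf8 ISO-8859-2 path/to/file.c` would then rewrite that one file. Because the detected encoding is only a guess, it is worth reviewing the resulting diff before committing.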

uTasker commented 2 years ago

Hi,

Thanks for the information and for the conversion solution to clean up the sources. The project is maintained in multiple IDEs, and these often use their own encoding settings, so files may get saved with different character encodings depending on which IDE was used to make and save a particular change. This quite rarely causes any actual difficulty (although foreign characters and symbols may get converted in some circumstances) and, as you have pointed out, it can be cleaned up if needed.

Regards,
Mark
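
Since files can drift back to other encodings whenever a different IDE saves them, a periodic check may help spot this early. The following is a rough sketch under stated assumptions, not something the project ships: the function names `is_utf8` and `list_non_utf8` are hypothetical, and it relies on iconv returning a non-zero status when the input is not valid UTF-8.

```shell
# Hypothetical helper: succeeds only if the file decodes as valid UTF-8.
# Plain ASCII files also pass, since ASCII is a subset of UTF-8.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Hypothetical helper: print every tracked C/C++ source file
# that fails the UTF-8 check. Must be run inside the git repository.
list_non_utf8() {
    git ls-files | grep -E '\.(c|cpp|h)$' | while IFS= read -r f; do
        is_utf8 "$f" || printf '%s\n' "$f"
    done
}
```

Running `list_non_utf8` from the repository root prints nothing when all matching files are valid UTF-8. Note that this only detects byte sequences that are invalid as UTF-8; it cannot tell which legacy encoding a failing file actually uses.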