VCF should either have an option in the header to choose a specific string encoding (e.g., UTF-8, Latin-1, ASCII) with a default set, or the specification should document which encoding VCF files must use.
Not so long ago there was a discussion about this on the vcftools-spec mailing list. This proposal by Eugene Clark is likely to appear in the VCF specification:
In order to address the need to represent non-ASCII characters in INFO field values, VCF files are assumed to be encoded in UTF-8 unless a "##fileencoding=NNN" header is present. To support stream based processing of VCF files, this header must immediately follow the version header. Because US-ASCII is a subset of UTF-8, this should be fully backwards compatible.
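Because the proposed header must immediately follow the version header, a streaming reader knows the encoding after reading at most two lines. The sketch below is a minimal illustration of that rule and is not taken from any existing VCF library. It assumes the first two header lines are ASCII-compatible bytes, and that the value after `##fileencoding=` is a name Python's `codecs` module recognizes. The function name `detect_vcf_encoding` is hypothetical.

```python
import codecs

# Default assumed by the proposal when no ##fileencoding header is present.
DEFAULT_ENCODING = "utf-8"


def detect_vcf_encoding(header_lines):
    """Return the encoding implied by the first header lines (given as bytes).

    Hypothetical helper: assumes the header lines themselves are ASCII-compatible,
    so they can be inspected before the encoding is known.
    """
    if not header_lines or not header_lines[0].startswith(b"##fileformat="):
        raise ValueError("first line must be the ##fileformat version header")
    # Per the proposal, ##fileencoding is only honoured as the second line.
    if len(header_lines) > 1 and header_lines[1].startswith(b"##fileencoding="):
        name = header_lines[1][len(b"##fileencoding="):].strip().decode("ascii")
        # codecs.lookup raises LookupError for unknown names and normalizes
        # aliases, e.g. "latin-1" -> "iso8859-1".
        return codecs.lookup(name).name
    return DEFAULT_ENCODING


# Usage: read two lines, pick the codec, then decode the rest of the stream.
print(detect_vcf_encoding([b"##fileformat=VCFv4.1\n", b"##fileencoding=latin-1\n"]))
print(detect_vcf_encoding([b"##fileformat=VCFv4.1\n", b"##INFO=<ID=DP>\n"]))
```

The encoding name is normalized through `codecs.lookup` so that aliases such as `latin-1` and `ISO-8859-1` compare equal downstream.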
Characters reserved as structure delimiters must be encoded using %NN when appearing in content. This would apply to ALL content fields (INFO values, metadata header descriptions, variant Ids, etc). The reserved characters are therefore: newline (\n), carriage return (\r), tab (\t), hash (#), greater than (>), less than (<), equals (=), semicolon (;), comma (,), percent sign (%).
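The %NN rule above is ordinary percent-encoding restricted to the listed delimiter set. Below is a minimal sketch of escaping and unescaping a field value under that rule. It assumes NN is the two-digit uppercase hexadecimal code of the character. All reserved characters are ASCII, so the code point equals the byte value. Other characters, including non-ASCII ones, are left for the file encoding to handle. The names `vcf_escape` and `vcf_unescape` are hypothetical.

```python
import string

# Reserved delimiters listed in the proposal; '%' must be included so that
# literal percent signs survive a round trip.
RESERVED = "\n\r\t#><=;,%"


def vcf_escape(value: str) -> str:
    """Replace each reserved character with %NN (uppercase hex code)."""
    return "".join(f"%{ord(c):02X}" if c in RESERVED else c for c in value)


def vcf_unescape(value: str) -> str:
    """Decode %NN sequences back to characters; raise on malformed escapes."""
    out = []
    i = 0
    while i < len(value):
        if value[i] == "%":
            code = value[i + 1:i + 3]
            if len(code) != 2 or not all(ch in string.hexdigits for ch in code):
                raise ValueError(f"malformed escape at position {i}")
            out.append(chr(int(code, 16)))
            i += 3
        else:
            out.append(value[i])
            i += 1
    return "".join(out)


print(vcf_escape("a=b;c,d%"))  # a%3Db%3Bc%2Cd%25
print(vcf_unescape(vcf_escape("Gène; note=50%")))
```

Escaping `%` itself is what makes decoding unambiguous: any `%` left in an encoded value must start an escape sequence.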