springml / spark-sftp

Spark connector for SFTP
Apache License 2.0

How to specify the encoding of the source file #42

Closed wsy858 closed 5 years ago

wsy858 commented 5 years ago

When my CSV file is encoded as "GB2312", the content is garbled after it is written to HDFS. Can you provide a way to set the file encoding?

SamambaMan commented 5 years ago

I'm having the same issue when reading windows-1252 encoded files. These are my options:

spark.read.\
    format("com.springml.spark.sftp").\
    option("host", "---").\
    option("username", "---").\
    option("password", "---").\
    option("fileType", "csv").\
    option("delimiter", "|").\
    option("header", "true").\
    option("charset", "ISO-8859-1").\
    schema(SCHEMA).\
    load(arquivo)

samuel-pt commented 5 years ago

The charset option is currently not supported. If possible, please create a PR that adds support for it.
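
Until the connector supports a charset option, one possible workaround is to re-encode the file to UTF-8 before it is read. The sketch below is not part of spark-sftp; `transcode_to_utf8` is a hypothetical helper, and it assumes the file has already been fetched to a local path and that its real encoding is known (for example `gb2312` or `cp1252`, the Python codec names for GB2312 and windows-1252).

```python
def transcode_to_utf8(src_path, dst_path, src_encoding, chunk_chars=64 * 1024):
    """Re-encode a text file from src_encoding to UTF-8.

    Hypothetical helper, not part of spark-sftp. The file is streamed in
    chunks of decoded characters so large files are not held in memory.
    """
    # newline="" disables newline translation, so the original line
    # endings (\n or \r\n) are preserved in the output.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        while True:
            chunk = src.read(chunk_chars)  # reads characters, not bytes
            if not chunk:
                break
            dst.write(chunk)
```

For example, `transcode_to_utf8("in.csv", "in_utf8.csv", "gb2312")` writes a UTF-8 copy that can then be uploaded or read without a charset setting. A wrong `src_encoding` may raise `UnicodeDecodeError` or silently produce garbled text, so it is worth checking a few rows of the output.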