utterworks / fast-bert

Super easy library for BERT based NLP models
Apache License 2.0

feature request: allow other delimiters and encoding when reading csv files #243

Open woiza opened 4 years ago

woiza commented 4 years ago

Hi,

In data_cls.py, this function:

def get_train_examples( ... data_df = pd.read_csv(os.path.join(self.data_dir, filename) .. )

reads only CSV files that use "," as the separator and UTF-8 as the encoding. Could you please make this configurable, so that other delimiters (";" and "|" are commonly used in Europe) and other encodings are supported? I tried saving my CSV files with "," as the delimiter instead, but then the code crashes at a later point because my data cannot be parsed correctly.
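To illustrate the parsing problem, here is a minimal sketch using plain pandas (not fast-bert itself). The sample data and variable names are made up for illustration. It shows that a semicolon-delimited file read with the default separator ends up as a single fused column, which would plausibly break any later lookup of the text or label columns.

```python
import io

import pandas as pd

# Hypothetical semicolon-delimited data, as commonly exported in Europe.
raw = "text;label\nGuten Tag;greeting\nAuf Wiedersehen;farewell\n"

# Default separator is ",": the whole line is treated as one field.
default_df = pd.read_csv(io.StringIO(raw))

# Passing the actual separator splits the columns as intended.
semicolon_df = pd.read_csv(io.StringIO(raw), sep=";")

print(list(default_df.columns))    # ['text;label']
print(list(semicolon_df.columns))  # ['text', 'label']
```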

I modified your function (hard-coded):

data_df = pd.read_csv(os.path.join(self.data_dir, filename), delimiter=';', encoding='utf-8')

and now the model is training :-)
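A configurable version could look roughly like the sketch below. The helper name read_examples_csv and its delimiter and encoding parameters are hypothetical and not part of fast-bert's API. They only show how the two values could be passed through to pd.read_csv, with defaults that keep the current behaviour.

```python
import os

import pandas as pd


def read_examples_csv(data_dir, filename, delimiter=",", encoding="utf-8"):
    """Read a CSV file with a caller-supplied delimiter and encoding.

    Hypothetical helper: the defaults match what pd.read_csv does when
    neither argument is given, so existing callers would be unaffected.
    """
    path = os.path.join(data_dir, filename)
    return pd.read_csv(path, delimiter=delimiter, encoding=encoding)


if __name__ == "__main__":
    import tempfile

    # Write a small semicolon-delimited, Latin-1 encoded file and read it back.
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "train.csv"), "w", encoding="latin-1") as f:
            f.write("text;label\nCafé geöffnet;positive\n")
        df = read_examples_csv(tmp, "train.csv", delimiter=";", encoding="latin-1")
        print(df)
```

In the library, the same two parameters would presumably need to be threaded from the public entry point down to get_train_examples and the other places that read CSV files.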