laurieburchell / cs-lid-harder-than-you-think

Repository accompanying "Code-Switched Language Identification is Harder Than You Think" (EACL 2024)
Apache License 2.0

data preprocess suggestion #1

Closed kargaranamir closed 6 months ago

kargaranamir commented 8 months ago

Hi Laurie,

The code was super easy to use, thanks for that.

Just some suggestions.

I had trouble with this line of code: reformat-itu-tureng.py#L38

The error is: "f-string expression part cannot include a backslash". I solved it this way (not very clean 😄):

with open(args.out_file, 'w') as f:
    backslash_char = "\t"
    for s, t in zip(sents, tags):
        f.write(f"{s}{backslash_char}{'🍔'.join(t)}\n".replace("🍔", backslash_char))

laurieburchell commented 6 months ago

Hi Amir, thanks for pointing out the bug! I put the join on a separate line, which solved the problem. I'll keep the emails in the script for now, since this is how the authors ask people to obtain the data in the original papers.
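
For anyone hitting the same error: before Python 3.12, the expression part of an f-string (the bit inside the braces) cannot contain a backslash, so a call like `'\t'.join(t)` inside the braces fails. Below is a minimal sketch of the "join on a separate line" approach. It is not the repository's actual code: the function name `write_tagged` and the sample data are hypothetical, and the output layout (sentence, tab, tab-separated tags) is assumed from the workaround quoted above.

```python
import io


def write_tagged(sents, tags, out):
    """Write one line per sentence: the sentence, a tab, then its tab-separated tags.

    Hypothetical helper for illustration; `sents` is assumed to be a list of
    strings and `tags` a parallel list of lists of tag strings.
    """
    for s, t in zip(sents, tags):
        # Do the join outside the f-string so that no backslash appears
        # inside the braces (disallowed before Python 3.12).
        joined_tags = "\t".join(t)
        out.write(f"{s}\t{joined_tags}\n")


# Usage with an in-memory buffer standing in for the output file.
buf = io.StringIO()
write_tagged(["merhaba world"], [["tr", "en"]], buf)
print(repr(buf.getvalue()))
```

The backslashes that remain in the f-string sit in its literal part, outside the braces, which is allowed in every Python version that supports f-strings.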