learntextvis / textkit

Command line tool for manipulating and analyzing text
MIT License

ensure correct string encoding #19

Closed iros closed 8 years ago

iros commented 8 years ago

Output like this:

hello,4
my,land,5

should be:

hello,4
my\,land,5

vlandham commented 8 years ago

This is an issue for count and will also be a problem for the POS tagger output.

I believe most CSV implementations solve this slightly differently: instead of escaping the comma with a backslash, they wrap any field that contains a comma in double quotes.

hello,4
"my,land",5

vlandham commented 8 years ago

Here is Ruby's:

require 'csv'
v = ["test,a",3]
v.to_csv
=> "\"test,a\",3\n"

vlandham commented 8 years ago

Here is Python:

import csv
v = ["test,a",3]
with open('some.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(v)

This writes "test,a",3 to some.csv, so it's the same.

vlandham commented 8 years ago

I think we could just use Python's csv.writer() for the count output and the POS output, and then use csv.reader() in some of the package functions.
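A minimal sketch of that round trip, assuming counts are available as (token, count) pairs. The helper names write_counts and read_counts are hypothetical and are not textkit functions; only csv.writer and csv.reader come from the standard library.

```python
import csv
import io


def write_counts(counts, out):
    """Write (token, count) pairs as CSV rows (hypothetical helper).

    csv.writer uses minimal quoting by default, so only fields that
    contain the delimiter, a quote, or a newline get wrapped in quotes.
    """
    writer = csv.writer(out, lineterminator="\n")
    for token, count in counts:
        writer.writerow([token, count])


def read_counts(inp):
    """Read rows produced by write_counts back into (token, int) pairs."""
    return [(token, int(count)) for token, count in csv.reader(inp)]


# Round trip through an in-memory buffer instead of a real file.
buf = io.StringIO()
write_counts([("hello", 4), ("my,land", 5)], buf)
text = buf.getvalue()

rows = read_counts(io.StringIO(text))
```

Here text holds the two lines hello,4 and "my,land",5, and rows recovers the original pairs, so a token containing a comma survives being written and read back.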

vlandham commented 8 years ago

I can work towards doing this for counts.

vlandham commented 8 years ago

Addressed in #24