iTrace-Dev / iTrace-Eclipse

Eclipse plugin to identify textual and interface elements based on iTrace Core gaze data
GNU General Public License v3.0

gaze2src sometimes produces an invalid TSV file (unescaped quote chars) #36

Closed ianbtr closed 4 years ago

ianbtr commented 4 years ago

Quote characters are not properly escaped. For instance, the following file might be produced (tabs are replaced with commas):

FIXATION_ID, X, Y, TOKEN/VIEW, SYNTATIC_CAT, XPATH, DURATION, RIGHT_PUPIL, LEFT_PUPIL
144, 121, 122, "Foo, ...

Because the quote is unescaped, the rest of the file is treated as a single entry by parsers, such as the Python CSV reader.
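To make the failure concrete, here is a minimal sketch that reproduces it with Python's standard `csv` module. The sample data and the `read_rows` helper are hypothetical and only mimic the shape of the gaze2src output; the real file has more columns.

```python
import csv
import io

# Hypothetical sample: the token in the second line starts with an
# unescaped double quote, as in the report above.
SAMPLE_TSV = (
    "FIXATION_ID\tX\tY\tTOKEN\n"
    '144\t121\t122\t"Foo\n'
    "145\t130\t140\tbar\n"
)

def read_rows(text, **reader_kwargs):
    """Parse tab-separated text and return all rows as lists of strings."""
    stream = io.StringIO(text, newline="")
    return list(csv.reader(stream, delimiter="\t", **reader_kwargs))

# With the default quoting rules, the stray quote opens a quoted field
# that is never closed, so every later line is absorbed into that field.
default_rows = read_rows(SAMPLE_TSV)
print(len(default_rows))       # fewer rows than lines in the file
print(repr(default_rows[-1]))  # the last field contains the following line
```

The third line of the input never shows up as its own row; it ends up inside the token field of the row before it.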

dtg3 commented 4 years ago

Adding escape characters to the output file for the Python parser fundamentally changes the source token text as seen from within Python. For example, after parsing with the Python CSV reader, requesting the first character of the token gives:

Input  => Output
\"foo  => \
""foo" => f
'"foo' => '

Alternatively, setting the Python CSV parser to ignore quotes yields the correct character:

reader = csv.reader(tsvfile, delimiter='\t', quoting=csv.QUOTE_NONE)

Input  => Output
"foo   => "
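As a self-contained sketch of the same idea, the snippet below reads a small in-memory TSV with `quoting=csv.QUOTE_NONE` and checks the first character of the token. The sample data is hypothetical and stands in for a real gaze2src file.

```python
import csv
import io

# Hypothetical sample standing in for a gaze2src TSV file.
SAMPLE_TSV = (
    "FIXATION_ID\tX\tY\tTOKEN\n"
    '144\t121\t122\t"foo\n'
    "145\t130\t140\tbar\n"
)

# QUOTE_NONE tells the reader to treat quote characters as ordinary text,
# so a stray quote can no longer swallow the rest of the file.
tsvfile = io.StringIO(SAMPLE_TSV, newline="")
reader = csv.reader(tsvfile, delimiter="\t", quoting=csv.QUOTE_NONE)
rows = list(reader)

token = rows[1][3]
print(len(rows))   # one row per input line
print(token[0])    # the literal quote character from the source token
```

When reading from disk, opening the file with `newline=""` (as the `csv` module documentation recommends) keeps line-ending handling consistent across platforms.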

If escape characters are still desired, or other tools do not offer an option to ignore quotes, the file could be transformed via a regex find-and-replace or equivalent. The next version of the iTrace framework (releasing this summer) will not use the TSV output approach for post-processing data and will instead use a SQLite database.
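For tools that cannot ignore quotes, one possible "equivalent" to a regex pass is to let the `csv` module do the escaping: read the raw file with `QUOTE_NONE`, then write it back with the default quoting rules. This is only a sketch; `requote_tsv` is a hypothetical helper, not part of iTrace, and it assumes tokens contain no embedded tabs or newlines.

```python
import csv
import io

def requote_tsv(raw_text):
    """Return a copy of raw_text in which quote characters are escaped.

    Hypothetical helper: fields are read verbatim (QUOTE_NONE) and written
    back with QUOTE_MINIMAL, which wraps any field containing a quote in
    quotes and doubles the quote characters inside it.
    """
    src = io.StringIO(raw_text, newline="")
    dst = io.StringIO(newline="")
    reader = csv.reader(src, delimiter="\t", quoting=csv.QUOTE_NONE)
    writer = csv.writer(
        dst, delimiter="\t", quoting=csv.QUOTE_MINIMAL, lineterminator="\n"
    )
    writer.writerows(reader)
    return dst.getvalue()

raw = 'FIXATION_ID\tTOKEN\n144\t"Foo\n'
fixed = requote_tsv(raw)

# A reader using the default quoting rules now recovers the original token.
round_trip = list(csv.reader(io.StringIO(fixed, newline=""), delimiter="\t"))
print(repr(fixed))
print(round_trip)
```

Note that the escaped file is meant for readers that honor CSV quoting; reading it with `QUOTE_NONE` would return the added quote characters as part of the token.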

Does this help to resolve your issue?

ianbtr commented 4 years ago

Yes, using quoting=csv.QUOTE_NONE fixed the problem.

Thanks!

dtg3 commented 4 years ago

Terrific! Happy to help.