larsga / Duke

Duke is a fast and flexible deduplication engine written in Java
Apache License 2.0

Please parse escape codes coming from XML (or document tab code for XML) #232

Open marco-brandizi opened 7 years ago

marco-brandizi commented 7 years ago

This is very relevant for the 'separator' parameter of the CSV input data source: '\t' is taken literally, and an error like "the string \t is not a char" is raised. It would help to unescape the string coming from the XML attribute. Alternatively, please document how to represent tabs in XML (but this is the sub-optimal solution).
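
To make the request concrete, here is a minimal sketch of the kind of unescaping meant above. The class and method names (`SeparatorUnescape`, `unescape`, `parseSeparator`) are hypothetical and not part of Duke's code base, and the set of escapes handled (`\t`, `\n`, `\r`, `\\`) is an assumption about what would be useful, not a description of what Duke does today.

```java
// Hypothetical helper, not part of Duke: illustrates unescaping a
// separator value read from an XML attribute before using it as a char.
public final class SeparatorUnescape {
    private SeparatorUnescape() {}

    /** Turns the backslash escapes \t, \n, \r and \\ into the characters they denote. */
    public static String unescape(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            // Copy ordinary characters, and a trailing lone backslash, unchanged.
            if (c != '\\' || i + 1 == raw.length()) {
                out.append(c);
                continue;
            }
            char next = raw.charAt(++i);
            switch (next) {
                case 't': out.append('\t'); break;
                case 'n': out.append('\n'); break;
                case 'r': out.append('\r'); break;
                case '\\': out.append('\\'); break;
                // Unknown escape: keep both characters as they were written.
                default: out.append('\\').append(next);
            }
        }
        return out.toString();
    }

    /** Parses a separator value, requiring exactly one character after unescaping. */
    public static char parseSeparator(String raw) {
        String s = unescape(raw);
        if (s.length() != 1) {
            throw new IllegalArgumentException(
                "separator must be a single character, got '" + raw + "'");
        }
        return s.charAt(0);
    }

    public static void main(String[] args) {
        // The two-character string backslash + 't' becomes a real tab.
        char sep = parseSeparator("\\t");
        System.out.println(sep == '\t');
    }
}
```

As for the documentation alternative: in standard XML, a tab inside an attribute value can be written as the character reference `&#9;`, which a conforming parser resolves to a real tab character, whereas a literal tab typed into the attribute is normalized to a space. I have not verified whether Duke's configuration loader accepts `&#9;` for 'separator', so that would need checking before it is documented.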