Open conker84 opened 5 years ago
@conker84 are you sure it's not that the Cypher export function should be escaping the U+0085 character in some way? It seems to be a valid 'new line' character: https://www.compart.com/en/unicode/U+0085
I thought about that, and it's something we can do, but only as a workaround, because I think the correct behaviour is the one shown by testParsingTwitterFileWithFileUtils, so there is a way to parse the string correctly.
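For reference, the export-side workaround discussed above could look roughly like the sketch below. It is only an illustration: the class and method names are hypothetical and this is not APOC's actual export code. It assumes the consumer of the exported file accepts \uXXXX escapes inside Cypher string literals, and it rewrites the three Unicode separators from the Scanner pattern (U+0085, U+2028, U+2029) so they never appear as raw characters in the file.

```java
class LineSeparatorEscaper {

    // Hypothetical helper (not part of APOC): replaces the raw NEL, LS and PS
    // characters with their six-character backslash-u escape text, so a
    // line-based reader no longer sees them as line breaks.
    static String escapeScannerLineSeparators(String value) {
        StringBuilder out = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '\u0085' || c == '\u2028' || c == '\u2029') {
                // Emit a literal backslash, 'u', and four hex digits.
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "first half\u0085second half";
        System.out.println(escapeScannerLineSeparators(raw));
    }
}
```

Whether the escaped form round-trips to the original character depends on how the importing side unescapes string literals, so this would need to be checked against the real import path.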
Hi, we have this issue in APOC: https://github.com/neo4j-contrib/neo4j-apoc-procedures/issues/1286, in particular this comment: https://github.com/neo4j-contrib/neo4j-apoc-procedures/issues/1286#issuecomment-530538305
So I tried to reproduce the problem, and I exported the dataset into this file with the apoc.export.cypher.all procedure. But you can use this file, which contains only the "bad" line that I extracted from the file above.
And if I execute this:
I get this error:
So I created these two tests looking for an invalid line:
And while testParsingTwitterFileWithScanner fails 7 times, testParsingTwitterFileWithFileUtils works well. So I looked into the Scanner class, at the nextLine method, and I found that it uses this pattern to get a line: "\r\n|[\n\r\u2028\u2029\u0085]". If I open the file with Sublime and look at the first line that breaks, right after the last words reported I find 0x85, which should be the U+0085 used by the Scanner line pattern. I don't know if you use the Scanner class under the hood, but I hope I have provided enough info to understand where the problem is.
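A minimal, self-contained sketch of the difference described above. This is not the original pair of tests: the class name and the sample statement are made up for illustration, and BufferedReader stands in for the FileUtils-based reader on the assumption that it only breaks lines on \n, \r and \r\n.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class NextLineSplitDemo {

    // Reads lines with java.util.Scanner, whose line pattern also treats
    // U+0085, U+2028 and U+2029 as line separators.
    static List<String> linesWithScanner(String input) {
        List<String> lines = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            while (scanner.hasNextLine()) {
                lines.add(scanner.nextLine());
            }
        }
        return lines;
    }

    // Reads lines with BufferedReader.readLine, which only recognises
    // line feed, carriage return, or a carriage return followed by a line feed.
    static List<String> linesWithBufferedReader(String input) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(input))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Made-up statement with a raw U+0085 inside the string literal.
        String statement = "CREATE (:Tweet {text:\"first half\u0085second half\"});";
        System.out.println("Scanner lines: " + linesWithScanner(statement).size());
        System.out.println("BufferedReader lines: " + linesWithBufferedReader(statement).size());
    }
}
```

With this input the Scanner version returns two fragments, neither of which is a complete statement, while the BufferedReader version returns the statement as a single line.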