grahamcrowell / DelimitedLucene

Indexes delimited text files (CSV, TSV) with Lucene

Delimiter inference test #9

Open · grahamcrowell opened this issue 6 years ago


Need to infer the delimiter of a file. For each test case below, need to verify the inferred delimiter.

Test Cases

None (not a delimited file)

  1. JSON that is also valid CSV, with the same number of commas on each line
  2. plain text with no commas, pipes, or tabs
  3. non-text (binary) file
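The JSON and binary cases above could be screened out before any delimiter counting. The plain-text case should not need a special check, since with no commas, pipes, or tabs a count-based approach finds no candidate. Below is a minimal sketch in Java (chosen because Lucene is a Java library). It relies on two assumed heuristics that do not come from the issue: content with a NUL byte or invalid UTF-8 is treated as binary, and text wrapped in `{...}` or `[...]` is treated as JSON. The class `NotDelimitedCheck` and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical pre-checks for the "None" cases. Both checks are heuristics
 * (assumptions), not a specification of what counts as delimited text.
 */
final class NotDelimitedCheck {
    /** Assumed binary heuristic: any NUL byte, or bytes that are not valid UTF-8. */
    static boolean looksBinary(byte[] bytes) {
        for (byte b : bytes) {
            if (b == 0) return true;
        }
        try {
            // REPORT makes the decoder throw instead of substituting U+FFFD.
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return false;
        } catch (CharacterCodingException e) {
            return true;
        }
    }

    /** Assumed JSON heuristic: the whole text is wrapped in {...} or [...]. */
    static boolean looksLikeJson(String text) {
        String t = text.strip();
        return (t.startsWith("{") && t.endsWith("}"))
            || (t.startsWith("[") && t.endsWith("]"));
    }

    /** True if the content should be reported as "not a delimited file". */
    static boolean isNotDelimited(byte[] bytes) {
        if (looksBinary(bytes)) return true;
        return looksLikeJson(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

The JSON check is deliberately crude. A CSV whose first field starts with `[` and whose last field ends with `]` would be misclassified, so a real JSON parse may be preferable if a JSON library is already on the classpath.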

Only 1 delimiter

  1. comma
  2. pipe
  3. tab
  4. pipe-delimited file in which one data line has one too many (or one too few) columns
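The ragged-line case rules out a strict "same count on every line" rule. One way to tolerate it, assuming a majority vote is acceptable, is to score a candidate by the share of non-empty lines whose count equals the most common count. A most common count of 0 scores 0, because the character is then not acting as a delimiter. `RaggedTolerance` and `agreement` are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: modal-agreement score for one candidate delimiter. */
final class RaggedTolerance {
    /** Share of non-empty lines whose count of delim equals the modal count; 0 if that mode is 0. */
    static double agreement(List<String> lines, char delim) {
        Map<Integer, Integer> freq = new HashMap<>(); // count per line -> number of lines
        int total = 0;
        for (String line : lines) {
            if (line.isEmpty()) continue; // ignore blank lines
            int n = (int) line.chars().filter(ch -> ch == delim).count();
            freq.merge(n, 1, Integer::sum);
            total++;
        }
        if (total == 0) return 0.0;
        int modeCount = 0;
        int modeLines = 0;
        for (Map.Entry<Integer, Integer> e : freq.entrySet()) {
            int count = e.getKey();
            int numLines = e.getValue();
            // On a frequency tie, prefer the larger count so the result is deterministic.
            if (numLines > modeLines || (numLines == modeLines && count > modeCount)) {
                modeCount = count;
                modeLines = numLines;
            }
        }
        return modeCount == 0 ? 0.0 : (double) modeLines / total;
    }
}
```

Take a header plus four data lines where one data line has an extra column. The pipe scores 0.8 (4 of 5 lines have 2 pipes) and the comma scores 0. A caller could then accept the best candidate above some threshold, which is an assumption to tune against these test cases.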

2 or more delimiters

Assume a pipe-delimited file with:

  1. 2 commas in a column name (but no commas in data lines)
  2. 2 commas in a column name and 1 comma in each of that column's data values
  3. 1 comma in each of 2 column names (but no commas in data lines)
  4. 1 comma in each of 2 column names and 2 commas in each of those columns' data values
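These cases are designed so that commas compete with the real pipe delimiter. The sketch below scores every candidate by modal agreement, keeps only candidates that reach an assumed minimum of 0.5, and breaks exact ties with an assumed priority order (tab, pipe, comma). The ordering rests on the guess that commas are the most likely to appear inside header or data text. When commas happen to appear equally often on every line (one possible reading of case 4), counts alone cannot separate comma from pipe, so the priority order decides. `MultiCandidateChooser`, `PRIORITY`, and `MIN_AGREEMENT` are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical chooser for files where more than one candidate character appears. */
final class MultiCandidateChooser {
    // Assumed priority order: comma last, since commas most often occur inside text.
    static final char[] PRIORITY = {'\t', '|', ','};
    // Assumed minimum share of lines that must agree on the modal count.
    static final double MIN_AGREEMENT = 0.5;

    /** Returns the chosen delimiter, or null if no candidate reaches MIN_AGREEMENT. */
    static Character choose(List<String> lines) {
        Character best = null;
        double bestScore = 0.0;
        for (char c : PRIORITY) {
            double score = agreement(lines, c);
            // Strict '>' keeps the earlier (higher-priority) candidate when scores tie.
            if (score >= MIN_AGREEMENT && score > bestScore) {
                best = c;
                bestScore = score;
            }
        }
        return best;
    }

    /** Share of non-empty lines whose count of delim equals the modal count; 0 if that mode is 0. */
    static double agreement(List<String> lines, char delim) {
        Map<Integer, Integer> freq = new HashMap<>();
        int total = 0;
        for (String line : lines) {
            if (line.isEmpty()) continue;
            int n = (int) line.chars().filter(ch -> ch == delim).count();
            freq.merge(n, 1, Integer::sum);
            total++;
        }
        if (total == 0) return 0.0;
        int modeCount = 0;
        int modeLines = 0;
        for (Map.Entry<Integer, Integer> e : freq.entrySet()) {
            int count = e.getKey();
            int numLines = e.getValue();
            if (numLines > modeLines || (numLines == modeLines && count > modeCount)) {
                modeCount = count;
                modeLines = numLines;
            }
        }
        return modeCount == 0 ? 0.0 : (double) modeLines / total;
    }
}
```

Consider a header plus three data lines. In cases 1 and 3 the comma is rejected outright, because its most common count is 0. In case 2 the comma scores 0.75 against 1.0 for the pipe. If every line carries the same number of commas, the two tie at 1.0 and the pipe wins only because of the assumed priority order. Whether that fallback is acceptable for this project remains an open question.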