kite-sdk / kite

Kite SDK
http://kitesdk.org/docs/current/
Apache License 2.0

Logic to infer data type and locale specific number formats #478

Open sshikov opened 6 years ago

sshikov commented 6 years ago

private static final Pattern LONG = Pattern.compile("\\d+");
private static final Pattern DOUBLE = Pattern.compile("\\d*\\.\\d*[dD]?");
private static final Pattern FLOAT = Pattern.compile("\\d*\\.\\d*[fF]?");

I suggest that locale-specific number formats also be supported. What do you think about custom format recognizers for types like dates, UUIDs, etc.?
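To illustrate what a locale-aware recognizer could look like, here is a minimal sketch built on the JDK's `java.text.NumberFormat`. The class name `LocaleNumberSniffer` and the method `parseWhole` are hypothetical and not part of Kite; the sketch only accepts a value when the whole string is consumed by the parser.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

// Hypothetical helper, not part of Kite: recognizes numbers written in a given locale.
public class LocaleNumberSniffer {

  /**
   * Returns the parsed number if the entire (trimmed) value is a number
   * in the given locale, or null otherwise.
   */
  static Number parseWhole(String value, Locale locale) {
    if (value == null) {
      return null;
    }
    String trimmed = value.trim();
    if (trimmed.isEmpty()) {
      return null;
    }
    NumberFormat format = NumberFormat.getNumberInstance(locale);
    ParsePosition pos = new ParsePosition(0);
    Number parsed = format.parse(trimmed, pos);
    // NumberFormat stops at the first character it cannot use,
    // so reject values that were only partially consumed.
    if (parsed == null || pos.getIndex() != trimmed.length()) {
      return null;
    }
    return parsed;
  }
}
```

For example, `parseWhole("1.234,5", Locale.GERMANY)` yields 1234.5, while `parseWhole("12abc", Locale.US)` yields null. As far as I know, `NumberFormat` is lenient about where grouping separators appear, so a real recognizer would probably need stricter checks on top of this.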

Also, it looks like only the 1st line of the CSV is used for type inference, not the first 25 as expected:
private static final int DEFAULT_INFER_LINES = 25;

I have a file where the 1st row of data contains a column like "device model" with digits only, and in the 2nd row the same column also contains letters. The inferred schema contains the union type "null", "string", and the import failed on that same 2nd row.
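The behavior I expected can be sketched roughly as follows: look at up to N rows and widen the column type whenever a later value does not fit the current one. This is not Kite's actual code. The class `ColumnTypeSketch`, the `Kind` enum and the simplified patterns are assumptions made only for illustration.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch, not Kite's implementation: infer one column's type from several rows.
public class ColumnTypeSketch {

  // Declared from narrowest to widest so that ordinal() can drive widening.
  enum Kind { NONE, LONG, DOUBLE, STRING }

  // Simplified stand-ins for the patterns quoted above.
  private static final Pattern LONG = Pattern.compile("\\d+");
  private static final Pattern DOUBLE = Pattern.compile("\\d*\\.\\d+");

  /** Classifies a single cell; empty or missing cells carry no type information. */
  static Kind kindOf(String value) {
    if (value == null || value.isEmpty()) {
      return Kind.NONE;
    }
    if (LONG.matcher(value).matches()) {
      return Kind.LONG;
    }
    if (DOUBLE.matcher(value).matches()) {
      return Kind.DOUBLE;
    }
    return Kind.STRING;
  }

  /** Returns the wider of the two kinds. */
  static Kind widen(Kind a, Kind b) {
    return a.ordinal() >= b.ordinal() ? a : b;
  }

  /** Infers the column kind from at most maxLines values. */
  static Kind inferColumn(List<String> values, int maxLines) {
    Kind result = Kind.NONE;
    int limit = Math.min(maxLines, values.size());
    for (int i = 0; i < limit; i++) {
      result = widen(result, kindOf(values.get(i)));
    }
    return result;
  }
}
```

With two made-up values for the column, `"12345"` and `"SM-G950F"`, `inferColumn(values, 25)` returns `STRING`, whereas `inferColumn(values, 1)` returns `LONG`, which matches the failure I see on the 2nd row.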