thomas9911 opened this issue 1 year ago
Using this dataset: https://media.githubusercontent.com/media/datablist/sample-csv-files/main/files/organizations/organizations-100000.csv
The sniffer finds the delimiter `:` while it is clearly `,`.
```
Index,Organization Id,Name,Website,Country,Description,Founded,Industry,Number of employees
1,8cC6B5992C0309c,Acevedo LLC,https://www.donovan.com/,Holy See (Vatican City State),Multi-channeled bottom-line core,2019,Graphic Design / Web Design,7070
2,ec094061FeaF7Bc,Walls-Mcdonald,http://arias-willis.net/,Lithuania,Compatible encompassing groupware,2005,Utilities,8156
3,DAcC5dbc58946A7,Gregory PLC,http://www.lynch-hoover.net/,Tokelau,Multi-channeled intangible help-desk,2019,Leisure / Travel,6121
4,8Dd7beDa37FbeD0,"Byrd, Patterson and Knox",https://www.james-velez.net/,Netherlands,Pre-emptive national function,1982,Furniture,3494
5,a3b5c54AEC163e4,Mcdowell-Hopkins,http://fuentes.com/,Mayotte,Cloned bifurcated solution,2016,Online Publishing,36
6,fDfEBeFDaEb59Af,Hayden and Sons,https://www.shaw-mooney.info/,Belize,Persistent mobile task-force,1978,Insurance,7010
7,752ef90Eae1f7f5,Castro LLC,http://wilkinson.com/,Jamaica,Advanced value-added definition,2008,Outsourcing / Offshoring,2526
```
Code (similar to the example found in this repo):
```rust
extern crate csv_sniffer;

use std::path::Path;

use csv_sniffer::{SampleSize, Sniffer};

fn main() {
    let data_filepath = Path::new(file!())
        .parent()
        .unwrap()
        .join("../data.csv");
    let dialect = Sniffer::new()
        .sample_size(SampleSize::All)
        .sniff_path(data_filepath)
        .unwrap();
    println!("{:#?}", dialect);
}
```
output:
```
Metadata {
    dialect: Dialect {
        delimiter: ':',
        header: Header {
            has_header_row: true,
            num_preamble_rows: 1,
        },
        quote: None,
        flexible: false,
    },
    num_fields: 2,
    types: [
        Text,
        Text,
    ],
}
```
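One plausible explanation (an assumption, not verified against the csv_sniffer source): every data row contains exactly one `:`, inside the `http://` or `https://` URL in the Website column, so splitting on `:` gives a perfectly consistent 2 fields per row. Splitting on `,` without quote handling gives 10 fields for the `"Byrd, Patterson and Knox"` row instead of 9, which would fit the reported `num_fields: 2` and `quote: None`. The sketch below counts fields per row for both candidates on the sample rows above; `naive_field_count` and `quoted_field_count` are hypothetical helpers for illustration, not the crate's actual detection algorithm.

```rust
/// Counts fields by splitting on `delim`, ignoring quotes entirely.
fn naive_field_count(line: &str, delim: char) -> usize {
    line.split(delim).count()
}

/// Counts fields on `delim`, treating delimiters inside double quotes as data.
/// Assumes `"` is the quote character; an escaped `""` toggles twice and cancels out.
fn quoted_field_count(line: &str, delim: char) -> usize {
    let mut in_quotes = false;
    let mut count = 1;
    for c in line.chars() {
        if c == '"' {
            in_quotes = !in_quotes;
        } else if c == delim && !in_quotes {
            count += 1;
        }
    }
    count
}

/// Header plus the first seven data rows from the issue.
const SAMPLE: [&str; 8] = [
    "Index,Organization Id,Name,Website,Country,Description,Founded,Industry,Number of employees",
    "1,8cC6B5992C0309c,Acevedo LLC,https://www.donovan.com/,Holy See (Vatican City State),Multi-channeled bottom-line core,2019,Graphic Design / Web Design,7070",
    "2,ec094061FeaF7Bc,Walls-Mcdonald,http://arias-willis.net/,Lithuania,Compatible encompassing groupware,2005,Utilities,8156",
    "3,DAcC5dbc58946A7,Gregory PLC,http://www.lynch-hoover.net/,Tokelau,Multi-channeled intangible help-desk,2019,Leisure / Travel,6121",
    r#"4,8Dd7beDa37FbeD0,"Byrd, Patterson and Knox",https://www.james-velez.net/,Netherlands,Pre-emptive national function,1982,Furniture,3494"#,
    "5,a3b5c54AEC163e4,Mcdowell-Hopkins,http://fuentes.com/,Mayotte,Cloned bifurcated solution,2016,Online Publishing,36",
    "6,fDfEBeFDaEb59Af,Hayden and Sons,https://www.shaw-mooney.info/,Belize,Persistent mobile task-force,1978,Insurance,7010",
    "7,752ef90Eae1f7f5,Castro LLC,http://wilkinson.com/,Jamaica,Advanced value-added definition,2008,Outsourcing / Offshoring,2526",
];

fn main() {
    for delim in [',', ':'] {
        let naive: Vec<usize> = SAMPLE.iter().map(|l| naive_field_count(l, delim)).collect();
        let quoted: Vec<usize> = SAMPLE.iter().map(|l| quoted_field_count(l, delim)).collect();
        println!("{delim:?} naive: {naive:?} quote-aware: {quoted:?}");
    }
}
```

This prints `',' naive: [9, 9, 9, 9, 10, 9, 9, 9] quote-aware: [9, 9, 9, 9, 9, 9, 9, 9]` and `':' naive: [1, 2, 2, 2, 2, 2, 2, 2] quote-aware: [1, 2, 2, 2, 2, 2, 2, 2]`. If the sniffer scores candidates by field-count consistency before settling on a quote character, `:` would look "cleaner" across the data rows, and the full 100,000-row file likely has many more quoted names containing commas.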
Is this a known issue?
PS: with a different `sample_size` it also gives the wrong delimiter.