jblondin / csv-sniffer

CSV sniffer crate for Rust
MIT License

Sniffer inferencing errors #14

Open jqnatividad opened 2 years ago

jqnatividad commented 2 years ago

Given a CSV file with several datetime fields in RFC 3339 format:

testsumm.csv

csv-sniffer returns:

$ sniff testsumm.csv 
Metadata
========
Dialect:
        Delimiter: :
        Has header row?: true
        Number of preamble rows: 2
        Quote character: none
        Flexible: true

Number of fields: 9
Types:
        0: Text
        1: Unsigned
        2: Text
        3: Unsigned
        4: Text
        5: Unsigned
        6: Text
        7: Unsigned
        8: Text
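One way a sniffer could end up choosing `:` is sketched below. This is a simplified illustration, not csv-sniffer's actual algorithm, and the rows are invented for the example rather than taken from testsumm.csv. It assumes a naive heuristic that counts each candidate delimiter per line without honoring quotes: every RFC 3339 timestamp contributes exactly two colons, so the colon count stays identical on each row, while a comma inside a quoted field makes the comma count vary.

```rust
use std::collections::HashMap;

// Hypothetical rows (not from testsumm.csv): two RFC 3339 timestamps per row,
// plus a quoted text field that sometimes contains a comma.
const SAMPLE_ROWS: [&str; 3] = [
    r#"30516469,2015-04-30T12:26:00Z,2015-04-30T13:40:00Z,HRA,"See notes.""#,
    r#"30516841,2015-04-30T18:05:10Z,2015-05-02T09:15:00Z,DEP,"Mailed, as requested.""#,
    r#"30517113,2015-05-01T00:42:17Z,2015-07-06T13:24:38Z,HRA,"Closed, no action.""#,
];

/// Count how often each candidate delimiter occurs on every row.
/// Quoting is deliberately ignored; that is the assumption under test.
fn delimiter_counts(rows: &[&str], candidates: &[char]) -> HashMap<char, Vec<usize>> {
    let mut counts = HashMap::new();
    for &candidate in candidates {
        let per_row: Vec<usize> = rows
            .iter()
            .map(|row| row.chars().filter(|&ch| ch == candidate).count())
            .collect();
        counts.insert(candidate, per_row);
    }
    counts
}

/// True when the delimiter appears at least once and equally often on every row.
fn is_consistent(per_row: &[usize]) -> bool {
    match per_row.first() {
        Some(&first) => first > 0 && per_row.iter().all(|&n| n == first),
        None => false,
    }
}
```

Calling `delimiter_counts(&SAMPLE_ROWS, &[',', ':'])` on these rows gives colon counts of 4, 4, 4 and comma counts of 4, 5, 5, so `is_consistent` accepts `:` and rejects `,`. A quote-aware count would not be misled by the embedded commas.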

The actual schema is as follows:

$ qsv stats --dates testsumm.csv | qsv table
field                           type      sum        min                       max                                          min_length  max_length  mean      stddev              variance
Unique Key                      Integer   305168410  30516469                  30517113                                     8           8           30516841  207.84321013695214  43198.80000003325
Created Date                    DateTime             2015-04-30 12:26:00 UTC   2015-05-01 00:42:17 UTC                      22          22                                        
Closed Date                     DateTime             2015-04-30 13:40:00 UTC   2015-07-06 13:24:38 UTC                      22          22                                        
Agency                          String               DEP                       HRA                                          3           3                                         
Agency Name                     String               Correspondence Unit       Senior Citizen Rent Increase Exemption Unit  19          43                                        
Complaint Type                  String               Benefit Card Replacement  SCRIE                                        5           28                                        
Descriptor                      String               Cash Assistance           Status of Payment Adjustment                 10          28                                        
Location Type                   String               N/A                       Sidewalk                                     0           18                                        
Incident Zip                    Integer   53216      10040                     11210                                        0           5           10643.2   478.3067634897086   228777.36000000004
Incident Address                String               444 JEROME AVENUE         444 JEROME AVENUE                            0           17                                        
Street Name                     String               JEROME AVENUE             JEROME AVENUE                                0           13                                        
Cross Street 1                  String               BROADWAY                  BROADWAY                                     0           8                                         
Cross Street 2                  String               W 230 ST                  W 230 ST                                     0           8                                         
Intersection Street 1           String               BROADWAY                  BROADWAY                                     0           8                                         
Intersection Street 2           String               W 230 ST                  W 230 ST                                     0           8                                         
Address Type                    String               ADDRESS                   INTERSECTION                                 0           12                                        
City                            String               BRONX                     STATEN ISLAND                                0           13                                        
Landmark                        NULL                                                                                        0           0                                         
Facility Type                   String               N/A                       N/A                                          3           3                                         
Status                          String               Closed                    Closed                                       6           6                                         
Due Date                        DateTime             2015-05-03 15:47:02 UTC   2015-08-09 13:14:13 UTC                      0           22                                        
Resolution Description          String               See notes.                We have mailed the requested document(s).    10          225                                       
Resolution Action Updated Date  DateTime             2015-04-30 13:40:00 UTC   2015-07-06 13:23:16 UTC                      0           22                                        
Community Board                 String               0 Unspecified             Unspecified BRONX                            8           17                                        
BBL                             NULL                                                                                        0           0                                         
Borough                         String               BRONX                     Unspecified                                  5           13                                        
X Coordinate (State Plane)      NULL                                                                                        0           0                                         
Y Coordinate (State Plane)      NULL                                                                                        0           0                                         
Open Data Channel Type          String               MOBILE                    PHONE                                        5           6                                         
Park Facility Name              String               Harry Chapin Park         Unspecified                                  11          17                                        
Park Borough                    String               BRONX                     Unspecified                                  5           13                                        
Vehicle Type                    NULL                                                                                        0           0                                         
Taxi Company Borough            NULL                                                                                        0           0                                         
Taxi Pick Up Location           NULL                                                                                        0           0                                         
Bridge Highway Name             NULL                                                                                        0           0                                         
Bridge Highway Direction        NULL                                                                                        0           0                                         
Road Ramp                       NULL                                                                                        0           0                                         
Bridge Highway Segment          NULL                                                                                        0           0                                         
Latitude                        NULL                                                                                        0           0                                         
Longitude                       NULL                                                                                        0           0                                         
Location                        NULL                                                                                        0           0                                         
created_year                    Integer   20150      2015                      2015                                         4           4           2015      0                   0

Removing all the date fields produces a different error:

$ qsv select '!/Date$/' testsumm.csv > nodatefields.csv
$ sniff nodatefields.csv
ERROR: Sniffing failed: unable to find valid delimiter