roskakori / cutplace

validate data stored in CSV, PRN, ODS or Excel files
http://cutplace.readthedocs.org/
GNU Lesser General Public License v3.0

Cutplace displaying only the first error in a broken row on validation #115

Open vishakhabhasin opened 7 years ago

vishakhabhasin commented 7 years ago

I'm running cutplace 0.8.8 to do basic validation of a CSV file. When I validate the data, the output shows only the first error for each broken row.

Code

import sys

import cutplace
import cutplace.errors

file_cid = sys.argv[1]
file_to_validate = sys.argv[2]
cid = cutplace.Cid(file_cid)
error_found = False
# With on_error='yield', rows() yields either a valid row or an error for a broken row.
for row_or_error in cutplace.rows(cid, file_to_validate, on_error='yield'):
    if isinstance(row_or_error, Exception):
        error_found = True
        if isinstance(row_or_error, cutplace.errors.CutplaceError):
            # Print data related errors and continue with the next row.
            print('%s' % row_or_error)
        else:
            # Let other, more severe errors terminate the validation.
            raise row_or_error
if not error_found:
    print('No error found')

Contents from CID file

,Property ,Value,,,,
D,Format,Delimited,,,,
D,Encoding,UTF-8,,,,
D,Header,1,,,,
D,Line delimiter,LF,,,,
D,Item delimiter,",",,,,
,,,,,,
,,,,,,
,Name,Example,Empty,Length,Type,Rule
F,customer_id,3798,,,Integer,0...99999
F,surname,Miller,,...60,,
F,first_name,John,,...60,,
F,date_of_birth,,,,DateTime,MM/DD/YYYY
F,gender,male,,,Choice,"female, male"
,,,,,,
,,,,,,
,Description,Type,Rule,,,
C,customer must be unique,IsUnique,customer_id,,,

Contents from Customer.csv

customer_id,surname,first_name,born,gender
1,Beck,Tyler,11/15/1995,
2,Gibson,Martin,8/18/1969,
3,Hopkins,Chester,12/19/1982,
4,Lopez,,10/13/1930,
5,123,,8/10/1943,hhhh
,Martin,,9/27/1932,male
,Knight,,5/25/1977,female
,Rose,Tammy,1/12/2004,female
,Gutierrez,,5/18/2010,male
,Phillips,,11/9/1960,female

Errors displayed

  1. customers_error.csv (R2C5): cannot accept field 'gender': value must not be empty
  2. customers_error.csv (R3C5): cannot accept field 'gender': value must not be empty
  3. customers_error.csv (R4C5): cannot accept field 'gender': value must not be empty
  4. customers_error.csv (R5C3): cannot accept field 'first_name': value must not be empty
  5. customers_error.csv (R6C3): cannot accept field 'first_name': value must not be empty
  6. customers_error.csv (R7C1): cannot accept field 'customer_id': value must not be empty
  7. customers_error.csv (R8C1): cannot accept field 'customer_id': value must not be empty
  8. customers_error.csv (R9C1): cannot accept field 'customer_id': value must not be empty
  9. customers_error.csv (R10C1): cannot accept field 'customer_id': value must not be empty
  10. customers_error.csv (R11C1): cannot accept field 'customer_id': value must not be empty

The customer file has 15+ validation errors, but only 10 are displayed, one per broken row. The remaining errors are missed even though I pass on_error='yield'. Please suggest how to get all of them.
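To make the expected result concrete, here is a minimal standalone sketch (Python 3) of the behaviour being asked for: every field of every row is checked, and all errors are collected instead of stopping at the first one per row. It does not use the cutplace API. The checker functions, `FIELD_CHECKS`, and `all_field_errors` are hypothetical names that hand-code the rules from the CID file above, and the date check assumes that single-digit months and days are acceptable for MM/DD/YYYY.

```python
import csv
import io
from datetime import datetime

# Sample data copied from Customer.csv above.
CUSTOMERS_CSV = """customer_id,surname,first_name,born,gender
1,Beck,Tyler,11/15/1995,
2,Gibson,Martin,8/18/1969,
3,Hopkins,Chester,12/19/1982,
4,Lopez,,10/13/1930,
5,123,,8/10/1943,hhhh
,Martin,,9/27/1932,male
,Knight,,5/25/1977,female
,Rose,Tammy,1/12/2004,female
,Gutierrez,,5/18/2010,male
,Phillips,,11/9/1960,female
"""

EMPTY_MESSAGE = "value must not be empty"


# Each hypothetical checker returns an error message, or None if the value is fine.
def check_customer_id(value):
    if not value:
        return EMPTY_MESSAGE
    try:
        number = int(value)
    except ValueError:
        return "value must be an integer"
    if not 0 <= number <= 99999:
        return "value must be within 0...99999"
    return None


def check_text(value):
    if not value:
        return EMPTY_MESSAGE
    if len(value) > 60:
        return "value must have at most 60 characters"
    return None


def check_date(value):
    if not value:
        return EMPTY_MESSAGE
    try:
        # Assumption: strptime's lenient parsing (8/18/1969 is accepted) is good enough here.
        datetime.strptime(value, "%m/%d/%Y")
    except ValueError:
        return "date must match MM/DD/YYYY"
    return None


def check_gender(value):
    if not value:
        return EMPTY_MESSAGE
    if value not in ("female", "male"):
        return "value must be one of: female, male"
    return None


# Field names and checks in column order, hand-copied from the CID file.
FIELD_CHECKS = [
    ("customer_id", check_customer_id),
    ("surname", check_text),
    ("first_name", check_text),
    ("date_of_birth", check_date),
    ("gender", check_gender),
]


def all_field_errors(csv_text):
    """Return one message per broken field, not just one per broken row."""
    errors = []
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # Skip the header row (row 1).
    for row_number, row in enumerate(reader, start=2):
        for column_number, ((name, check), value) in enumerate(zip(FIELD_CHECKS, row), start=1):
            message = check(value)
            if message is not None:
                errors.append("(R%dC%d): cannot accept field %r: %s"
                              % (row_number, column_number, name, message))
    return errors


if __name__ == "__main__":
    for error in all_field_errors(CUSTOMERS_CSV):
        print(error)
```

Under these assumptions the sketch reports 16 errors for the sample data, for example both R5C3 and R5C5 for the "Lopez" row, whereas the cutplace output above lists only R5C3 for that row.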

roskakori commented 7 years ago

(Cleaned up markdown to enable syntax highlighting of example code.)

sepira commented 4 years ago

Hi, was there a solution for this? I'd appreciate any help. Thank you!