olemb / dbfread

Read DBF Files with Python
MIT License

Better tests for field parsing #49

Open olemb opened 3 years ago

olemb commented 3 years ago

It's getting hard to keep track of all the special cases of field formats in different DBF files.

We need better tests for these special cases.

olemb commented 3 years ago

We also need documentation and example files for the different file format variants encountered so far. It's getting increasingly difficult to add support for one variant without risking breaking support for another.

Easier ways to override the default behaviour would also be useful. Currently the only way is to subclass the field parser.

FilipStok commented 3 years ago

Hi, I was looking for a way to read a ModTime type field and I came across a description in Harbour's changelog: https://github.com/harbour/core/blob/master/ChangeLog.txt

    ; Current field type mappings are:
        C; Character,n     HB_FT_STRING,n                      ADS_STRING
        N; Numeric,n,d     HB_FT_LONG,n,d                      ADS_NUMERIC
        D; Date,n          HB_FT_DATE,3 or 4 or 8              ADS_COMPACTDATE; ADS_DATE
        ShortDate          HB_FT_DATE,3                        ADS_COMPACTDATE
        L; Logical         HB_FT_LOGICAL,1                     ADS_LOGICAL
        M; Memo,n          HB_FT_MEMO,4 or 9 or 8              ADS_MEMO
        B; Double,,d       HB_FT_DOUBLE,8,d                    ADS_DOUBLE
        I; Integer,n       HB_FT_INTEGER, 2 or 4 or 8          ADS_SHORTINT; ADS_INTEGER; ADS_LONGLONG
        ShortInt           HB_FT_INTEGER,2                     ADS_SHORTINT
        Longlong           HB_FT_INTEGER,8                     ADS_LONGLONG
        P; Image           HB_FT_IMAGE,9 or 10                 ADS_IMAGE
        W; Binary          HB_FT_BLOB,4 or 9 or 10             ADS_BINARY
        Y; Money           HB_FT_CURRENCY,8,4                  ADS_MONEY
        Z; CurDouble,,d    HB_FT_CURDOUBLE,8,d                 ADS_CURDOUBLE
        T,4; Time          HB_FT_TIME,4                        ADS_TIME
        @; T,8; TimeStamp  HB_FT_TIMESTAMP,8                   ADS_TIMESTAMP
        +; AutoInc         HB_FT_AUTOINC,4                     ADS_AUTOINC
        ^; RowVersion      HB_FT_ROWVER,8                      ADS_ROWVERSION
        =; ModTime         HB_FT_MODTIME,8                     ADS_MODTIME
        Raw,n              HB_FT_STRING,n (+HB_FF_BINARY)      ADS_RAW
        Q; VarChar,n       HB_FT_VARLENGTH,n                   ADS_VARCHAR; ADS_VARCHAR_FOX
        VarBinary,n        HB_FT_VARLENGTH,n (+HB_FF_BINARY)   ADS_VARBINARY_FOX; ADS_RAW
        CICharacter,n      HB_FT_STRING,n                      ADS_CISTRING  

Maybe this will help in identifying the field types. The Harbour developers ran into the same problem when trying to support different RDDs.

As I understand it, a ModTime field can be read like a TimeStamp field. Could you add this mapping:

    # Modtime field ('=')
    parse3D = parseT
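
To make the request concrete, here is a minimal standalone sketch of what such a mapping would do. It assumes the field holds 8 bytes: a little-endian Julian day number followed by little-endian milliseconds since midnight. That ModTime shares this layout with TimeStamp is my understanding from the table above, not something I have verified against real files. `SketchParser` and its `parse` method are hypothetical stand-ins for illustration, not dbfread's own code.

```python
import datetime
import struct

# Offset between Julian day numbers (as stored in the field) and the
# proleptic Gregorian ordinals used by the datetime module.
JULIAN_TO_ORDINAL = 1721425


class SketchParser:
    """Hypothetical stand-in for a field parser; not dbfread's own class."""

    def parseT(self, data):
        """Decode 8 bytes: Julian day, then milliseconds since midnight."""
        if not data.strip():
            return None  # blank field
        day, msec = struct.unpack('<LL', data)
        if day == 0:
            return None  # no date stored
        date = datetime.datetime.fromordinal(day - JULIAN_TO_ORDINAL)
        return date + datetime.timedelta(milliseconds=msec)

    # '=' cannot appear in a method name, so its hex ASCII code (3D) is used.
    parse3D = parseT

    def parse(self, field_type, data):
        """Dispatch on the field type character (assumed naming scheme)."""
        name = 'parse' + field_type
        if not hasattr(self, name):
            name = 'parse%02X' % ord(field_type)
        return getattr(self, name)(data)


# Julian day 2451545 is 2000-01-01; 3600000 ms is one hour past midnight.
raw = struct.pack('<LL', 2451545, 3600000)
modtime = SketchParser().parse('=', raw)
print(modtime)
```

With the alias in place, a `=` field and a `T` field holding the same bytes decode to the same `datetime`, and blank or zero-day values come back as `None`.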