henck / dBASE.NET

dBASE reader for .NET
http://www.independent-software.com
MIT License
observations #14

Open BufoViridis opened 4 years ago

The following observations are based on CA-Clipper 5.2 Database Utility (xBase 3), Microsoft dBASE Driver (dBase 3, 4 and 5), Borland Database Engine (dBase 7 and xBase 3), Microsoft Visual FoxPro 9 (FoxPro 9) and Devart Universal Data Access Components source.

Fields

FoxPro stores the field flags in byte 18 of the field descriptor: bit 0x01 marks a system field, 0x02 a nullable field, 0x04 a binary field and 0x08 an autoincrement field. Bytes 19 to 22 (uint32, little endian) hold the next autoincrement value, and byte 23 holds the autoincrement step. You can easily add support for dBase 7 (see below).
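The layout above could be read roughly as follows. This is only a sketch: the type and method names are hypothetical (they are not part of dBASE.NET), and it assumes the offsets are zero-based within the 32-byte field descriptor.

```csharp
using System;

// Hypothetical names; the bit values are the ones listed in the observation.
[Flags]
public enum FoxProFieldFlags : byte
{
    None = 0x00,
    SystemField = 0x01,
    Nullable = 0x02,
    Binary = 0x04,
    AutoIncrement = 0x08
}

public static class FoxProFieldDescriptor
{
    // Byte 18: field flags.
    public static FoxProFieldFlags ReadFlags(byte[] descriptor)
    {
        return (FoxProFieldFlags)descriptor[18];
    }

    // Bytes 19 to 22: next autoincrement value, uint32 in little endian.
    public static uint ReadNextAutoIncrement(byte[] descriptor)
    {
        return (uint)descriptor[19]
             | (uint)descriptor[20] << 8
             | (uint)descriptor[21] << 16
             | (uint)descriptor[22] << 24;
    }

    // Byte 23: autoincrement step.
    public static byte ReadAutoIncrementStep(byte[] descriptor)
    {
        return descriptor[23];
    }
}
```

The bytes are combined by hand rather than with BitConverter so the result does not depend on the byte order of the machine running the code.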

Records

The null flags field ('0' type) is a system field named _NullFlags. It is normally placed at the end of the record and is present when the table has a nullable field (0x02 flag bit), a varchar field ('V' type) or a varbinary field ('Q' type). It is a collection of bits that cannot be indexed directly, because a field may use zero, one or two bits, and it can be more than 1 byte long. Bits are assigned from the least to the most significant bit, byte by byte, in field order (first come, first served). Each bit is either a null flag or, for varchar and varbinary fields, a size flag. A nullable varchar or nullable varbinary field takes 2 bits: the first (less significant) is the size bit and the second (more significant) is the null bit. If the null bit is 1, the field is null. If the size bit is 1, the varchar or varbinary field is not full and the length of its data is given by the last byte of the field; otherwise all bytes of the field are data.

(c V(10) null, i I null, b Q(10) null) [_NullFlags 0(1)]

(null, null, null)                   00011111
(null, 0, null)                      00011011
('0', 1, null)                       00011001
(null, 2, '0')                       00001011
('0', null, '0')                     00001101
(null, 4, '0123456789')              00000011
('0123456789', 5, null)              00011000
('0123456789', 6, '0123456789')      00000000

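// Runs once per column while reading a record;
// bit_index counts the _NullFlags bits consumed so far.
switch (column.Type) {
    case 'V':
    case 'Q':
        // Size bit set: the value does not fill the field and its
        // length is stored in the last byte of the field.
        if ((null_flags[bit_index / 8] & (1 << (bit_index % 8))) != 0
        && field[field.Length - 1] < column.Size) {
            byte[] t = new byte[field[field.Length - 1]];
            Buffer.BlockCopy(field, 0, t, 0, t.Length);
            field = t;
        }
        bit_index++;
        if (bit_index / 8 >= null_flags_size) {
            return;
        }
        break;
}
// Null bit: only fields with the nullable (0x02) flag have one.
if ((column.Flags & 2) != 0) {
    if ((null_flags[bit_index / 8] & (1 << (bit_index % 8))) != 0) {
        field = null;
    }
    bit_index++;
    if (bit_index / 8 >= null_flags_size) {
        return;
    }
}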
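To check the bit order against the example rows above, here is a small self-contained sketch that only reports which size and null bits are set for each column. The class name, method names and tuple-based column description are hypothetical and chosen for illustration; they are not the dBASE.NET API.

```csharp
using System;
using System.Collections.Generic;

public static class NullFlagsDemo
{
    // Walks the columns in order and consumes one size bit for 'V'/'Q'
    // columns, then one null bit for nullable columns.
    public static List<(bool SizeBit, bool NullBit)> Decode(
        byte[] nullFlags, (char Type, bool Nullable)[] columns)
    {
        var result = new List<(bool SizeBit, bool NullBit)>();
        int bitIndex = 0;
        foreach (var column in columns)
        {
            bool sizeBit = false;
            bool nullBit = false;
            if (column.Type == 'V' || column.Type == 'Q')
            {
                sizeBit = TestBit(nullFlags, bitIndex++);
            }
            if (column.Nullable)
            {
                nullBit = TestBit(nullFlags, bitIndex++);
            }
            result.Add((sizeBit, nullBit));
        }
        return result;
    }

    // Least significant bit first within each byte; bits past the end read as 0.
    private static bool TestBit(byte[] flags, int bitIndex)
    {
        return bitIndex / 8 < flags.Length
            && (flags[bitIndex / 8] & (1 << (bitIndex % 8))) != 0;
    }
}
```

For the row ('0', 1, null), whose flags byte is 00011001 (0x19), calling Decode with the columns ('V', true), ('I', true), ('Q', true) reports a size bit without a null bit for c, no bits for i, and both bits for b.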

Memo

dBase 3 memos always have a block size of 512. Each data block is terminated by a 0x1A byte (usually two of them). Long data may be word wrapped (Clipper's DBU, for example, may add the bytes 0x8D and 0x0A when wrapping lines for MS-DOS U.S.).

dBase 4, 5 and 7 memos have a variable block size, given by bytes 4 to 7 (uint32, little endian) of the header block, where 0 means 512 for dBase 4 and 5 and 1024 for dBase 7. Each data block has its own 8-byte header with the size in bytes 4 to 7 (uint32, little endian). The size includes the 8 header bytes.

FoxPro memos have a variable block size, given by bytes 6 to 7 (uint16, big endian) of the header block, where 0 means 64. Each data block has its own 8-byte header with the size in bytes 4 to 7 (uint32, big endian). The size does not include the 8 header bytes.
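The block size rules could be sketched like this. The MemoFormat enum and the method names are hypothetical, and the byte offsets are taken as zero-based, so "4 to 8" is read as bytes 4, 5, 6 and 7.

```csharp
using System;

// Hypothetical grouping of the memo variants described in the observation.
public enum MemoFormat { DBase3, DBase4Or5, DBase7, FoxPro }

public static class MemoHeader
{
    // Block size from the first bytes of the memo file header block.
    public static int ReadBlockSize(byte[] header, MemoFormat format)
    {
        switch (format)
        {
            case MemoFormat.DBase3:
                return 512; // fixed block size
            case MemoFormat.DBase4Or5:
            case MemoFormat.DBase7:
            {
                uint size = ReadUInt32LittleEndian(header, 4);
                if (size != 0) return checked((int)size);
                return format == MemoFormat.DBase7 ? 1024 : 512;
            }
            case MemoFormat.FoxPro:
            {
                int size = header[6] << 8 | header[7]; // uint16, big endian
                return size != 0 ? size : 64;
            }
            default:
                throw new ArgumentOutOfRangeException(nameof(format));
        }
    }

    // Number of data bytes that follow the 8-byte header of a data block.
    public static int ReadDataLength(byte[] blockHeader, MemoFormat format)
    {
        switch (format)
        {
            case MemoFormat.DBase4Or5:
            case MemoFormat.DBase7:
                // The stored size includes the 8 header bytes.
                return checked((int)ReadUInt32LittleEndian(blockHeader, 4)) - 8;
            case MemoFormat.FoxPro:
            {
                // The stored size excludes the 8 header bytes.
                uint size = (uint)blockHeader[4] << 24 | (uint)blockHeader[5] << 16
                          | (uint)blockHeader[6] << 8 | blockHeader[7];
                return checked((int)size);
            }
            default:
                throw new NotSupportedException("dBase 3 blocks have no header; they end at 0x1A.");
        }
    }

    private static uint ReadUInt32LittleEndian(byte[] bytes, int offset)
    {
        return (uint)bytes[offset] | (uint)bytes[offset + 1] << 8
             | (uint)bytes[offset + 2] << 16 | (uint)bytes[offset + 3] << 24;
    }
}
```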

Types

The currency field ('Y' type) in FoxPro is an int64 in little endian with 4 implied decimal digits.

long c = 0; // raw int64 read from the field
decimal value = c / 10000m; // decimal keeps the 4 implied digits exact

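The integer field ('I') and the autoincrement field ('+') in dBase 7 are int32 values in big endian where the most significant bit is a sign bit (1 means positive).

uint i = 0; // raw uint32 read from the field as big endian
int value = unchecked((int)(i > 0 ? i ^ 0x80000000 : 0)); // flip the sign bit; all-zero bytes stay 0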

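The double field ('O') in dBase 7 is a double in big endian where the most significant bit is a sign bit (1 means positive).

ulong d = 0; // raw uint64 read from the field as big endian
// Negative values (sign bit 0) have all bits inverted; positive values only have the sign bit flipped.
double value = BitConverter.ToDouble(BitConverter.GetBytes(d > 0 ? (d & 0x8000000000000000) == 0 ? ~d : d ^ 0x8000000000000000 : 0), 0);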

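The timestamp field ('@') in dBase 7 is the number of milliseconds since one day before 0001-01-01 00:00:00, stored as a double in big endian.

double m = 86400000; // decoded field value; 86400000 ms (1 day) maps to 0001-01-01 00:00:00
DateTime value = new DateTime((long)((m - 86400000) * 10000)); // 10000 ticks per millisecond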
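The snippets above start from a value that has already been read as an integer. A sketch of the step before that, going from the raw field bytes to a value in the stated byte order, might look like this. The class and method names are hypothetical, and the timestamp is left out because the observation does not say whether its double uses the same sign encoding as the 'O' field.

```csharp
using System;

public static class FieldValueSketch
{
    // Combines the first `count` bytes as a big-endian unsigned integer.
    private static ulong ReadBigEndian(byte[] bytes, int count)
    {
        ulong value = 0;
        for (int k = 0; k < count; k++)
        {
            value = value << 8 | bytes[k];
        }
        return value;
    }

    // dBase 7 'I' and '+': big-endian int32 with the sign bit flipped.
    public static int DecodeInteger(byte[] field)
    {
        uint raw = (uint)ReadBigEndian(field, 4);
        return raw > 0 ? unchecked((int)(raw ^ 0x80000000)) : 0;
    }

    // dBase 7 'O': big-endian double; negative values have every bit inverted.
    public static double DecodeDouble(byte[] field)
    {
        ulong raw = ReadBigEndian(field, 8);
        if (raw == 0) return 0;
        ulong bits = (raw & 0x8000000000000000) == 0 ? ~raw : raw ^ 0x8000000000000000;
        return BitConverter.Int64BitsToDouble(unchecked((long)bits));
    }

    // FoxPro 'Y': little-endian int64 with 4 implied decimal digits.
    public static decimal DecodeCurrency(byte[] field)
    {
        long raw = 0;
        for (int k = 7; k >= 0; k--)
        {
            raw = raw << 8 | field[k];
        }
        return raw / 10000m;
    }
}
```

With this encoding the stored bytes 80 00 00 01 decode to 1 and 7F FF FF FF decode to -1, so a plain byte-wise comparison of the stored values sorts them in numeric order.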