luads / php-xbase

A simple parser for *.dbf files using PHP
MIT License
184 stars 84 forks source link

Support Memos for Visual FoxPro #118

Open d4mation opened 2 years ago

d4mation commented 2 years ago

Changes were made specifically for Visual FoxPro (TableType::VISUAL_FOXPRO), but it may work for other Versions too.

There are some minor issues with this PR, but they would likely require some refactoring of the library to handle nicely.

I've described the lingering issues below


Firstly, Visual FoxPro only expects 4 bytes for Memo fields, but XBase\Header\Column\Validator\DBase\MemoValidator will force this to 10 bytes no matter how you define the Column.

https://github.com/luads/php-xbase/blob/e818e3526102a20b7d9c7e1b14feef686c973a5d/src/Header/Column/Validator/DBase/MemoValidator.php#L9-L24

As a result, the Record Byte Length for any Tables that include Memo Fields will end up being incorrect. This will also impact the offset of the Column within the Record for any Fields that come after a Memo.

Given the way that the Memo field validation is being handled "globally" I'm not sure if there is a particularly good way to handle a different byte length only for specific DBase versions.

Secondly, while the .FPT file is written, technically the .DBF file has no knowledge that it exists. The Table Flag will need to be set accordingly in the .DBF file's header if Memo fields exist.

I've corrected these two issues on my end by tweaking the .DBF file after initial creation with the following code:

// Example Columns. These Columns were used higher up in my code to create the Table similar to the examples in the README.
// README Example: https://github.com/luads/php-xbase/blob/e818e3526102a20b7d9c7e1b14feef686c973a5d/README.md?plain=1#L169-L198
$columns = array(
    array(
        "name" => "TESTCHAR",
        "type" => FieldType::CHAR,
        "length" => 10,
    ),
    array(
        "name" => "TESTMEMO",
        "type" => FieldType::MEMO,
        "length" => 4,
    ),
    array(
        "name" => "ANOTHERCHAR",
        "type" => FieldType::CHAR,
        "length" => 15,
    )
);

$memo_columns = array_filter( $columns, function( $column ) {
    return $column['type'] == FieldType::MEMO;
} );

if ( ! empty( $memo_columns ) ) {

    $file_stream = Stream::createFromFile( $path, 'rb+' );

    // The Table flag will need to be set so other applications will attempt to load the Memo file
    $file_stream->seek( 28 );
    $file_stream->writeUChar( 2 );

    // Most DBase files use a length of 10 bytes for Memo fields, but Visual FoxPro uses 4 bytes
    // We are going to subtract 6 bytes from the Record Length for each Memo field
    // https://web.archive.org/web/20210801135633/http://devzone.advantagedatabase.com/dz/webhelp/advantage9.0/server1/dbf_field_types_and_specifications.htm

    $memo_field_count = count( $memo_columns );

    // Record Length is stored in bytes 10-11
    $file_stream->seek( 10 );
    $saved_record_length = $file_stream->readUShort();

    // Write the corrected Record Length to the proper offset
    $file_stream->seek( 10 );
    $file_stream->writeUShort( $saved_record_length - ( $memo_field_count * 6 ) );

    // We need to patch the length and offset of each Column individually too

    $removed_bytes = 0;
    foreach ( $headers as $index => $column ) {

        // Columns start at byte 32 and are 32 bytes long themselves
        $column_header_offset = 32 * ( $index + 1 );

        if ( $removed_bytes > 0 ) {

            // The offset of the column in the record is stored in bytes 12-15 of the column
            $file_stream->seek( $column_header_offset + 12 );
            $old_record_column_offset = $file_stream->readUShort();

            // Write the offset with the removed bytes subtracted from it, effectively bumping it backward the correct amount
            $file_stream->seek( $column_header_offset + 12 );
            $file_stream->writeUShort( $old_record_column_offset - $removed_bytes );

        } 

        if ( $column['type'] == FieldType::MEMO ) {

            // Write the correct Column length, stored in byte 16
            $file_stream->seek( $column_header_offset + 16 );
            $file_stream->writeUChar( 4 );

            // 10 bytes minus 4 bytes is 6 bytes. Increment our "Removed bytes" to subtract from subsequent columns
            $removed_bytes += 6;

        }

    }

    $file_stream->close();

    // This helps to confirm that our adjusted Header worked by attempting to parse the Header
    $table_reader = new TableReader( $path );
    $table_reader->close();

}

Lastly, one thing that may need to be tweaked is that the Memo file is created based on whether the current Table version supports Memos rather than if any Memo Columns exist or not. Currently, every time the Table is saved (meaning every time a Record is added) it will attempt to create a Memo file even if one isn't needed. I'm not sure if there's a good way around this though, since I know the parser can be set to only load certain Columns. If the Columns were restricted like this, it may not see the Memo columns in order to make this decision.

d4mation commented 2 years ago

There are not enough test cases in your PR. Can you write some?

What tests specifically would you like to see? I'd be happy to write some, but without the refactoring necessary to be able to handle the Memo Field length, the Table Flag, and the issue I discovered in #117 I don't know how best to confirm that the file generation is occurring properly.