yortus / DBFFile

Read and write .dbf (dBase III and Visual FoxPro) files in Node.js
MIT License

readRecords will occasionally throw a RangeError #65

Closed TheArchitect4855 closed 2 years ago

TheArchitect4855 commented 2 years ago

I'm using this library to convert some old DBF data, and I'm running into an issue where readRecords(n) will throw RangeErrors. Currently I've just wrapped readRecords in a try/catch and only read one record at a time, which works, but naturally some data is lost.

Full stack trace:

RangeError [ERR_BUFFER_OUT_OF_BOUNDS]: Attempt to access memory outside buffer bounds
    at boundsError (internal/buffer.js:80:11)
    at Buffer.readInt32LE (internal/buffer.js:386:5)
    at int32At (/home/kurtis/Documents/anthology-converter/node_modules/dbffile/dist/dbf-file.js:264:72)
    at readRecordsFromDBF (/home/kurtis/Documents/anthology-converter/node_modules/dbffile/dist/dbf-file.js:345:35)
    at async Object.module.exports.convert (/home/kurtis/Documents/anthology-converter/src/converters/custconverter.js:31:17) {
  code: 'ERR_BUFFER_OUT_OF_BOUNDS'
}
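The one-record-at-a-time try/catch workaround can be sketched roughly like this (a generic helper of my own; `readOne` is a hypothetical stand-in for a per-record call such as `dbf.readRecords(1)`, not part of the dbffile API):

```typescript
// Sketch of the workaround: read records one at a time and skip any record
// whose read throws a RangeError, accepting that the skipped data is lost.
// `readOne` is a hypothetical placeholder for a per-record read.
function readTolerantly<T>(count: number, readOne: (i: number) => T): T[] {
    const records: T[] = [];
    for (let i = 0; i < count; i++) {
        try {
            records.push(readOne(i));
        } catch (err) {
            // Only swallow the RangeErrors described in this issue.
            if (!(err instanceof RangeError)) throw err;
        }
    }
    return records;
}
```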

Unfortunately I can't attach the files in question as they contain confidential/private data. However, I can share some metadata about the file:

  • Version: 48
  • Records: 23k
  • The file contains the Y (money) column type, which is currently unsupported

Thanks in advance for the help.

lordrip commented 2 years ago

Hi, is this a VFP or a dBase file? If it's the former, I've seen damaged files with weird endings that fail only while accessing a portion of the file.

Just to confirm that this is not the case, could you write a script in VFP or dBase that iterates over each record and prints the information to the screen or to a text file?

TheArchitect4855 commented 2 years ago

Unfortunately I don't have the tooling (or knowledge) to really do that. All I'm working with is your JS library and a DBF viewer extension in VS code.

From inspecting the file header, I can tell that it's a VFP file, so the weird endings may be the issue - Is there a workaround or known fix for that? If not, I have no issue seeing if I can fix it myself, I just don't really know what the problem is.
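For reference, the header inspection mentioned above can be sketched like this (my own sketch, not code from the library; it assumes the common DBF convention that the first byte of the header is the version byte, with 0x30/0x31 indicating Visual FoxPro — "Version: 48" above is 0x30):

```typescript
// Hypothetical helper: detect a Visual FoxPro file from its header bytes.
// Assumes the first byte of a DBF file is the version byte, and that
// 0x30 (VFP) and 0x31 (VFP with autoincrement) mark Visual FoxPro files.
function isVisualFoxPro(header: Buffer): boolean {
    const version = header[0]; // e.g. 0x03 = dBase III, 0x30/0x31 = VFP
    return version === 0x30 || version === 0x31;
}
```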

lordrip commented 2 years ago

It's not my library 😅 I'm just another user who happens to use it from time to time.

Do you know if it always fails on the same records?

Another option would be to clone the repository and write a unit test using your file as a fixture; that way you could set some breakpoints to see whether the problem comes from a bad record or from an unsupported field.

TheArchitect4855 commented 2 years ago

Ah, yeah, that's a good idea. I'll definitely give that a try! Thanks for the help.

darac-10 commented 2 years ago

I also had that problem. I found that the parsing of the memo field is incorrect when the dbf file is of type VFP9 (0x30).

I solved the problem as follows:

diff --git a/src/dbf-file.ts b/src/dbf-file.ts
index 5eed28d..b9ddae2 100644
--- a/src/dbf-file.ts
+++ b/src/dbf-file.ts
@@ -376,12 +376,14 @@ async function readRecordsFromDBF(dbf: DBFFile, maxCount: number) {
                             break;

                         case 'M': // Memo
-                            while (len > 0 && buffer[offset] === 0x20) ++offset, --len;
-                            if (len === 0) { value = null; break; }
                             let blockIndex = dbf._version === 0x30
                                 ? int32At(offset, len)
                                 : parseInt(substrAt(offset, len, encoding));
                             offset += len;
+                            if(isNaN(blockIndex) || blockIndex===0){
+                                value = null;
+                                break;
+                            }

                             // If the memo file is missing and we get this far, we must be in 'loose' read mode.
                             // Skip reading the memo value and continue with the next field.

Explanation: the memo field in the dbf file stores the blockIndex into the memo file (.dbt, .fpt, ...). In a 0x30 dbf file, blockIndex is always 4 bytes long, encoded as a 32-bit integer, and the field length and offset must not be changed. If the memo field has no content, then blockIndex = 0.

In other cases (not 0x30), blockIndex is a 10-byte ASCII-encoded decimal number, right-aligned and left-padded with spaces. The parseInt() function ignores the leading spaces, so it is unnecessary to move the offset. If the memo field has no content, then the blockIndex field is filled only with spaces, and parseInt() returns NaN.
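Putting the two cases together, the behavior described above can be sketched as follows (my own illustration of what the patch does, not the library's actual code; `memoBlockIndex` is a hypothetical helper):

```typescript
// Sketch of the memo block-index parsing described above.
// Version 0x30 (VFP9): 4-byte little-endian int32; 0 means an empty memo.
// Other versions: 10-byte ASCII decimal, right-aligned and left-padded with
// spaces; an all-space (empty) field makes parseInt() return NaN.
function memoBlockIndex(buf: Buffer, offset: number, len: number, version: number): number | null {
    const blockIndex = version === 0x30
        ? buf.readInt32LE(offset)
        : parseInt(buf.toString('ascii', offset, offset + len));
    // Treat both "empty" encodings (0 and NaN) as a null memo value.
    return Number.isNaN(blockIndex) || blockIndex === 0 ? null : blockIndex;
}
```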

yortus commented 2 years ago

A PR would be welcome!

jhrncar commented 4 months ago

Reopening this to possibly notify the smarter people who helped here last time, as I ran into the same error, described in #87. From what I read here, the problem I'm encountering might be a superset of this error and will require a fix in similar fashion, which I am working on now. Has anybody else experienced RangeErrors like this even after this fix?