yellowfeather / DbfDataReader

DbfDataReader is a small fast .Net Core library for reading dBase, xBase, Clipper and FoxPro database files
MIT License
134 stars 61 forks source link

Integer Precisions not being interpreted create a value overflow #98

Closed makingglitches closed 3 years ago

makingglitches commented 3 years ago

Luckily I updated the code :) I believe we had a discussion about this the last time the world ended for no valid reason.

src.zip

Remember. Nukes go 3 places. One is North Korea. One is Colorado and the other is a shit town named Enola and Wyoming just about makes 4.

makingglitches commented 3 years ago

I am wondering if he has not retired.

chrisrichards commented 3 years ago

Ha! Sadly not retired, just busy with other things.

Could you create a pull request for these changes?

makingglitches commented 3 years ago

Ha! Sadly not retired, just busy with other things.

Could you create a pull request for these changes?

Sadly I'm not sure how to do that but no time like the present to learn I suppose

The changes are included in the source zip I added but I'll have a look see

makingglitches commented 3 years ago

Oh ok I'll see if the modded file is in my repo It should be

makingglitches commented 3 years ago

@chrisrichards do you have some unit tests for this ? Imma have to either just redo the code or do some merging because it looks like you made some updates that could cause conflicts in the files I fixed like half a year ago lol

makingglitches commented 3 years ago

@chrisrichards looks also like you updated to dotnet core v5 ? Am I right ? I must be alone in my mistrust of the "future of .net" heh

chrisrichards commented 3 years ago

Yes, it is now net5.0 and netstandard2.1.

There are unit tests in the project already, but nothing specifically for integer precisions. If you have or can create a dbf file that can be used in a test that would be great!

makingglitches commented 3 years ago

@chrisrichards uh are you sure your code still works ?

All the dbf files I see contain string representations of the value which has to be converted to a string from the byte array to the value type.

The new code is skipping that and trying to draw a value from a byte array using bitconverter.todouble for example.

if that works or ms made it smart enough to recognize the difference between a packed binary array and a string that needs parsed then their docs are off

Oh note I';m just using dbfDouble as an example. Its reading the byte array wrong. Basically instead of using the recognized sectional stream processor type design, have the base class cut up the record and pump it through the column definitions its looking like instead.

Differences:

Present Code

public override void Read(ReadOnlySpan<byte> bytes)
        {
#if NET48
            Value = BitConverter.ToDouble(bytes.ToArray(), 0);
#else
            Value = BitConverter.ToDouble(bytes);
#endif
        }

Working Code


  public override void Read(BinaryReader binaryReader)
        {
            if (binaryReader.PeekChar() == '\0')
            {
                binaryReader.ReadBytes(Length);
                Value = null;
            }
            else
            {
                var stringValue = new string(binaryReader.ReadChars(Length));

                if (long.TryParse(stringValue,
                    NumberStyles.Integer | NumberStyles.AllowLeadingWhite | NumberStyles.AllowTrailingWhite,
                    _intNumberFormat, out var value))
                    Value = value;
                else
                    Value = null;
            }
        }

I tested the old code extensively against real data taken from Government distributed DBASE 4.0 files.

chrisrichards commented 3 years ago

See DBF spec here http://dbase.com/Knowledgebase/INT/db7_file_fmt.htm

Numbers (N) - Number stored as a string, right justified, and padded with blanks to the width of the field. Float (F) - Number stored as a string, right justified, and padded with blanks to the width of the field. Long (l) - 4 bytes. Leftmost bit used to indicate sign, 0 negative. Double (O) - 8 bytes - no conversions, stored as a double.

i.e. you can't use DbfValueLong for numbers with a length > 10 as they are stored differently

If you can provide a test DBF file that would help.

makingglitches commented 3 years ago

Sure but you don't seem to be getting what I'm saying.

https://docs.microsoft.com/en-us/dotnet/api/system.bitconverter.touint64?view=net-5.0

BitConverter is taking an endian binary value and converting it to variable type. Its not a TryParse or Parse.

Thats what the dbfDouble and DbfLong are behaving like.

makingglitches commented 3 years ago

tl_2019_01_place.zip

Is the original guy dead or something ? Because we keep doing this same crazy shit for no reason.

makingglitches commented 3 years ago

The failure in the original code was occurring because it was always defaulting to Int32 column types if no decimal points were encountered which was cause issues when a parse was called, which was trying to parse a number bigger than it could fit in the value type, so I found the largest number that could be represented by a string, which happens to start with a 1, and subtracted one digit of length as the cutoff point, so less than a 10 digit length and the column is int32, greater than, and its int64

makingglitches commented 3 years ago

That looks like a link to dbase 7.0 spec. That's what was causing me confusion when I was fixing this issue before it started treating everything like a packed endian value.

https://www.loc.gov/preservation/digital/formats/fdd/fdd000325.shtml

This so far as I have seen is what everyone uses. Making this support 7 without a version checker is useless.

Skip the knowledge base link, its not right LOL Hone on the dbase 4.0 our cheapo gov still uses lol

Google Earth you can draw some doublechecks back from somehow. But there is also LibreOffice Calc which properly reads the format.

makingglitches commented 3 years ago

http://web.archive.org/web/20150323061445/http://ulisse.elettra.trieste.it/services/doc/dbase/DBFstruct.htm#T2.4

Thats what this project was reading. And its a format that is very very widely used to accompany ESRI shapefiles.

chrisrichards commented 3 years ago

The tl_2019_01_place file works fine, see https://github.com/yellowfeather/DbfDataReader/blob/d5b1141675b2973be05fd5e1419c45ff530c14a2/test/DbfDataReader.Tests/EsriShapefileTests.cs#L7

Can you provide a file that shows this issue?

makingglitches commented 3 years ago

https://dotnetfiddle.net/CBzHzs Dude.

makingglitches commented 3 years ago

Hey what did I say when I said this was a waking nightmare

chrisrichards commented 3 years ago

This is now fixed, but please read the conduct of conduct before making any more contributions.