Closed: makingglitches closed this issue 3 years ago
I am wondering if he has retired.
Ha! Sadly not retired, just busy with other things.
Could you create a pull request for these changes?
Sadly I'm not sure how to do that, but no time like the present to learn, I suppose.
The changes are included in the source zip I added, but I'll have a look-see.
Oh ok, I'll see if the modded file is in my repo. It should be.
@chrisrichards do you have some unit tests for this? I'm going to have to either just redo the code or do some merging, because it looks like you made some updates that could cause conflicts in the files I fixed like half a year ago lol
@chrisrichards it also looks like you updated to .NET 5? Am I right? I must be alone in my mistrust of the "future of .NET" heh
Yes, it is now net5.0 and netstandard2.1.
There are unit tests in the project already, but nothing specifically for integer precisions. If you have or can create a dbf file that can be used in a test that would be great!
@chrisrichards uh, are you sure your code still works?
All the DBF files I see contain string representations of the value, which has to be converted from the byte array to a string and then parsed into the value type.
The new code is skipping that and trying to pull a value straight out of the byte array, using BitConverter.ToDouble for example.
If that works, or MS made it smart enough to recognize the difference between a packed binary array and a string that needs parsing, then their docs are off.
Oh, note I'm just using DbfDouble as an example. It's reading the byte array wrong. Basically, instead of using the recognized sectional stream-processor type design, it now looks like the base class cuts up the record and pumps the slices through the column definitions instead.
Differences:
Present Code
public override void Read(ReadOnlySpan<byte> bytes)
{
#if NET48
    Value = BitConverter.ToDouble(bytes.ToArray(), 0);
#else
    Value = BitConverter.ToDouble(bytes);
#endif
}
Working Code
public override void Read(BinaryReader binaryReader)
{
    if (binaryReader.PeekChar() == '\0')
    {
        binaryReader.ReadBytes(Length);
        Value = null;
    }
    else
    {
        var stringValue = new string(binaryReader.ReadChars(Length));
        if (long.TryParse(stringValue,
                NumberStyles.Integer | NumberStyles.AllowLeadingWhite | NumberStyles.AllowTrailingWhite,
                _intNumberFormat, out var value))
            Value = value;
        else
            Value = null;
    }
}
I tested the old code extensively against real data taken from government-distributed dBASE 4.0 files.
See DBF spec here http://dbase.com/Knowledgebase/INT/db7_file_fmt.htm
Numbers (N) - Number stored as a string, right justified, and padded with blanks to the width of the field.
Float (F) - Number stored as a string, right justified, and padded with blanks to the width of the field.
Long (I) - 4 bytes. Leftmost bit used to indicate sign, 0 negative.
Double (O) - 8 bytes, no conversions, stored as a double.
i.e. you can't use DbfValueLong for numbers with a length > 10 as they are stored differently
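To illustrate the difference, here's a minimal sketch (the helper names are made up for illustration, not the library's API): an N or F field has to be decoded as padded ASCII text and parsed, while an O field really is eight raw bytes you can hand straight to BitConverter.

using System;
using System.Globalization;
using System.Text;

// Hypothetical helpers illustrating the spec quoted above:
// N/F fields are ASCII text padded to the field width; O fields are raw 8-byte doubles.
static class DbfFieldSketch
{
    // Numeric (N) / Float (F): decode the padded ASCII text, then parse it.
    public static double? ReadNumeric(byte[] field)
    {
        var text = Encoding.ASCII.GetString(field).Trim();
        return double.TryParse(text, NumberStyles.Float, CultureInfo.InvariantCulture, out var v)
            ? v
            : (double?)null; // blank or malformed field
    }

    // Double (O): reinterpret the 8 raw bytes directly; no text parsing involved.
    public static double ReadDouble(byte[] field)
    {
        return BitConverter.ToDouble(field, 0);
    }
}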
If you can provide a test DBF file that would help.
Sure, but you don't seem to be getting what I'm saying.
https://docs.microsoft.com/en-us/dotnet/api/system.bitconverter.touint64?view=net-5.0
BitConverter takes an endian binary value and converts it to the target type. It's not a TryParse or a Parse.
That's what DbfDouble and DbfLong are behaving like.
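To make it concrete, here's a tiny standalone sketch (not from the repo) showing what each approach does to the same ASCII-encoded Numeric field:

using System;
using System.Globalization;
using System.Text;

class BitConverterVsParse
{
    static void Main()
    {
        // A 10-byte Numeric (N) field: the value 42 stored as right-justified ASCII text.
        byte[] field = Encoding.ASCII.GetBytes("        42");

        // Reinterpreting the first 8 bytes as a packed little-endian long gives garbage,
        // because the bytes are character codes, not a binary integer.
        long reinterpreted = BitConverter.ToInt64(field, 0);
        Console.WriteLine(reinterpreted); // a huge, meaningless number

        // Parsing the decoded string gives the intended value.
        string text = Encoding.ASCII.GetString(field);
        long.TryParse(text, NumberStyles.Integer, CultureInfo.InvariantCulture, out long parsed);
        Console.WriteLine(parsed); // 42
    }
}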
Is the original guy dead or something? Because we keep doing this same crazy shit for no reason.
The failure in the original code occurred because it always defaulted to Int32 column types when no decimal point was encountered, which caused issues when a parse was called on a number bigger than the value type could hold. So I found the largest number that could be represented by the string, which happens to start with a 1, and subtracted one digit of length as the cutoff point: less than a 10-digit length and the column is Int32; greater, and it's Int64.
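Something along these lines, sketched from memory (the names and the exact threshold here are illustrative, not the project's actual code):

using System;

static class ColumnTypeSketch
{
    // int.MaxValue is 2,147,483,647 (10 digits), so a numeric field narrower than
    // 10 digits always fits in an Int32; anything wider gets an Int64.
    public static Type Choose(int fieldLength, int decimalCount)
    {
        if (decimalCount > 0)
            return typeof(double);  // has decimal places: treat as floating point

        return fieldLength < 10
            ? typeof(int)           // always fits in Int32
            : typeof(long);         // could exceed Int32.MaxValue
    }
}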
That looks like a link to the dBASE 7.0 spec. That's what was causing me confusion when I was fixing this issue before, and it's why the code started treating everything like a packed endian value.
https://www.loc.gov/preservation/digital/formats/fdd/fdd000325.shtml
This, so far as I have seen, is what everyone uses. Making this support dBASE 7 without a version check is useless.
Skip the knowledge base link, it's not right LOL. Home in on the dBASE 4.0 format our cheapo gov still uses lol
You can somehow pull some double-checks back from Google Earth. But there is also LibreOffice Calc, which properly reads the format.
That's what this project was reading. And it's a format that is very, very widely used to accompany ESRI shapefiles.
The tl_2019_01_place file works fine, see https://github.com/yellowfeather/DbfDataReader/blob/d5b1141675b2973be05fd5e1419c45ff530c14a2/test/DbfDataReader.Tests/EsriShapefileTests.cs#L7
Can you provide a file that shows this issue?
Hey, what did I say when I said this was a waking nightmare?
This is now fixed, but please read the code of conduct before making any more contributions.
Luckily I updated the code :) I believe we had a discussion about this the last time the world ended for no valid reason.
src.zip
Remember. Nukes go 3 places. One is North Korea. One is Colorado and the other is a shit town named Enola and Wyoming just about makes 4.