stijnsanders / TMongoWire

Delphi MongoDB driver
MIT License

Design of BSondoc is flawed as each element is a variant and not explicitly typed #33

Closed: SeanSolberg closed this issue 7 years ago

SeanSolberg commented 7 years ago

When I have a JSON query to send to Mongo using MongoWire and the query is {field: "guid"}, where guid is the string representation of a GUID, the BSON object stores the value internally in the fElements array. That array holds the value as an OleVariant. So, when it goes to serialize over the wire, the code looks at the string in Value and sees that it looks like a GUID. It then serializes the value as binary data with the UUID subtype code (0x03). The problem is that Mongo doesn't have UUID data in that field, but rather a string. It just so happens to hold a string representation of a GUID, but the data is not a UUID. Instead of being implicit and trying to "automatically" recognize strings that are formatted as UUIDs and convert them to UUIDs, the fElements array should include an explicit type for each element. Then we could use UUIDs when we want to and strings when we want to, without fear of an "automatic" conversion that causes problems.

FElements: array of record
  SortIndex, LoadIndex: integer;
  Key: WideString;
  Value: OleVariant;
end;
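A minimal sketch of the explicit-type idea, not TMongoWire code: the element record gains a hypothetical Kind field, and the serializer would pick the BSON element type from that tag instead of inspecting the string. The names TBsonKind, TTypedElement and BsonTypeByte are assumptions made for illustration only. The type bytes 0x02 (string) and 0x05 (binary data, which is where a UUID subtype lives) are taken from the BSON specification.

```delphi
program TypedElementSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils, Variants;

type
  // Hypothetical tag, not part of TMongoWire: names the BSON type to emit.
  TBsonKind = (bkString, bkUUID);

  // Hypothetical element record: the existing fields plus an explicit tag.
  TTypedElement = record
    SortIndex, LoadIndex: Integer;
    Key: WideString;
    Kind: TBsonKind;
    Value: OleVariant;
  end;

// Returns the BSON element type byte a serializer would write for E.
// $02 is a UTF-8 string; $05 is binary data (a UUID is a binary subtype).
function BsonTypeByte(const E: TTypedElement): Byte;
begin
  case E.Kind of
    bkString: Result := $02;
    bkUUID:   Result := $05;
  else
    raise Exception.Create('Unsupported element kind');
  end;
end;

var
  E: TTypedElement;
begin
  E.SortIndex := 0;
  E.LoadIndex := 0;
  E.Key := 'field';
  E.Value := '{6F9619FF-8B86-D011-B42D-00C04FC964FF}';

  // The caller decides the type; the content of Value is never inspected.
  E.Kind := bkString;
  Writeln(IntToHex(BsonTypeByte(E), 2)); // prints 02

  E.Kind := bkUUID;
  Writeln(IntToHex(BsonTypeByte(E), 2)); // prints 05
end.
```

With a tag like this, the same GUID-shaped text can be sent as a string in one query and as a UUID in another, at the cost of every caller having to state the type.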

stijnsanders commented 7 years ago

I disagree. MongoDB is built around the concept of a schema-less database, with a versatile document structure at its base, much like JSON; they went with BSON for transport and internal storage. JavaScript and other non-strictly typed languages also 'recognise' types as data is processed. The translation to and from OleVariants is not perfect, but I think I found a good balance. If you dislike the auto-conversion to/from UUIDs, feel free to fork and remove it from your version.

SeanSolberg commented 7 years ago

Hey, thanks for commenting back. I really appreciate it. I agree with you on the concept of a schema-less database. However, that doesn't mean you don't have indexes and queries that rely on certain pieces of data being of a certain, consistent data type. What makes BSON so much better than JSON is the fact that each piece of data in BSON actually does have a declared, specific data type, i.e., it's not just text like JSON. I also agree with your comments about JavaScript. But really, TMongoWire wasn't written for JavaScript, it was written for Delphi. The programming principles in Delphi are completely different from those in JavaScript. I also agree that you struck a good balance with your translations to/from OleVariants. However, as you pointed out, working in a variant world isn't perfect, and thus you end up with buggy situations which could be eliminated by using explicit programming principles instead of implicit ones. JavaScript has so many problems; please don't follow JavaScript patterns when writing Delphi code.

As always, I appreciate your opinions, and all in all, I think TMongoWire is great. I appreciate being able to use it. ..Sean

(In reply, by email, to Stijn Sanders's comment of Sat, Dec 10, 2016, 5:56 PM.)

stijnsanders commented 7 years ago

In an attempt to get precisely the differentiation you describe, for JavaScript code and regular expressions I use the bsonJavaScriptCodePrefix and bsonRegExPrefix constants, which have specific Unicode code points that usually don't occur in normal data. If you feel like it, you could create a fork that replaces the recognition of the {%.8x-%.4x-%.4x-%.4x-%.12x} format with a bsonGUIDPrefix constant value. Would that solve your problem?
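The prefix approach could look roughly like the sketch below. It is an assumption-laden illustration, not code from TMongoWire: the value given to bsonGUIDPrefix is invented here (a private-use code point plus a readable tag), and StripGuidPrefix and Describe are hypothetical helpers. The point is only that a value is treated as a UUID when the caller marked it, never because of how the text happens to be formatted.

```delphi
program GuidPrefixSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  // Hypothetical value, chosen only for this sketch. TMongoWire does not
  // define this constant; the real prefixes may use different code points.
  bsonGUIDPrefix: WideString = #$E000'uuid:';

// Returns True when S starts with the marker. Payload receives S without
// the marker in that case, and S unchanged otherwise.
function StripGuidPrefix(const S: WideString; out Payload: WideString): Boolean;
var
  N: Integer;
begin
  N := Length(bsonGUIDPrefix);
  Result := Copy(S, 1, N) = bsonGUIDPrefix;
  if Result then
    Payload := Copy(S, N + 1, MaxInt)
  else
    Payload := S;
end;

// Reports which BSON representation a serializer would choose for S.
procedure Describe(const S: WideString);
var
  P: WideString;
begin
  if StripGuidPrefix(S, P) then
    Writeln('binary UUID: ', P)
  else
    Writeln('string: ', P);
end;

begin
  // GUID-shaped text without the marker stays a plain string.
  Describe('{6F9619FF-8B86-D011-B42D-00C04FC964FF}');
  // prints: string: {6F9619FF-8B86-D011-B42D-00C04FC964FF}

  // Only an explicitly marked value is treated as a UUID.
  Describe(bsonGUIDPrefix + '{6F9619FF-8B86-D011-B42D-00C04FC964FF}');
  // prints: binary UUID: {6F9619FF-8B86-D011-B42D-00C04FC964FF}
end.
```

Compared with a per-element type field, a prefix keeps the existing OleVariant-based storage unchanged, but it relies on the marker never appearing at the start of ordinary string data.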