Improve find(); Mongo-like sort(); Include MongoDB in test

davidlatwe commented 6 years ago

Motive

To behave more like Mongo.

Changed

add $all
find in list
add pymongo and a MongoDB server to the tests, for behavior comparison
some minor fixes and improvements
Mongo-like sorting: able to read multiple sort specifiers and to sort basic data types

There is still more to do, but I think it's time to merge, since this is already a big change. Thank you :)

davidlatwe commented 6 years ago

The dataset_sample/mixed_type.json in c8f8688 is an ugly set of data. It exists to test the sort method and make sure it behaves the same as Mongo when sorting all kinds of data together.

davidlatwe commented 6 years ago

Here's how sort() usage will change after this merge.

# Before
collection.find().sort({"field": direction})

# After
collection.find().sort("field", direction)
# or
collection.find().sort("field.child-field", direction)
# or
collection.find().sort([
    ("field1.child-field", direction1),
    ("field2", direction2),
])
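
The two "After" call forms can be folded into one internal shape. The sketch below is only an illustration: `normalize_sort_spec` is a hypothetical helper, not a function from this PR, and defaulting to ASCENDING when no direction is given is an assumption.

```python
ASCENDING, DESCENDING = 1, -1


def normalize_sort_spec(key_or_list, direction=None):
    """Return a list of (field, direction) pairs from either call form."""
    if isinstance(key_or_list, str):
        # sort("field", direction); assumed to default to ASCENDING
        return [(key_or_list, ASCENDING if direction is None else direction)]
    # sort([("field1", direction1), ("field2", direction2)])
    return [(field, direc) for field, direc in key_or_list]


print(normalize_sort_spec("field.child-field", DESCENDING))
print(normalize_sort_spec([("field1", ASCENDING), ("field2", DESCENDING)]))
```

With this shape, a single-field sort is just the one-element case of a multi-field sort.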

Below is a full note explaining how sort() works after this merge.

Intro

Using Python to reproduce MongoDB's document sorting behavior

Ground Rules

Direction Vars

ASCENDING = 1    # small -> big
DESCENDING = -1  # big -> small

Basic Data Types Order

[]         ASCENDING
None           |
Numeric        v
String
Dict
List 1d        ^
List Nd        |
Bool       DESCENDING

To decide which data is bigger or smaller, first order the data by type as listed above, then compare values within the same type.

If two docs still compare equal, they are ordered by the order in which they were found.
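
A minimal sketch of the "type first, then value" rule for scalar values only. The weights follow the ones listed later in this note; `type_weight` and `sort_key` are hypothetical names, and lists and dicts are deliberately left out here.

```python
import numbers


def type_weight(value):
    """Weight of a scalar's type; a lower weight sorts first in ASCENDING."""
    # bool is checked before numbers because bool is a subclass of int in Python
    if isinstance(value, bool):
        return 5
    if value is None:
        return 0
    if isinstance(value, numbers.Number):
        return 1
    if isinstance(value, str):
        return 2
    raise TypeError("unsupported type in this sketch: %r" % type(value))


def sort_key(value):
    # Compare the type weight first, then the value itself
    return (type_weight(value), value)


data = ["b", 3, None, True, 1.5, "a", False, None]
print(sorted(data, key=sort_key))
# [None, None, 1.5, 3, 'a', 'b', False, True]
```

Python's `sorted()` is stable, so values with equal keys keep their found order without any extra work.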

MongoDB Manual - BSON Types Comparison Order

Composite Types

List (Array)

A list is compared with other data through its smallest member (in ASCENDING) or its largest member (in DESCENDING). In effect the array drops one dimension for sorting: a one-dimensional array sorts among data of the same type as its comparing member. If those smallest or largest members are equal, the docs are sorted by found order.

 ASCENDING : smallest member
DESCENDING : largest member
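
The rule above can be sketched as follows, assuming numeric members only so that plain `min()` and `max()` are enough. `representative` is a hypothetical helper name.

```python
ASCENDING, DESCENDING = 1, -1


def representative(value, direction):
    """Pick the list member used for comparison; scalars represent themselves."""
    if isinstance(value, list) and value:
        return min(value) if direction == ASCENDING else max(value)
    return value


docs = [{"n": [5, 9]}, {"n": 7}, {"n": [1, 8]}]

asc = sorted(docs, key=lambda d: representative(d["n"], ASCENDING))
desc = sorted(docs, key=lambda d: representative(d["n"], DESCENDING), reverse=True)

print(asc)   # [{'n': [1, 8]}, {'n': [5, 9]}, {'n': 7}]
print(desc)  # [{'n': [5, 9]}, {'n': [1, 8]}, {'n': 7}]
```

Note that `[5, 9]` sorts before `7` in ASCENDING (via 5) and also before `7` in DESCENDING (via 9), so the two results are not mirror images of each other.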

Dictionary (Object)

A dictionary is compared with other dict-type values by iterating over its key-value pairs: first by the value's data type, then by the key string, and lastly by the value. If a pair compares equal, the comparison moves on to the next key-value pair; if all pairs are equal, the docs are sorted by found order.

valueType_1 -> key_1 -> value_1 -> ... -> valueType_N -> key_N -> value_N
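
The chain above can be flattened into one tuple per dict, which Python then compares element by element. This is a sketch under assumptions: `weight` and `dict_key` are hypothetical names, only scalar values are handled, and keys are visited in the dict's own insertion order.

```python
def weight(value):
    """Type weight for scalar values (bool is checked before int)."""
    if isinstance(value, bool):
        return 5
    if value is None:
        return 0
    if isinstance(value, (int, float)):
        return 1
    if isinstance(value, str):
        return 2
    raise TypeError("unsupported type in this sketch: %r" % type(value))


def dict_key(doc):
    """Flatten a dict into (3, (valueType_1, key_1, value_1, ...))."""
    flat = []
    for key, value in doc.items():
        flat.extend((weight(value), key, value))
    return (3, tuple(flat))


values = [{"a": "x"}, {"b": 1}, {"a": 1}]
print(sorted(values, key=dict_key))
# [{'a': 1}, {'b': 1}, {'a': 'x'}]
```

Both numeric-valued dicts come before `{"a": "x"}` because the value type is compared before the key string.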

Non-existent/Null Types

None (Null, NoMatch)

If the value is None or [None], such docs are sorted among each other by found order. If the field is not found, it is treated as None.

`[]`

If the value is an empty list [], it is treated as less than null.
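
A small sketch of the null rules, assuming all other values are numeric. `null_key` and the `_MISSING` sentinel are hypothetical names used only for this illustration.

```python
_MISSING = object()  # sentinel for "field not found"


def null_key(doc, field):
    """Sort key covering [], None, [None], missing fields and numbers."""
    value = doc.get(field, _MISSING)
    if value == []:
        return (-1, [])      # empty list: less than null
    if value is _MISSING or value is None or value == [None]:
        return (0, None)     # all treated as None
    return (1, value)        # assumed numeric in this sketch


docs = [
    {"_id": 1, "v": 2},
    {"_id": 2},
    {"_id": 3, "v": []},
    {"_id": 4, "v": None},
    {"_id": 5, "v": [None]},
    {"_id": 6, "v": 1},
]
ordered = sorted(docs, key=lambda d: null_key(d, "v"))
print([d["_id"] for d in ordered])
# [3, 2, 4, 5, 6, 1]
```

Docs 2, 4 and 5 share the same key, so the stable sort leaves them in found order.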

How to

tuple sorting

# basic form for comparing
(section(int), (type(int), value(*)), index(int))

section(int)

To realise multi-sort, the next sort action must not override the result of the previous one. This is done by dividing the docs into sections (groups).

Docs that share a section id have the same key-value in this sorting action.
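
One possible way to sketch the section idea is shown below. It is not necessarily how this PR implements it: `multi_sort` is a hypothetical function, it relies on two stable sorts per pass instead of a single tuple key, and it assumes the values of each field are mutually comparable.

```python
ASCENDING, DESCENDING = 1, -1


def multi_sort(docs, spec):
    """Sort docs by several (field, direction) pairs using section ids."""
    rows = [{"section": 0, "doc": doc} for doc in docs]
    for field, direction in spec:
        # Stable sort by value, then by section: each section stays together
        # and is ordered internally by this pass's field.
        rows.sort(key=lambda r: r["doc"][field], reverse=(direction == DESCENDING))
        rows.sort(key=lambda r: r["section"])
        # Renumber: a new section starts whenever (old section, value) changes.
        new_section, previous = -1, None
        for row in rows:
            current = (row["section"], row["doc"][field])
            if current != previous:
                new_section += 1
                previous = current
            row["section"] = new_section
    return [row["doc"] for row in rows]


docs = [
    {"id": 0, "a": 1, "b": 2},
    {"id": 1, "a": 2, "b": 1},
    {"id": 2, "a": 1, "b": 3},
    {"id": 3, "a": 2, "b": 1},
]
result = multi_sort(docs, [("a", ASCENDING), ("b", DESCENDING)])
print([d["id"] for d in result])
# [2, 0, 1, 3]
```

After the first pass the docs with `a == 1` form section 0 and those with `a == 2` form section 1; the second pass can only reorder docs inside each section.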

(type(int), value(*))

To compare values of various types at the same time, a weight for the value's type is compared before the value itself.

[] (less than null)

```
# weight -1
(-1, [])
```

None, NoMatch

```
# weight 0
(0, None)
```

Numeric, String, Bool

# weight 1
(1, 3.1415)
# weight 2
(2, "Hello")
# weight 5
(5, True)

Dict

# weight 3
(3, (2, "color", "#FF00AA", ..., type-N(int), key-N(str), value-N(*), ...))

List

When comparing with other docs:

# Looking for min/max member to compare with other docs
min([(1, 72), ..., (type-N(int), value-N(*)), ...]) # or max

When comparing with another List:

# weight 4, iter all member inside the List
(4, ((1, 72), ..., (type(int), value(*)), ...))
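
Both List forms can be built from one recursive encoder. This is a sketch only: `typed` and `list_representative` are hypothetical names, and dict members are omitted to keep it short.

```python
def typed(value):
    """Encode a value as a (type weight, value) tuple; lists recurse."""
    if isinstance(value, bool):  # before int: bool is a subclass of int
        return (5, value)
    if value is None:
        return (0, None)
    if isinstance(value, (int, float)):
        return (1, value)
    if isinstance(value, str):
        return (2, value)
    if isinstance(value, list):
        if not value:
            return (-1, [])
        # weight 4, iterating over all members inside the List
        return (4, tuple(typed(member) for member in value))
    raise TypeError("unsupported type in this sketch: %r" % type(value))


def list_representative(members, descending=False):
    """Min (or max) typed member, used when a List is compared with other docs."""
    keys = [typed(member) for member in members]
    return max(keys) if descending else min(keys)


print(typed([72, "x"]))                                      # (4, ((1, 72), (2, 'x')))
print(list_representative([72, "x", None]))                  # (0, None)
print(list_representative([72, "x", None], descending=True)) # (2, 'x')
```

Because every member carries its weight, `min()` and `max()` never compare raw values of different types directly.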

index(int)

This is not the doc _id.

If docs' key-values are equal after sorting, they are ordered by this iteration index.

This index is always ordered ASCENDING.
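
The reason the index must stay ASCENDING is easy to see in a small sketch. The data and variable names here are made up for illustration; the point is only the difference between reversing the whole key and reversing the value alone.

```python
# (name, score) pairs in found order
docs = [("a", 1), ("b", 2), ("c", 1), ("d", 2)]

# Naive: putting the index inside a reversed key also reverses the tie order.
naive = sorted(enumerate(docs), key=lambda p: (p[1][1], p[0]), reverse=True)

# A stable sort on the value alone keeps ties in found order,
# i.e. the index stays ascending even though the value is descending.
stable = sorted(docs, key=lambda d: d[1], reverse=True)

print([p[1][0] for p in naive])  # ['d', 'b', 'c', 'a']
print([d[0] for d in stable])    # ['b', 'd', 'a', 'c']
```

Only the second result keeps "b" before "d" and "a" before "c", which is the found order described above.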

davidlatwe commented 6 years ago

For commit 880134a

The reason for that is to reset TinyDB's _id. Without doing this, the _id generated by TinyDB keeps increasing after delete_many({}), which makes the doc found order inconsistent each time you repeat delete_many({}) and find(), and that is bad for testing.

davidlatwe commented 6 years ago

Sorry for the flood of info; I think that's all I need to add to this merge. Please let me know if there are any issues.

schapman1974 / tinymongo