yanb514 / i24_database_api

BSD 3-Clause "New" or "Revised" License

Custom I-24 Database API package

Version: main

Date revised: 10/23/2022

Requirements

Latest feature

Transform documents from vehicle ID indexed to timestamp indexed (v2)

dbc.transform2(read_database_name=None, read_collection_name=None,
                  write_database_name="transformed_beta", write_collection_name=None)

By default, this transforms the current collection into a time-indexed collection of the same name in the "transformed_beta" database. The timestamp-indexed documents have the following schema:

{
    {
        "_id": ...,
        "timestamp": t1,
        "eb": {
            "traj1_id": [centerx, centery, l, w, dir, v],
            "traj2_id": [...],
            ...
        },
        "wb": {
            "traj3_id": [centerx, centery, l, w, dir, v],
            "traj4_id": [...],
            ...
        }
    },
    {
        "_id": ...,
        "timestamp": t2,
        ...,
        ...
    }
}
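To make the schema above concrete, here is a minimal pure-Python sketch of the regrouping idea. It is not the package's transform2 implementation. It assumes each vehicle-indexed document carries per-timestamp arrays x_position and y_position plus a hypothetical velocity array, with scalar length, width and direction (1 for eastbound, -1 for westbound), and it uses the positions as given rather than converting them to centers.

```python
from collections import defaultdict


def to_time_indexed(trajectories):
    """Regroup per-vehicle documents into one document per timestamp."""
    by_time = defaultdict(lambda: {"eb": {}, "wb": {}})
    for traj in trajectories:
        # Assumption: direction 1 means eastbound, anything else westbound.
        side = "eb" if traj["direction"] == 1 else "wb"
        for i, t in enumerate(traj["timestamp"]):
            by_time[t][side][str(traj["_id"])] = [
                traj["x_position"][i],
                traj["y_position"][i],
                traj["length"],
                traj["width"],
                traj["direction"],
                traj["velocity"][i],  # hypothetical field, not in the schema
            ]
    # One output document per timestamp, in ascending time order.
    return [{"timestamp": t, **by_time[t]} for t in sorted(by_time)]


# Made-up sample data for illustration only.
trajs = [
    {"_id": "a", "direction": 1, "timestamp": [1.0, 2.0],
     "x_position": [0.0, 10.0], "y_position": [5.0, 5.0],
     "length": 15.0, "width": 6.0, "velocity": [10.0, 10.0]},
    {"_id": "b", "direction": -1, "timestamp": [2.0],
     "x_position": [100.0], "y_position": [-5.0],
     "length": 20.0, "width": 7.0, "velocity": [12.0]},
]
docs = to_time_indexed(trajs)
```

Each element of docs has a "timestamp" key and the two direction sub-documents "eb" and "wb", keyed by trajectory ID.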

Installation

With the desired Python venv or conda environment activated, run the following command in a shell:

pip install git+https://github.com/yanb514/i24_database_api@<tag>

where <tag> is a branch name (e.g. main), a tag name (e.g. v0.3), or the latest version (latest).

Then, establish a connection to the client.

default_param is a dictionary read from a config file (see test_param_template.config for a template).

default_param = {
  "host": "<mongodb-host>",
  "port": 27017,
  "username": "<mongodb-username>",
  "password": "<mongodb-password>"
}
dbc = DBClient(**default_param)
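A minimal sketch of reading such a parameter dictionary from disk. It assumes the config file is JSON containing the four keys shown above; the actual format of test_param_template.config is not confirmed here. load_db_params is a hypothetical helper, not part of the package.

```python
import json

# Keys the connection dictionary above is expected to contain.
REQUIRED_KEYS = ("host", "port", "username", "password")


def load_db_params(path):
    """Read connection parameters from a JSON file and check the keys."""
    with open(path, "r", encoding="utf-8") as f:
        params = json.load(f)
    missing = [k for k in REQUIRED_KEYS if k not in params]
    if missing:
        raise KeyError(f"missing connection parameters: {missing}")
    return params
```

The returned dictionary can then be unpacked into the client in the same way as default_param above. Failing early on a missing key tends to give a clearer error than a connection failure later on.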

Pass optional database_name and collection_name to connect to a specific database and/or collection:

dbc = DBClient(**default_param, database_name = <database_name>, collection_name = <collection_name>)

Either way, dbc.client is essentially a wrapper around a pymongo.MongoClient object and exposes all of its properties and methods.

List all collections (if database_name is specified)

dbc.list_collection_names()  # or, equivalently:
dbc.db.list_collection_names()

Easily switch to another database:

newdb = dbc.client[<new_database_name>]
newdb.list_collection_names()

Connect to the last updated collection in a database:

dbc = DBClient(**default_param, database_name = <database_name>, latest_collection=True) # dbc.collection is now the latest collection
print(dbc.collection_name)

Drop (delete) a collection:

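dbc.collection.drop()  # drop the current collection
dbc.db[<some_collection_name>].drop()  # or drop another collection in the current database
dbc.client[<some_database>][<some_collection_name>].drop()  # or drop a collection in another database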

Reset a collection:

dbc.reset_collection()

Resetting empties the current collection but keeps the reference dbc.collection pointing to the emptied collection.

Bulk delete collections in current database (dbc.db) by:

dbc.delete_collection([list_of_cols_to_be_deleted])

Mark collections to be safe from deletion:

dbc.mark_safe([safe_collection_list])

Transform documents from vehicle ID indexed to timestamp indexed

Authors: Zi Nean Teoh and Lisa Liu. For details, see https://github.com/yanb514/i24_database_api/blob/main/src/i24_database_api/README.md

dbc.transform(read_database_name=None, read_collection_name=None,
                  write_database_name="transformed", write_collection_name=None)

By default, this transforms the current collection into a time-indexed collection of the same name in the "transformed" database.

Other collection level operations (dbc.collection has to be specified):

Query a single document

dbc.find_one(index_name, value)

Query based on filter

This method follows the pymongo interface; it is a higher-level wrapper around pymongo's collection.find().

query_filter = {"_id": {"$in": fragment_ids}}
query_sort = [("last_timestamp", "ASC")]
dbc.read_query(query_filter, query_sort)
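For reference, pymongo itself expresses sort directions as the integers 1 (pymongo.ASCENDING) and -1 (pymongo.DESCENDING). The sketch below shows one way a string-based sort specification like the one above could be translated before being passed to collection.find(). Whether read_query does this internally is an assumption, and to_pymongo_sort and the accepted direction strings are hypothetical.

```python
# Integer values match pymongo.ASCENDING (1) and pymongo.DESCENDING (-1).
_DIRECTIONS = {"ASC": 1, "DSC": -1, "DESC": -1}


def to_pymongo_sort(query_sort):
    """Translate [(field, "ASC" | "DSC" | "DESC"), ...] to [(field, 1 | -1), ...]."""
    translated = []
    for field, direction in query_sort:
        key = str(direction).upper()
        if key not in _DIRECTIONS:
            raise ValueError(f"unknown sort direction: {direction!r}")
        translated.append((field, _DIRECTIONS[key]))
    return translated


sort_spec = to_pymongo_sort([("last_timestamp", "ASC")])
```

The resulting list has the shape that pymongo's find(filter, sort=...) accepts.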

Iterative range query

The following code demonstrates the use of the iterative query based on a query parameter.

rri = dbc.read_query_range(range_parameter='last_timestamp', range_greater_equal=300, range_less_than=330, range_increment=None)
while True:
    try:
        print(next(rri)["ID"]) # access documents in rri one by one
    except StopIteration:
        print("END OF ITERATION")
        break

print("Using for-loop to read range")
for result in dbc.read_query_range(range_parameter='last_timestamp', range_greater_equal=300, range_less_than=330, range_increment=None):
    print(result["ID"])
print("END OF ITERATION")

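produces output of the following form (the sample below also shows each document's last timestamp and starting_x alongside its ID):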

last timestamp: 304.17, starting_x: 32806.20, ID: 3600083.0
last timestamp: 306.00, starting_x: 32771.59, ID: 3600084.0
last timestamp: 310.90, starting_x: 32533.66, ID: 3600086.0
last timestamp: 312.73, starting_x: 32805.35, ID: 400088.0
last timestamp: 313.23, starting_x: 31897.72, ID: 3600087.0
last timestamp: 316.53, starting_x: 31594.89, ID: 3600088.0
last timestamp: 324.50, starting_x: 31166.60, ID: 3600089.0
last timestamp: 325.07, starting_x: 32076.31, ID: 400089.0
last timestamp: 328.93, starting_x: 30132.66, ID: 3600090.0
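The iteration pattern above can be imitated on plain Python data. The sketch below is a hypothetical stand-in, not the package's read_query_range. It assumes the range is half-open (greater than or equal to the lower bound and strictly less than the upper bound, as the parameter names suggest) and that an increment, when given, splits the range into consecutive windows. The sample documents reuse a few values from the output above plus one made-up out-of-range entry.

```python
def iter_range(docs, key, start, stop, step=None):
    """Yield documents with start <= doc[key] < stop, ascending by key.

    If step is given, the range is scanned in consecutive windows of
    width step; otherwise it is scanned as a single window.
    """
    if stop <= start:
        return
    if step is None:
        step = stop - start
    if step <= 0:
        raise ValueError("step must be positive")
    lo = start
    while lo < stop:
        hi = min(lo + step, stop)
        window = [d for d in docs if lo <= d[key] < hi]
        yield from sorted(window, key=lambda d: d[key])
        lo = hi


docs = [
    {"ID": 400088.0, "last_timestamp": 312.73},
    {"ID": 3600083.0, "last_timestamp": 304.17},
    {"ID": 3600090.0, "last_timestamp": 328.93},
    {"ID": 1.0, "last_timestamp": 335.0},  # made-up, outside the range
]
ids = [d["ID"] for d in iter_range(docs, "last_timestamp", 300, 330, step=10)]
```

Because iter_range is a generator, it works both with next() inside a try/except StopIteration block and with a plain for-loop, as in the two usages shown earlier.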

Create a collection

A collection with the specified collection_name is automatically created when the DBWriter object is instantiated. If a schema file (in JSON) is given, the writer adds a validation rule to the collection based on that file. Otherwise, it issues the warning "no schema provided" and proceeds without a validation rule.

A collection can also be created after the DBClient object is instantiated by calling

dbc.db.create_collection(collection_name = collection_name, schema = schema_file) # schema is optional

Concurrent insert with multithreading

When bulk writing to the database, this package offers the option of non-blocking (concurrent) inserts:

col = dbc.collection

# insert a document of python dictionary format -> pass it as kwargs
doc1 = {
        "timestamp": [1.1,2.0,3.0],
        "first_timestamp": 1.0,
        "last_timestamp": 3.0,
        "x_position": [1.2]} 

dbc.write_one_trajectory(**doc1) 

# insert a document using keyword args directly (if collection_name is None, use the current collection dbc.collection)
dbc.write_one_trajectory(collection_name = "test_collection" , timestamp = [1.1,2.0,3.0],
                           first_timestamp = 1.0,
                           last_timestamp = 3.0,
                           x_position = [1.2])

As of v0.2, if a document violates the schema, it bypasses the validation check and a warning is printed to the console.
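The general idea of a non-blocking insert can be sketched with the standard library alone. This is an illustration of the pattern, not the package's implementation: NonBlockingWriter and InMemoryCollection are hypothetical names, and the in-memory collection stands in for a real MongoDB collection so that the example runs without a server.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class InMemoryCollection:
    """Stand-in for a database collection; only supports insert_one."""

    def __init__(self):
        self.docs = []
        self._lock = threading.Lock()

    def insert_one(self, doc):
        with self._lock:  # protect the shared list across threads
            self.docs.append(dict(doc))


class NonBlockingWriter:
    """Submit each insert to a thread pool and return immediately."""

    def __init__(self, collection, max_workers=4):
        self._collection = collection
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def write(self, **doc):
        # Returns a Future; the caller does not wait for the insert.
        return self._pool.submit(self._collection.insert_one, doc)

    def close(self):
        # Block until all pending inserts have finished.
        self._pool.shutdown(wait=True)


collection = InMemoryCollection()
writer = NonBlockingWriter(collection)
for i in range(10):
    writer.write(first_timestamp=float(i), last_timestamp=float(i) + 2.0)
writer.close()
```

The caller keeps producing documents while inserts proceed in the background, and only close() waits for the outstanding work. The insertion order in the collection is not guaranteed to match the submission order.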

Schema examples

"Reconciled trajectories" collection

{
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["timestamp", "last_timestamp", "x_position"],
        "properties": {
            "configuration_id": {
                "bsonType": "int",
                "description": "A unique ID that identifies what configuration was run. It links to a metadata document that defines all the settings that were used system-wide to generate this trajectory fragment"
                },
            "coarse_vehicle_class": {
                "bsonType": "int",
                "description": "Vehicle class number"
                },

            "timestamp": {
                "bsonType": "array",
                "items": {
                    "bsonType": "double"
                    },
                "description": "Corrected timestamp. This timestamp may be corrected to reduce timestamp errors."
                },

            "road_segment_ids": {
                "bsonType": "array",
                "items": {
                    "bsonType": "int"
                    },
                "description": "Unique road segment ID. This differentiates the mainline from entrance ramps and exit ramps, which get distinct road segment IDs."
                },
            "x_position": {
                "bsonType": "array",
                "items": {
                    "bsonType": "double"
                    },
                "description": "Array of back-center x position along the road segment in feet. The  position x=0 occurs at the start of the road segment."
                },
            "y_position": {
                "bsonType": "array",
                "items": {
                    "bsonType": "double"
                    },
                "description": "array of back-center y position across the road segment in feet. y=0 is located at the left yellow line, i.e., the left-most edge of the left-most lane of travel in each direction."
                },

            "length": {
                "bsonType": "double",
                "description": "vehicle length in feet."
                },
            "width": {
                "bsonType": "array",
                "items": {
                    "bsonType": "double"
                    },
                "description": "vehicle width in feet"
                },
            "height": {
                "bsonType": "array",
                "items": {
                    "bsonType": "double"
                    },
                "description": "vehicle height in feet"
                },
            "direction": {
                "bsonType": "int",
                "description": "-1 if westbound, 1 if eastbound"
                }

            }
        }
    }

https://github.com/yanb514/i24_database_api/blob/main/test/config/reconciled_schema.json
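To illustrate what such a schema enforces, the sketch below checks a Python dictionary against a small subset of the rules: the "required" list and the "bsonType" of top-level properties and array items. It is a simplified client-side approximation, not MongoDB's validator; check_document is a hypothetical helper, and only the four types listed in the mapping are handled.

```python
# Rough Python equivalents of the BSON types used in the schema above.
_BSON_TO_PY = {"double": float, "int": int, "array": list, "object": dict}


def _matches(value, bson_type):
    if isinstance(value, bool):  # bool is a subclass of int in Python
        return False
    return isinstance(value, _BSON_TO_PY[bson_type])


def check_document(doc, schema):
    """Return a list of human-readable problems (empty if none found)."""
    spec = schema["$jsonSchema"]
    problems = [f"missing required field: {k}"
                for k in spec.get("required", []) if k not in doc]
    for name, rule in spec.get("properties", {}).items():
        if name not in doc:
            continue
        if not _matches(doc[name], rule["bsonType"]):
            problems.append(f"{name}: expected {rule['bsonType']}")
        elif rule["bsonType"] == "array" and "items" in rule:
            item_type = rule["items"]["bsonType"]
            if not all(_matches(v, item_type) for v in doc[name]):
                problems.append(f"{name}: items must be {item_type}")
    return problems


# A reduced version of the schema above, for illustration.
schema = {"$jsonSchema": {
    "bsonType": "object",
    "required": ["timestamp", "last_timestamp", "x_position"],
    "properties": {
        "timestamp": {"bsonType": "array", "items": {"bsonType": "double"}},
        "x_position": {"bsonType": "array", "items": {"bsonType": "double"}},
        "direction": {"bsonType": "int"},
    },
}}
good = {"timestamp": [1.1, 2.0, 3.0], "first_timestamp": 1.0,
        "last_timestamp": 3.0, "x_position": [1.2]}
bad = {"timestamp": [1, 2.0], "x_position": [1.2], "direction": 1.0}
good_problems = check_document(good, schema)
bad_problems = check_document(bad, schema)
```

Here good passes with no problems, while bad is flagged for the missing last_timestamp, the integer inside timestamp, and the non-integer direction.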

In future versions

Additional future enhancements include:

User roles. For more details, see:
https://stackoverflow.com/questions/23943651/mongodb-admin-user-not-authorized
https://www.codexpedia.com/devops/mongodb-authentication-setting/
https://www.mongodb.com/docs/manual/tutorial/manage-users-and-roles/