mongodb / bson-rust

Encoding and decoding support for BSON in Rust
MIT License
400 stars 132 forks source link

RUST-1899 Deserialization with UUID #467

Closed Bryson14 closed 5 months ago

Bryson14 commented 6 months ago

Versions/Environment

  1. What version of Rust are you using? 1.77
  2. What operating system are you using? windows + wsl2
  3. What versions of the driver and its dependencies are you using? (Run cargo pkgid mongodb & cargo pkgid bson) 2.9.0 & 2.8.2
  4. What version of MongoDB are you using? (Check with the MongoDB shell using db.version()) 7.2.0
  5. What is your MongoDB topology (standalone, replica set, sharded cluster, serverless)? replica set

Describe the bug

A clear and concise description of what the bug is.

I am migrating data from another database which includes UUIDs stored as strings. I understand that bson::uuid stores data as binary base-64 data with a subtype flag = 4, however my current database is using strings and the rust code interacting with it is expected deserialization to and from strings.

My current struct definition:

#[derive(Serialize, Deserialize, Debug, PartialEq, PartialOrd, Clone)]
pub struct UserMetadata {
    /// A unique Single Sign On at GEHC
    pub sso: String,

    /// A unique identifier for the user.
    /// If not provided during deserialization, a random UUID will be generated.
    #[serde(default = "Uuid::now_v7")]
    pub user_id: Uuid,

    // The time the user was created.
    /// If not provided during deserialization, the current UTC time will be used.
    #[serde(with = "bson::serde_helpers::chrono_datetime_as_bson_datetime")]
    #[serde(default = "Utc::now")]
    pub last_access_time: DateTime<Utc>,

    // The time the user was created.
    /// If not provided during deserialization, the current UTC time will be used.
    #[serde(with = "bson::serde_helpers::chrono_datetime_as_bson_datetime")]
    #[serde(default = "Utc::now")]
    pub create_time: DateTime<Utc>,

    /// Team ID at GEHC
    #[serde(skip_serializing_if = "Option::is_none")]
    pub team_id: Option<String>,

    /// Organization ID at GEHC
    #[serde(skip_serializing_if = "Option::is_none")]
    pub org_id: Option<String>,

    /// User Type
    #[serde(skip_serializing_if = "Option::is_none")]
    pub user_type: Option<String>,

    /// Content for Content Privacy
    #[serde(skip_serializing_if = "Option::is_none")]
    #[serde(default = "default_deny")]
    pub content_privacy_consent: Option<PrivacyConsent>,

    #[serde(skip_serializing_if = "Option::is_none")]
    pub content_privacy_consent_date: Option<DateTime<Utc>>,
}

When I run find() to get items from this collection I get,

called `Result::unwrap()` on an `Err` value: Error { kind: BsonDeserialization(DeserializationError { 
message: "invalid type: string \"018d6b0c-4a48-76e0-8d61-eb7977071904\", expected bytes" }), labels: {}, wire_version: None, source: None }

I tried to remedy this by changed uuid::Uuid in the struct definition to bson::Uuid but got a similar error:

called `Result::unwrap()` on an `Err` value: Error { kind: BsonDeserialization(DeserializationError { message: "expected Binary with subtype Uuid, instead got 
String" }), labels: {}, wire_version: None, source: None }

Trying to use the serde_with-3 flag didn't help and I actually got compiler errors. This was after adding the feature flag with cargo add bson -F serde_with-3. The documentation didn't say that I had to add other dependancies to get this to work.

error[E0433]: failed to resolve: use of undeclared crate or module `serde_with_3`
  --> src\models\user_metadata_bson.rs:27:3
   |
27 | #[serde_with_3::serde_as]
   |   ^^^^^^^^^^^^ use of undeclared crate or module `serde_with_3`

error: cannot find attribute `serde_as` in this scope
  --> src\models\user_metadata_bson.rs:36:7
   |
36 |     #[serde_as(as = "Option<bson::Uuid>")]
   |       ^^^^^^^^

error: cannot find attribute `serde_as` in this scope
  --> src\models\user_metadata_bson.rs:81:7
   |
81 |     #[serde_as(as = "Option<bson::Uuid>")]
   |       ^^^^^^^^

what to do:

So I'm thinking there are two fixes to this. Either I fix this issue in code and get the serialization and deserialization to play well with uuid::Uuid, or I migrate all the fields that contain string uuid in the db to Mongo's UUID format. In that case, I'd appreciate any tools to do that, but might just have to bite the bullet and write a small python scipt to do that.

To Reproduce

  1. create a document in mongo with a field that contains a string
  2. attempt to find from that collection into a uuid::UUid

Cargo.toml

[dependencies]
aws-config = { version = "1.1.5", features = ["behavior-version-latest"]}
aws-sdk-s3 = "1.15.0"
aws-smithy-types = "1.1.5"
axum = "0.7.4"
axum-test = "14.2.2"
bson = { version = "2.9.0", features = ["chrono-0_4", "uuid-1", "serde_with-3"] }
chrono = { version = "0.4.34", features = ["serde"] }
futures = "0.3.30"
http = "1.0.0"
lambda_http = "0.9.3"
lambda_runtime = "0.9.2"
mongodb = {version = "2.8.2" , features = ["tokio-runtime"]}
serde = { version = "1.0.196", features = ["derive"] }
serde_json = "1.0"
thiserror = "1.0.56"
tokio = { version = "1", features = ["macros"] }
tower-http = { version = "0.5.0", features = ["trace", "add-extension"] }
tracing = { version = "0.1", features = ["log"] }
tracing-subscriber = { version = "0.3", default-features = false, features = ["fmt"] }
uuid = { version = "1.7", features = ["v4", "v7", "serde"] }
Bryson14 commented 6 months ago

example of data currently in mongodb

{
  "_id": {
    "$oid": "65fdf259bc497a1b87a5c9ce"
  },
  "sso": "123456789",
  "content_privacy_consent": null,
  "content_privacy_consent_date": null,
  "create_time": {
    "$date": "2024-02-11T02:29:05.302Z"
  },
  "last_access_time": {
    "$date": "2024-03-09T22:21:26.890Z"
  },
  "user_id": "018d95ff-2e96-7169-9d08-175b96d4e7e3"
}
isabelatkinson commented 6 months ago

Hi @Bryson14, thanks for this detailed report! We do indeed have a bug in our deserialization logic for UUIDs that are not in binary format. I made a PR to fix this (#468), so we'll put out a patch release once that's approved and merged. Once the fix is in, your UUID deserialization from strings when using the bson::Uuid type should work without needing to enable any feature flags.

I was unable to reproduce the compilation errors you're encountering. Are you seeing those errors with the chrono-0_4, uuid-1, and serde_with-3 feature flags enabled?

abr-egn commented 6 months ago

I think the serde_with error was fixed by https://github.com/mongodb/bson-rust/commit/81a98950899f96b138699e008e4d1703ed0f8fb5.

Bryson14 commented 6 months ago

I'll give the feature flags a try again. For the bson::uuid, it was too much time dealing with it that I just went with treating the data from the database as strings (which they are strings from the migration). I would have to change the rust code and database UUIDs to deal with this right now and I don't see any good benefit from using bson::uuid at the moment. Perhaps performance?

isabelatkinson commented 5 months ago

I don't see any good benefit from using bson::uuid at the moment

It depends upon your use case. If you're just reading your data via the driver then it likely doesn't matter much which type you're using, but bson::Uuid provides utilities for constructing values that could be useful if you're inserting new data.

I'm going to close this out since the fix in #468 has been released, but feel free to create another issue if you have any further feedback or questions!