pboettch / json-schema-validator

JSON schema validator for JSON for Modern C++
Other
466 stars 134 forks source link

Current master of json define binary type #108

Closed xamix closed 3 years ago

xamix commented 4 years ago

On the current master of json nlohmann a new type binary was added.

It fire the following warning in your validator:

/json-validator.cpp:1088:9: warning: enumeration value ‘binary’ not handled in switch [-Wswitch] 1088 | switch (type) { | ^

So the type is not handled by your validator. Do you think it will be hard to FIX that?

xamix commented 4 years ago

Or at least add handling of unknown type maybe by firing exception on validation when type is not known

pboettch commented 4 years ago

I think this will only be release in a 3.8-version of nlohmann-json? So, we need to put ifdefs around any usage of this type. The version of the develop branch is not yet set to 3.8 hence I cannot test any fixes. I still created a fix-108-branch: https://github.com/pboettch/json-schema-validator/tree/fix-108

xamix commented 4 years ago

I'have done similar FIX: But I have set a default in the switch which return nullptr.

This allow to not check the json version.

pboettch commented 4 years ago

Using default works but not using it will give a warning in the future as it did now. And this is expected, because maybe something needs to be done to handle the new type in other areas of the validator and this way it can be detected.

andrejlevkovitch commented 3 years ago

I think, it can be good idea to support binary types in this validator. Specification of json schema contains contentEncoding keyword. It can have value "binary", that means binary data in string. So schema for validation some binary data can looks like:

{
  "type": "object",
  "properties": {
    "binary_data": {
      "type": "string",
      "contentEncoding": "binary"
    }
  }
}

This means that in object we has field binary_data that has type string in binary encoding. Standard json string can't has binary data (limitation of json), so this means that we expect json::value_t::binary as input.

Implementation can looks like:

From 4f40b71eb9e1ba4ac43027f8e658ba06299392c0 Mon Sep 17 00:00:00 2001
From: andrejlevkovitch <andrejlevkovitch@gmail.com>
Date: Fri, 22 May 2020 10:27:51 +0300
Subject: [PATCH] add support for binary validation

---
 src/json-validator.cpp | 43 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 41 insertions(+), 2 deletions(-)

diff --git a/src/json-validator.cpp b/src/json-validator.cpp
index f16f3ef..65949ee 100644
--- a/src/json-validator.cpp
+++ b/src/json-validator.cpp
@@ -478,6 +478,7 @@ public:
            {"boolean", json::value_t::boolean},
            {"integer", json::value_t::number_integer},
            {"number", json::value_t::number_float},
+           {"binary", json::value_t::binary},
        };

        std::set<std::string> known_keywords;
@@ -491,17 +492,37 @@ public:

            case json::value_t::string: {
                auto schema_type = attr.value().get<std::string>();
+
+               // add supporting validation binary types
+               if (schema_type == "string") {
+                   auto found = sch.find("contentEncoding");
+                   if (found != sch.end() && found->get<std::string>() == "binary") {
+                       schema_type = "binary";
+                   }
+               }
+
                for (auto &t : schema_types)
                    if (t.first == schema_type)
                        type_[(uint8_t) t.second] = type_schema::make(sch, t.second, root, uris, known_keywords);
            } break;

            case json::value_t::array: // "type": ["type1", "type2"]
-               for (auto &schema_type : attr.value())
+           {
+               json type_array = attr.value();
+
+               auto has_string_type = std::find(type_array.begin(), type_array.end(), "string");
+               if (has_string_type != type_array.end()) {
+                   auto encodingFound = sch.find("contentEncoding");
+                   if (encodingFound != sch.end() && encodingFound.value() == "binary") {
+                       type_array.emplace_back("binary");
+                   }
+               }
+
+               for (auto &schema_type : type_array)
                    for (auto &t : schema_types)
                        if (t.first == schema_type)
                            type_[(uint8_t) t.second] = type_schema::make(sch, t.second, root, uris, known_keywords);
-               break;
+           } break;

            default:
                break;
@@ -1089,6 +1110,21 @@ public:
    }
 };

+/**\brief just a placeholder
+ */
+class binary : public schema
+{
+   void validate(const json::json_pointer &, const json &, json_patch &, error_handler &) const override
+   {
+   }
+
+public:
+   binary(json &, root_schema *root)
+       : schema(root)
+   {
+   }
+};
+
 std::shared_ptr<schema> type_schema::make(json &schema,
                                           json::value_t type,
                                           root_schema *root,
@@ -1115,6 +1151,9 @@ std::shared_ptr<schema> type_schema::make(json &schema,

    case json::value_t::discarded: // not a real type - silence please
        break;
+
+   case json::value_t::binary: // can use for validate bson or other binary representation of json
+       return std::make_shared<binary>(schema, root);
    }
    return nullptr;
 }
-- 
2.20.1
pboettch commented 3 years ago

Could you please add test-cases with your patch?

In regards to the implementation, maybe it would be better to add a callback similar to the format-check and if contentEncoding is encountered this callback is called.

Adding all possible encodings to the library is most likely impossible.

Also, I'm not sure whether contextEncoding=binary should be treated the same as the nlohmann::json value_t::binary

andrejlevkovitch commented 3 years ago

this is a patch with some test-cases (NOTE use latest nlohmann_json from develop branch):

From eac735ee63dd121e66b727b32adc1d9f2ab06c6d Mon Sep 17 00:00:00 2001
From: andrejlevkovitch <andrejlevkovitch@gmail.com>
Date: Fri, 22 May 2020 11:50:26 +0300
Subject: [PATCH 1/2] add support for validate binary data in json (for bson or
 other implementations of binary json)

---
 src/json-validator.cpp | 43 ++++++++++++++++++++++++++++++++++++++++--
 test/CMakeLists.txt    |  5 +++++
 2 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/src/json-validator.cpp b/src/json-validator.cpp
index f16f3ef..65949ee 100644
--- a/src/json-validator.cpp
+++ b/src/json-validator.cpp
@@ -478,6 +478,7 @@ public:
            {"boolean", json::value_t::boolean},
            {"integer", json::value_t::number_integer},
            {"number", json::value_t::number_float},
+           {"binary", json::value_t::binary},
        };

        std::set<std::string> known_keywords;
@@ -491,17 +492,37 @@ public:

            case json::value_t::string: {
                auto schema_type = attr.value().get<std::string>();
+
+               // add supporting validation binary types
+               if (schema_type == "string") {
+                   auto found = sch.find("contentEncoding");
+                   if (found != sch.end() && found->get<std::string>() == "binary") {
+                       schema_type = "binary";
+                   }
+               }
+
                for (auto &t : schema_types)
                    if (t.first == schema_type)
                        type_[(uint8_t) t.second] = type_schema::make(sch, t.second, root, uris, known_keywords);
            } break;

            case json::value_t::array: // "type": ["type1", "type2"]
-               for (auto &schema_type : attr.value())
+           {
+               json type_array = attr.value();
+
+               auto has_string_type = std::find(type_array.begin(), type_array.end(), "string");
+               if (has_string_type != type_array.end()) {
+                   auto encodingFound = sch.find("contentEncoding");
+                   if (encodingFound != sch.end() && encodingFound.value() == "binary") {
+                       type_array.emplace_back("binary");
+                   }
+               }
+
+               for (auto &schema_type : type_array)
                    for (auto &t : schema_types)
                        if (t.first == schema_type)
                            type_[(uint8_t) t.second] = type_schema::make(sch, t.second, root, uris, known_keywords);
-               break;
+           } break;

            default:
                break;
@@ -1089,6 +1110,21 @@ public:
    }
 };

+/**\brief just a placeholder
+ */
+class binary : public schema
+{
+   void validate(const json::json_pointer &, const json &, json_patch &, error_handler &) const override
+   {
+   }
+
+public:
+   binary(json &, root_schema *root)
+       : schema(root)
+   {
+   }
+};
+
 std::shared_ptr<schema> type_schema::make(json &schema,
                                           json::value_t type,
                                           root_schema *root,
@@ -1115,6 +1151,9 @@ std::shared_ptr<schema> type_schema::make(json &schema,

    case json::value_t::discarded: // not a real type - silence please
        break;
+
+   case json::value_t::binary: // can use for validate bson or other binary representation of json
+       return std::make_shared<binary>(schema, root);
    }
    return nullptr;
 }
diff --git a/test/CMakeLists.txt b/test/CMakeLists.txt
index 8d67e90..f268219 100644
--- a/test/CMakeLists.txt
+++ b/test/CMakeLists.txt
@@ -55,3 +55,8 @@ add_executable(json-patch json-patch.cpp)
 target_include_directories(json-patch PRIVATE ${PROJECT_SOURCE_DIR}/src)
 target_link_libraries(json-patch nlohmann_json_schema_validator)
 add_test(NAME json-patch COMMAND json-patch)
+
+add_executable(binary-validation binary-validation.cpp)
+target_include_directories(binary-validation PRIVATE ${PROJECT_SOURCE_DIR}/src)
+target_link_libraries(binary-validation PRIVATE nlohmann_json_schema_validator)
+add_test(NAME binary-validation COMMAND binary-validation)
-- 
2.20.1

From 5ed4180798e268bd7f38d8bf23acec81c27ccfc9 Mon Sep 17 00:00:00 2001
From: andrejlevkovitch <andrejlevkovitch@gmail.com>
Date: Fri, 22 May 2020 11:53:24 +0300
Subject: [PATCH 2/2] add missing binary-validation.cpp file for test case

---
 test/binary-validation.cpp | 135 +++++++++++++++++++++++++++++++++++++
 1 file changed, 135 insertions(+)
 create mode 100644 test/binary-validation.cpp

diff --git a/test/binary-validation.cpp b/test/binary-validation.cpp
new file mode 100644
index 0000000..29e707a
--- /dev/null
+++ b/test/binary-validation.cpp
@@ -0,0 +1,135 @@
+// bson-validate.cpp
+
+#include <iostream>
+#include <nlohmann/json-schema.hpp>
+#include <nlohmann/json.hpp>
+
+int error_count = 0;
+
+#define EXPECT_EQ(a, b)                                              \
+   do {                                                             \
+       if (a != b) {                                                \
+           std::cerr << "Failed: '" << a << "' != '" << b << "'\n"; \
+           error_count++;                                           \
+       }                                                            \
+   } while (0)
+
+using json = nlohmann::json;
+using validator = nlohmann::json_schema::json_validator;
+
+// check binary data validation
+const json bson_schema = json::parse(R"(
+{
+  "type": "object",
+  "properties": {
+    "standard_string": {
+      "type": "string"
+    },
+    "binary_data": {
+      "type": "string",
+      "contentEncoding": "binary"
+    }
+  },
+  "additionalProperties": false
+}
+)");
+
+const json array_of_types = json::parse(R"(
+{
+  "type": "object",
+  "properties": {
+    "something": {
+      "type": ["string", "number", "boolean"],
+      "contentEncoding": "binary"
+    }
+  }
+}
+)");
+
+const json array_of_types_without_binary = json::parse(R"(
+{
+  "type": "object",
+  "properties": {
+    "something": {
+      "type": ["string", "number", "boolean"]
+    }
+  }
+}
+)");
+
+class store_ptr_err_handler : public nlohmann::json_schema::basic_error_handler
+{
+   void error(const nlohmann::json::json_pointer &ptr, const json &, const std::string &message) override
+   {
+       nlohmann::json_schema::basic_error_handler::error(ptr, "", message);
+       std::cerr << "ERROR: '" << ptr << "' - '"
+                 << ""
+                 << "': " << message << "\n";
+       failed_pointers.push_back(ptr);
+   }
+
+public:
+   std::vector<nlohmann::json::json_pointer> failed_pointers;
+
+   void reset() override
+   {
+       nlohmann::json_schema::basic_error_handler::reset();
+       failed_pointers.clear();
+   }
+};
+
+int main()
+{
+   validator val;
+   val.set_root_schema(bson_schema);
+
+   // create some bson doc
+   json::binary_t arr;
+   std::string as_binary = "hello world";
+   std::copy(as_binary.begin(), as_binary.end(), std::back_inserter(arr));
+
+   json binary = json::binary(arr);
+
+   // all right
+   store_ptr_err_handler err{};
+   val.validate({{"standard_string", "some string"}, {"binary_data", binary}}, err);
+   EXPECT_EQ(err.failed_pointers.size(), 0);
+   err.reset();
+
+   // invalid binary data
+   val.validate({{"binary_data", "string, but expect binary data"}}, err);
+   EXPECT_EQ(err.failed_pointers.size(), 1);
+   EXPECT_EQ(err.failed_pointers[0].to_string(), "/binary_data");
+   err.reset();
+
+   // also check that simple string not accept binary data
+   val.validate({{"standard_string", binary}, {"binary_data", binary}}, err);
+   EXPECT_EQ(err.failed_pointers.size(), 1);
+   EXPECT_EQ(err.failed_pointers[0].to_string(), "/standard_string");
+   err.reset();
+
+   /////////////////////////////////////
+   // check with array of types
+
+   // check simple types
+   val.set_root_schema(array_of_types);
+   val.validate({{"something", "string"}}, err);
+   val.validate({{"something", 1}}, err);
+   val.validate({{"something", false}}, err);
+   EXPECT_EQ(err.failed_pointers.size(), 0);
+   err.reset();
+
+   // check binary data
+   val.validate({{"something", binary}}, err);
+   EXPECT_EQ(err.failed_pointers.size(), 0);
+   err.reset();
+
+   // and check that you can't set binary data if contentEncoding don't set
+   val.set_root_schema(array_of_types_without_binary);
+   val.validate({{"something", binary}}, err);
+   EXPECT_EQ(err.failed_pointers.size(), 1);
+   EXPECT_EQ(err.failed_pointers[0], "/something");
+   err.reset();
+
+   return error_count;
+}
-- 
2.20.1
pboettch commented 3 years ago

What do you think about the callback idea?

Could you not just simply make a pull-request? Much easier to review.

andrejlevkovitch commented 3 years ago
  1. this can be usefull, especially for handling other types of contentEncoding, but here is a problem with architecture... I don't know how implement this

  2. Of cause #114