Closed Zsailer closed 2 years ago
Hey @Zsailer, could you elaborate a little more on this? A side effect of this is that it generates pretty obtuse error messages (which I'm working on fixing right now). What is exactly wrong with trying to coerce a string into a Pathlib object inside of EventSchema._load_schema()
if deserialization fails? This is still being performed by the CLI anyways in a try/except block. Do we only want to fallback to string=>Pathlib coercion within the CLI and not within the library? Seems very strange to me.
If we insist on keeping the current behavior and foregoing coercion within the library, then we should return a specific error message for when the user passes in a string path instead of a Pathlib object since I can see this being a common pain point. However, this will be difficult to do since it's difficult to distinguish between a path string and a serialized schema string. And there's no way of checking if the user "had intended to pass a path string" without trying to coerce it to a Pathlib object in the first place! The only workaround would be to add to the already lengthy generic error message passed to EventSchemaLoadingError
.
Let me give an example...
Imagine you're trying to read a string that looks a lot like a file path, but it wasn't intended to be:
# Trying to pass a schema string that looks like a filepath
schema_string = '/etc/passwd'
EventSchema(schema_string)
You'll see the contents of your password file (!!!) dump into your terminal inside of a raised exception.
Why?
The string was coerced into a pathlib object (maybe that was my intention, maybe it was not). The file content is technically valid YAML, so it's read. It fails JSON validation, and the validation error/exception prints the file contents to your terminal. I.e. you've accidentaly dumped sensitive information into your logs without intention—which is really bad news.
This is caused by a combination of issues with string coercion combined with (arguably) a bug in jsonschema where it dumps the contents of a file when an exception is raised.
This is a silly example, of course. schema_string='/etc/passwd'
is clearly not a schema string but a filepath—so it's clear what the user was intending. Other arbitrary strings that could also be interpreted as file names, however, might not be as clear to distinguish. It's best to make the developer explicitly pass this argument as a Path
object if they meant to use a path, to avoid possibly catastrophic vulnerability (e.g. dumping sensitive content) caused by the implicit coercion.
@Zsailer I see. So the idea is that we should never read and stream the contents of a file to standard output implicitly as a side effect of trying to load a schema string whose contents may be (maliciously) set to just a file path. We force the developer to explicitly coerce the string to a file path and specify to the EventLogger, "hey, this is a file path, read from it".
Fixes a bug discovered when working @yuvipanda. Interpreting a string as a filepath could be unsafe. Users could accidentally read an unintended file, fail validation and print the contents to stderr.