KevinJBoyer opened this issue 1 year ago
`get_logging_details` is still pretty verbose, but I think the method needs to be very clear so folks don't unintentionally add PII.

@KevinJBoyer very cool, thanks for putting this together! Couple of initial reactions:
Thinking more about whether the attributes should be associated with the model, it seems that might actually be too constraining. For example, suppose we have `User` objects, and find `user.id` a useful thing to log. And suppose we're in an authorized representative flow, where one user is granting another user authorized representative status. Then it would be useful to log something like `grantor.id` and `grantee.id`, but if the attributes were tied to the model, the logging system could only ever handle one object of each type.
The logger code would prefix the dictionary that's returned with the name you give it in the logging statement, so e.g., you'd be able to write:
```python
logger.info(
    "Granting another user authorized representative status.",
    extra={
        "grantor": grantor,  # grantor is of type User
        "grantee": grantee,  # grantee is also of type User
    },
)
```
and get this out as the JSON:
```json
{
    "grantor.id": ...,
    "grantor.other_fields": ...,
    "grantee.id": ...,
    "grantee.other_fields": ...
}
```
I will take a look at the refactor!
Ah I see. A simple idea could also be to use the new `add_extra_data_to_current_request_logs` function in this PR to add fields for `grantor: User` and `grantee: User`:

```python
flask_logger.add_extra_data_to_current_request_logs(
    get_user_logging_details(grantor, prefix="grantor")
    | get_user_logging_details(grantee, prefix="grantee")
)
logger.info(...)
```
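That helper's implementation isn't shown in the thread, so here is a hedged sketch of what `get_user_logging_details` with a `prefix` parameter might look like (the `User` class and its fields are illustrative, not the real model):

```python
from typing import Dict


class User:
    """Illustrative stand-in for the real User model."""

    def __init__(self, user_id: str, email: str):
        self.user_id = user_id
        self.email = email  # PII: deliberately never included in log output


def get_user_logging_details(user: User, prefix: str = "user") -> Dict[str, str]:
    # Allow-list only fields known to be safe, and prefix each key so two
    # User objects in one log line (grantor vs. grantee) don't collide.
    return {f"{prefix}.id": user.user_id}


details = get_user_logging_details(
    User("u-1", "grantor@example.com"), prefix="grantor"
) | get_user_logging_details(User("u-2", "grantee@example.com"), prefix="grantee")
# details == {"grantor.id": "u-1", "grantee.id": "u-2"}
```

The explicit allow-list (rather than dumping `__dict__`) is what keeps PII such as `email` out of the logs.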
That functionality may be orthogonal (though I think a huge step in the right direction) -- this enhancement is more about where the `get_user_logging_details` functionality lives:

- `get_user_logging_details(user: User) -> Dict` -- what you have above, which occurs in PFML at https://github.com/EOLWD/pfml/tree/main/api/massgov/pfml/util/logging
- `get_logging_details(self) -> Dict` -- the proposal here, which I think is closer to an interface / composition (duck typed here in Python by adding the method, but you can imagine `User implements Loggable` in Java or some other language) than it is inheritance.

> From a software architecture perspective, I don't want to add anything to the models themselves – that breaks the single responsibility principle and the open-closed principle and relies too heavily on inheritance. We'd want to put this in a separate module so that the SQLAlchemy models don't know about it.
I'd be curious to read more or hear more about this -- my instinct is for the second approach (adding a method to the object, whether that object is a model or otherwise), because it forces developers into a consistent approach for how a particular class would be logged, has some nicer syntax, and has more observability for developers (the method exists as part of the definition of the class, so it's easy to see how the pattern would be applied to another one.)
BUT I can also see how "how this object should be logged" is distinct from the class definition of "what this object is and how it behaves"!
> because it forces developers into a consistent approach for how a particular class would be logged
What would force the consistency? If I understand correctly, it'd force having a `get_logging_details` method, but since the implementation is spread out across all the models, every model could implement things quite differently.
> has more observability for developers (the method exists as part of the definition of the class, so it's easy to see how the pattern would be applied to another one.)
I think you meant discoverability rather than observability, but agree it'd be very discoverable. That said, I think we could make it equally discoverable by having the logging functionality be a core part of the service object design rather than the data object design. The purpose of logging is more closely related to service methods -- i.e., we want to make our service methods more observable -- rather than something that's related to the data design.
For example, I think it could conceptually make more sense to have something like a `ServiceMethod` interface with a `get_logging_data` method, which forces developers to think about what data should be logged to make a service more observable.
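A hedged sketch of that idea -- `ServiceMethod` and `get_logging_data` come from the comment above, while the base-class mechanics and the example subclass are assumptions:

```python
import logging
from abc import ABC, abstractmethod
from typing import Dict

logger = logging.getLogger(__name__)


class ServiceMethod(ABC):
    """Every service method must declare what data makes it observable."""

    @abstractmethod
    def get_logging_data(self) -> Dict[str, str]:
        ...

    @abstractmethod
    def run(self) -> None:
        ...

    def __call__(self) -> None:
        self.run()
        # One consistent log record per service method execution.
        logger.info("service method finished", extra=self.get_logging_data())


class GrantAuthorizedRep(ServiceMethod):
    """Hypothetical service method from the grantor/grantee example."""

    def __init__(self, grantor_id: str, grantee_id: str):
        self.grantor_id = grantor_id
        self.grantee_id = grantee_id

    def get_logging_data(self) -> Dict[str, str]:
        return {"grantor.id": self.grantor_id, "grantee.id": self.grantee_id}

    def run(self) -> None:
        pass  # granting logic would live here


op = GrantAuthorizedRep("u-1", "u-2")
op()  # runs the method and emits the standardized log record
```

Because `get_logging_data` is abstract, a subclass that forgets to declare its log data fails at instantiation time rather than silently logging nothing.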
Ah yup, discoverability is what I had in mind (which is my guess for why the PFML functions in `util/logging` don't get used more in practice, at least in the parts of the codebase I'm familiar with).
Totally agree on service vs. data, especially if there's a good way of enforcing that data models belong to a service (and not the other way around, where multiple services are all operating on the same data models -- this is what happens on PFML at the moment and leads to different services logging the same data model in different ways).
Edit: forgot to answer your Q on consistency -- that's a great point, I had in mind the signature of the function + where it lives, but there's nothing preventing each model from implementing the logic inside (and thus the outputs) differently!
Yeah I'm not entirely sure what the solution would look like. Could be worth sketching out some prototypes at some point.
Another idea is to annotate the data models with metadata associated with the data, such as whether fields are PII or not, which an object-logger class could use to decide which fields to log. Not my favorite idea though, since it feels like too much magic.
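For illustration, the annotation approach could look like this using stdlib dataclass field metadata (SQLAlchemy columns have a similar `info` dict); the `"pii"` flag convention and the field names are assumptions, not anything from PFML:

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict


@dataclass
class User:
    user_id: str
    # Fields flagged as PII must never reach the logs.
    email: str = field(metadata={"pii": True})
    full_name: str = field(metadata={"pii": True})


def logging_details(obj: Any, prefix: str) -> Dict[str, Any]:
    """Generic object logger: emit only the fields not marked as PII."""
    return {
        f"{prefix}.{f.name}": getattr(obj, f.name)
        for f in fields(obj)
        if not f.metadata.get("pii", False)
    }


user = User("u-1", "alice@example.com", "Alice Example")
# logging_details(user, "grantor") == {"grantor.user_id": "u-1"}
```

The "magic" concern above is visible here: which fields get logged is decided far from the log statement, inside generic reflection code.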
Taking a step back, an altogether different approach could be to enforce consistency at the check/test phase, through linters, tests, and/or static analysis. A system could run service methods and analyze the logs and check that they all follow a consistent format, and maybe do other checks. Not sure what it'd look like but just throwing out the idea.
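One rough sketch of what such a check could look like at the test phase: capture the records a service method emits with a handler, then assert they follow a required format (the logger name and the `payment.id` key are invented for the example):

```python
import logging
from typing import List


class RecordCapture(logging.Handler):
    """Collects every record so a test can inspect its extra fields."""

    def __init__(self) -> None:
        super().__init__()
        self.records: List[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


logger = logging.getLogger("payments_service")
logger.setLevel(logging.INFO)
capture = RecordCapture()
logger.addHandler(capture)

# Pretend this line runs inside a service method under test:
logger.info("payment processed", extra={"payment.id": "p-1"})

# Consistency check: every record this service emits must carry payment.id.
assert all("payment.id" in r.__dict__ for r in capture.records)
```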
Problem
On PFML, we have lots of approaches to logging complex objects like database models: https://github.com/EOLWD/pfml/tree/7cec2cb88c03dca73d2636211be689477df10f8d/api/massgov/pfml/util/logging + also various get_traceable_details functions throughout payments code
This causes:

- General line noise in code, e.g.:

```python
logger.info(
    "some message",
    extra={
        "payment": get_traceable_details_for_payment(payment),
        "payment_details": get_traceable_details_for_payment_details(payment.payment_details),
    },
)
```

- Inconsistent attribute names, leading to difficulty parsing logs in New Relic: `fineos_absence_id`, `absence_case_id`, `absence_id`, etc.
Proposal
Add a standardized `def get_logging_details(self) -> Dict[str, str]:` to objects we want to log.

This method should only return details of the object itself, not other related objects. (We don't want an infinite loop where `Payment.get_logging_details` tries to include details about `Payment.claim`, and `Claim.get_logging_details` tries to include details about `Claim.payments`.) Let the author of the log statement decide which related objects need to be logged and which don't.
E.g., on class Payment():
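The example code is missing from the issue as captured here, so this is a hedged sketch of what it might look like; the field names are illustrative, not taken from the PFML `Payment` model:

```python
from typing import Dict


class Payment:
    def __init__(self, payment_id: str, period_start_date: str):
        self.payment_id = payment_id
        self.period_start_date = period_start_date

    def get_logging_details(self) -> Dict[str, str]:
        # Only this object's own scalar fields -- never related objects like
        # Payment.claim, to avoid the infinite loop described above.
        return {
            "payment_id": self.payment_id,
            "period_start_date": self.period_start_date,
        }
```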
Update the logging formatters to detect whether the objects passed to `extra` implement this method, and if they do, call it and merge the returned dictionary back into `extra`, with each returned key prefixed by the original object's `extra` key plus a dot (i.e., `f"{key}."`).
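The formatter change could be sketched like this -- a standalone approximation, not PFML's actual formatter code:

```python
import logging
from typing import Any, Dict

# Attributes every LogRecord has by default; anything else came from `extra`.
_STANDARD_ATTRS = set(
    logging.LogRecord("", 0, "", 0, "", (), None).__dict__
) | {"message", "asctime"}


def expand_loggable_objects(record: logging.LogRecord) -> Dict[str, Any]:
    """Expand extra values that implement get_logging_details into
    dot-prefixed keys, leaving other extra values untouched."""
    out: Dict[str, Any] = {}
    for key, value in record.__dict__.items():
        if key in _STANDARD_ATTRS:
            continue
        if hasattr(value, "get_logging_details"):
            for sub_key, sub_value in value.get_logging_details().items():
                out[f"{key}.{sub_key}"] = sub_value
        else:
            out[key] = value
    return out


class Payment:  # minimal stand-in with the duck-typed method
    def get_logging_details(self) -> Dict[str, str]:
        return {"payment_id": "p-1"}


record = logging.LogRecord("t", logging.INFO, "", 0, "msg", (), None)
record.payment_1 = Payment()
# expand_loggable_objects(record) == {"payment_1.payment_id": "p-1"}
```

A JSON formatter would merge this dictionary into its output; the `hasattr` check is what makes the scheme duck-typed rather than tied to any base class.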
Logging statements then can be written like
```python
logger.info(
    "Some message about two payments",
    extra={"payment_1": payment_1, "payment_2": payment_2},
)
```
which produces a JSON output with the keys prefixed per object (`payment_1.*`, `payment_2.*`).

Proof of concept
https://github.com/EOLWD/pfml/compare/kboyer/exploratory-logging