Correlation ID for records

aqeelat commented 1 year ago

Oftentimes, when a change is made, it is made across multiple models/instances.

For example: when an order is paid:

update the order model
create a transaction in the ledger

A nice way to track this is to assign a correlation id to the request. We're currently doing this using django-cid. We:

define a cid header. if the header doesn't exist in the request, create a random UUID.
return the cid in get_additional_data
override auditlog admin to display the cid in the changelist

It's really helpful and I would love to see this being officially supported. I'm willing to open a PR for this feature.

aqeelat commented 1 year ago

This is how we have it:

hramezani commented 1 year ago

Thanks @aqeelat for reporting this. Agree with you, It is a nice feature to have. I would like to have it simple I mean we can include the header in the request and then save it when we write the log. I think we can have a separate model field for Correlation ID. Also, I think we don't need to add a django-cid as a new dependency and we can generate the correlation_id by uuid. What do you think?

aqeelat commented 1 year ago

I agree that it should be a separate model field.One major factor we need to consider is the ability to customize the header name. In one project we use, we get the correlation id from cloudflare in a header caller cf-ray, and in another project we get it from datadog in a header caller trace-id.I don’t mind not adding django-cid as a dependency, as long as we do it in a way that’s not very similar. How about we include the package within this repo and then sync it every few months? (I don’t really like this idea though)Sent from my iPhoneOn Dec 12, 2022, at 11:05 AM, Hasan Ramezani @.***> wrote: Thanks @aqeelat for reporting this. Agree with you, It is a nice feature to have. I would like to have it simple I mean we can include the header in the request and then save it when we write the log. I think we can have a separate model field for Correlation ID. Also, I think we don't need to add a django-cid as a new dependency and we can generate the correlation_id by uuid. What do you think?

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: @.***>

hramezani commented 1 year ago

Can't we handle the custom header name by a setting like CORRELATION_HEADER_NAME? We can consider the default value for setting to None and if it has a value, we can check the correlation_id and process it.

I am not against adding django-cid as a dependency, but I think the custom header is a simple one and IMHO, it does not worth adding a lib just for this one.

@aqeelat Please let me know if I missed something.

aqeelat commented 1 year ago

The thing is, django-cid has more features that are not needed in this project but are helpful overall. For example, it has a context processor, a cursor wrapper, and a middleware.

Now that I'm thinking about it, it might be overkill and might complicate the feature (e.g. require settings that are not in the same namespace), but on the other hand, it will allow for better customizability.

Also, a few thoughts came to my mind as I was working on the PR:

should we make the feature turned on by default and in the same middleware?
I have some questions about the structure of the project. For example:
1. why do we have to delete the local after every request? Shouldn't it be automatically garbage collected by python?
2. why are we connecting and disconnecting the signal? Why use the @signal wrapper?
3. when would signal_duid != auditlog["signal_duid"] be true?

Thank you!

aleh-rymasheuski commented 1 year ago

why do we have to delete the local after every request? Shouldn't it be automatically garbage collected by python?

We delete the name inside the threadlocal. The threadlocal is a global name and will never be garbage collected because the module namespace has a reference to it (and the modules are referenced by the internals so won't get collected either).

why are we connecting and disconnecting the signal? Why use the @signal wrapper?

I believe we connect and disconnect the signals explicitly to (a) be able to connect in one unit and disconnect in another, and (b) to have better control over the state of signals. @hramezani, please chime in if you have a better explanation.

when would signal_duid != auditlog["signal_duid"] be true?

I can think of just one example: if a user enters set_actor context twice (when, for instance, they enter set_actor context while handling a request passed through the auditlog middleware), we want one of the two handlers to be ignored. I doubt though that this case still works as expected because set_actor will clear the threadlocal on exit, breaking the outer context. :)

As for having the correlation ID included in the audit log, I like this idea for a feature, I would use it.

aqeelat commented 1 year ago

@alieh-rymasheuski Correct me if I'm wrong, but doesn't Django spin a new thread per request? https://stackoverflow.com/questions/62599950/is-storing-data-in-thread-local-storage-in-a-django-application-safe-in-cases Also, since we're only supporting python >= 3.7, why not switch to ContextVar? (This could be a separate issue)

I digress.

I'll have a PR ready soon. It won't include tests yet but I'd like to discuss the structure with you and @hramezani.

hramezani commented 1 year ago

@aqeelat I think we can close this and open another issue for future enhancment

mengchen-yu commented 1 year ago

Hi! When do we plan on cutting the next version? I jumped the gun trying to use the correlation header with v2.2.2 only to realize it's not released yet.

jazzband / django-auditlog

Correlation ID for records #469