Closed: laurentS closed this issue 4 years ago
Unfortunately you can't add action context from inside individual callbacks... You need to start actions outside, but then, as you say, `DeferredContext` is needed to track things.
Once Twisted has native support for contextvars (https://github.com/twisted/twisted/pull/1192), the use of `eliot.twisted.DeferredContext` will in many cases become unnecessary. At that point any wrapped Twisted code should work better, and you won't need to mess with the internals.
As a stopgap measure you can log via the standard `logging` package. Looks like you're already doing this. Not ideal, but something.
I guess, looking at what you're doing in more detail, that might also be a reasonable stopgap. Just note: you can only call `continue_task` once per serialized ID. Serialized task IDs are single use.
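The single-use behaviour can be pictured with a tiny stdlib-only model (an illustrative sketch, not Eliot's real implementation; `ToyAction` is a made-up name): each serialized ID embeds one specific position in the task tree, so handing the same ID to two consumers would make them both claim the same slot.

```python
import itertools

class ToyAction:
    """Stdlib-only model of Eliot's task-level bookkeeping (illustrative)."""

    def __init__(self, task_uuid="1234"):
        self.task_uuid = task_uuid
        self._counter = itertools.count(1)

    def _next_task_level(self):
        # Each call claims the next child slot under this action.
        return [1, next(self._counter)]

    def serialize_task_id(self):
        # The serialized ID embeds one specific position in the log tree,
        # which is why each ID can only be continued once.
        level = "/".join(str(n) for n in self._next_task_level())
        return f"{self.task_uuid}@{level}"

action = ToyAction()
print(action.serialize_task_id())  # → 1234@1/1
print(action.serialize_task_id())  # → 1234@1/2  (a fresh slot each time)
```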
Thanks for the fast reply! This was super helpful. I've tried a few different options, and so far my understanding is:

- `serialize_task_id()` is single use, but I can call it again after `continue_task`, it seems. Right now I'm calling it at the end of each callback, so that the next one will log in sequence.
- The relative order of calls to `Message.log()` and `serialize_task_id()` matters. Looking at the code, they both call `_nextTaskLevel()` to determine where the message appears in the log tree. I had not realised this at first.

So it's not perfect, I still have weird non-chronological output in some places, and it requires thinking carefully about where the context-manager calls and serialization calls happen, but the result is already way more readable than what I had only 3 hours ago :tada: Thanks for your help! (and the cool logging library)
First of all, thanks for the great library. I haven't gotten it to work yet, but I'm already impressed :)
I am writing a fairly complex scraper with scrapy that involves fetching files in a tree-like way: one first index file yields a number of other files to fetch, each of them having several sections to process independently. The processing is fairly complex, so I am struggling to track errors, and eliot looks like a super promising solution (my first attempt was very exciting to look at although it doesn't quite work yet).
In short, scrapy is built on top of twisted, but I obviously don't want to modify scrapy's code as described in the docs. To make things worse, the initial scrapy process uses generators everywhere, so keeping track of the current action is tricky.
`scrapy` uses the concept of an `item` that it passes around between callbacks to transfer data. There is an initial scraping process which yields requests to which this `item` is attached. Then `scrapy` passes this `item` to a series of `pipelines`, each of which gets the `item`, modifies it and returns it.

To keep the context in Eliot, I tried serializing the action with `the_id = action.serialize_task_id()` and then picking up the context with `with Action.continue_task(task_id=the_id):`. It works partially. The first time I `continue_task`, the logs look ok, but if I try to do it more than once, the logs look like:

The code looks like this (these are the standard scrapy callbacks):
Is this kind of pipeline logic supported by `continue_task`, or am I trying to use the wrong solution here? To be clear, each pipeline's `process_item()` is called once per item, and I then run a loop over each subitem inside each pipeline. Ideally, I want the logs to reflect that tree structure, to ease tracing errors. Any ideas would be great!