Closed twisted-trac closed 7 years ago
@wsanchez set owner to @wsanchez @wsanchez set status to assigned |
---|
Much conversation on this went on in email: http://thread.gmane.org/gmane.comp.python.twisted/25777
@glyph removed owner @glyph set status to new |
---|
Notes for the reviewer:
This branch contains a lot of code. It's more code than one can reasonably review all at once. However, breaking the branch up into multiple changes will result in changes that don't fully implement something, and are, even at the smaller code size, not really worth the effort of reviewing. Since it wouldn't fully implement something, it would be hard to get a holistic view of the intended effect (for example, many of the small implementation modules don't expose any public API and therefore have no narrative documentation).
So, my suggestion for reviewers is this:
Select one of:
and do a complete code review just on that piece. Feel free, of course, to "look aside" to examine how the bits of code that you're reviewing interact with the system as a whole, since that's the whole point of putting this into review all at once, but don't feel as though you need to comprehensively review anything other than the one thing you've selected.
As always, please separate your feedback into sections so that you make a distinction between things that you think must be addressed before landing, things that should be addressed before landing but could be addressed in a separate ticket, and things that you might want the change authors to consider.
For maximum accuracy on the high scores page, and so everybody gets credit on what is still a significant amount of review work, I suggest that each reviewer remove and then re-add the review keyword as they complete their review. If you're going to review a particular portion, say which one you're doing in a comment here first, so that others know.
I'm assuming that not all of these reviews are going to pass, and that there will be a brief second review portion. Once there's a code review in for each portion of the code, a subsequent reviewer can do the whole thing, just making sure that all of the feedback from the original reviews is dealt with, rather than trying to absorb the whole change.
@ralphm set owner to @glyph @ralphm set status to new |
---|
FIXME
comment (_json.py
and test_logger.py
).dateutil
instead of 'adding' _tzhelper
?t.p.test.test_tzhelper.py
:
tzset
should use it.addTZCleanup
's docstring._flatten
mentions test_flatten
, which doesn't exist. However, test_flatten.FlatFormattingTests
tests all of the _flatten
module, minus the above mentioned edge cases._flatten.py
is not in setup3.py
.twisted.python.reflect
, twisted.internet.test.test_unix
, twisted.test.test_twistd
.twisted.test.test_log
imports io
halfway the module.This concludes my review. Overview:
STDLibLogObserver
are nice-to-have, except for the thread-safeness.Thanks!
@ralphm set owner to @glyph @ralphm set status to new |
---|
This is the first phase on my review on this branch, covering part 3 (public interface and documentation), per comment:36.
log_source
is confusing. In ad_hoc.py
, I expected self
and didn't immediately understood where log_source
was coming from. Maybe it would help to have an explicit table listing all of the fields that might be added by the system.dict
s as values is going to be a common use case, so maybe this needs to spelled out more. Especially if there will be observers that queue up events to be formatted and rendered some time in the (near?) future.Logger.emit
isn't mentioned in the howto. I think it worth mentioning that it exists and that you can pass it the log level as the first argument.exc_info
to .info
, .error
, etc., or have it be extracted from sys.exc_info
by passing True
instead. I appreciate the explicitness of Logger.failure
. The assumption that people want this logged at level critical
doesn't match my experiences. I'm on the fence which of the two alternatives is a nicer API.I also briefly looked at the bridge to the Python Logging Facility from the standard library (STDLibLogObserver
) and think it is good start, but a bit lacking, like with the old logging system. A few comments:
LogRecord
and calling logging.Logger.handle
directly. This would also remove the need to explicitly overriding findCaller
.log_namespace
could be used as the logger name..args
. You can't really use extra
because those are merged into LogRecord.__dict__
. Yeah.Failure
s into the exc_info
used in LogRecord
.callInThread
for Logger.log?As a side note and not really part of the review, distinguishing conceptually different events, and the types of the values in fields, is important when sending log events to systems that analyze and/or store them.
For example, I intend to send logs (indirectly) via Logstash to Elasticsearch. The way Elasticsearch works is that a JSON structure gets interpreted as (nested) fields with (arrays of) values. Those fields are strongly typed using a so-called mapping. This mapping can be dynamic, depending on the very first value received for a particular field. In this case, if the first value is a number, all future events with a string value for that field will be rejected. However, a mapping is tied to a document type, with each type represented by a unique identifier (for that index). This means that you can have documents of different types, each with their own field mapping, in the same index.
It would be great if this was easy to do with the new logging system. Possible solutions would be an application specific field (like category
) that is used for the document type. Another would be to use the format string and maybe hash that to arrive at a unique identifier to use for distinguishing document types. Finally, systems like Eliot have a way make the typing of log events explicit. My feeling is that this logger doesn't need any adjustments to accommodate, but I wanted to put these observations out there.
@ralphm set owner to @wsanchez |
---|
Woah, wasn't that a long week?
I have some final comments:
Logger.__get__
should say its instead of it's.LoggerTests.test_trace
doesn't have full coverage. Both nested functions are never called, which means the test is a no-op. The cause is that it sets publisher
on the logger, instead of observer
. Changing that makes the test pass and have full coverage again. The wrapper function named publisher
can be dropped.LoggerTests.test_conflicting_kwargs
has a FIXME comment.test_interface
methods: why the try/except around verifyObject
?LogBeginner.showwarning
: the case where file is not None
and the Python version >= 2.6 is not tested. The implementation is correct, though.These all seem minor to address. When having done so, please merge!
Booyah!
@hawkowl set status to reopened |
---|
(In [43431]) Revert r42760: Revert [#6750](https://github.com/twisted/twisted/issues/6750) and related tickets.
Please see http://twistedmatrix.com/pipermail/twisted-python/2014-November/028912.html.
Reopens: #6750
@wsanchez set status to closed |
---|
(In [44442]) Merge branches/log-booyah-6750-6: New logging system
Author: wsanchez, glyph Reviewer: ralphm Fixes: #6750
@itamarst set status to reopened |
---|
(In [42759]) Revert r42758, r42757, and r42756 (ideally in a history-fixing way)
The merge of the log-booyah-6750-4 branch was broken up across multiple revisions in an extremely confusing (to humans and in metadata) way. Clear out those changes so it can be merged normally.
Reopens: #6750
@wsanchez set status to closed |
---|
(In [42756]) Merge log-booyah-6750-4: Brief description
Author: wsanchez, glyph Reviewer: ralphm Fixes: #6750
New logging system.
@wsanchez set status to reopened |
---|
(In [42757]) Revert r42756: Add feature file and nothing else, whoops.
Committed only the topfile.
Reopens: #6750
@wsanchez set status to closed |
---|
(In [42758]) Merge log-booyah-6750-4: New logging system.
Author: wsanchez, glyph Reviewer: ralphm Fixes: #6750
New logging system: twisted.python.logger
@exarkun set status to closed |
---|
(In [42760]) Merge log-booyah-6750-4: New logging system.
Author: wsanchez, glyph Reviewer: ralphm Fixes: #6750
New logging system: twisted.python.logger
@wsanchez commented |
---|
r43432 brings back the code that was on trunk. The following regressions resulted afterwards, resulting in a revert:
Logger.failure()
message text?@wsanchez commented |
---|
r43434 and r43435 merge the previously reviewed fixes for #7545 and #7548 respectively, which had also been on trunk.
That puts this branch on the same footing as trunk prior to the revert in r43431.
Buildbot is green except for some twistedchecker and pydoctor errors. r43436 cleans up three of the twisted checker errors. The rest are bogus: "too many blank lines" where there aren't, in fact, too many and pydoctor doesn't seem to know about NoneType
.
So r43436 is our starting point for continuing from already-reviewed code.
@wsanchez commented |
---|
Merged forward to branches/log-booyah-6750-6
Buildbot found some new twistedchecker items, fixed in r44425.
@wsanchez commented |
---|
Committed on branch as twisted.python.logging
, tests pass.
Next step is to implement twisted.python.log
in terms of twisted.python.logging
. That’ll have to include implementing some additional features, like the file log observer, in the new module’s terms; right now, the new module uses the old module for observers.
@wsanchez commented |
---|
(In [40036]) Branch for #6750
@jml commented |
---|
Does structlog have the attributes described in the OP?
@glyph commented |
---|
Replying to jml:
Does structlog have the attributes described in the OP?
Some of them, but not all of them. It also isn't attached as a patch to a Twisted ticket, so it's not really relevant here – but I know that it has generated a lot of excitement so perhaps it's best to have a brief comparison :-).
One of the major themes of my original post was making work-deferral easier, so that logging is intuitively cheap and we will generally do more logging within Twisted. Structlog doesn't appear to do anything more than our existing logging API to defer or skip computation, except for the fact that it provides a filtering API. Sadly, structlog also makes the curious decision to implement filtering by requiring that the filter raise an exception for each dropped message, which (as primitive operations go, anyway) is quite expensive.
The branch on this ticket addresses work deferral with idiomatic extensions to PEP3101. One benefit of this extended syntax (which allows for the suffix of ()
to invoke a callable, in addition to the standard PEP3101 getattr
(.
) and getitem
([]
) syntax) is that it allows for you to specify a reduced set of potentially non-serializable objects in your event, and defer not simply serializing the serializable aspects of the event, but even the decision to retrieve said data. For example, you can do this:
#!py
from twisted.python.logger import Logger
class MyInterestingObject(object):
log = Logger()
def __init__(self, socket, value):
self.socket = socket # Not serializable!
self.value = value
def doSomethingInteresting(self):
self.log.info("{source} doing something interesting "
"to {source.value.computation()}.")
Presumably, source.value.computation()
is expensive enough (i.e. it's a python function call so it's not completely free) that we don't want to do it unless someone actually wants to see this info message. But what's worth noting is that in this example, we don't even have to build a dictionary with a second key; until format-time, we just have an event dictionary with the built-in source
key (and the descriptor on Logger
takes care of providing self
as the source
key, so we could optimize that even further).
So, the format string isn't purely about the english text; it's a mini-language that lets you specify which parts of your log messages might be interesting to subsequent observers. In a live system, pre-serialization, it's helpful for log event objects to contain references to things like their emitter. This is useful for incident-reporting tools (like the one in Foolscap, described in some mailing list messages many aeons ago), which may or may not want to inspect those objects.
I guess you could accomplish this with structlog by building a custom processor for a particular module and then interposing it, but that requires careful consideration and application-specific observer tuning, both of which are possible with the existing logging system.
@exarkun commented |
---|
I wonder how the logging use demonstrated by
self.log.info("{source} doing something interesting "
"to {source.value.computation()}.")
will interact with structured log analysis tools. At first glance, the use of expressions like "source.value.computation()" inside the log message template and the explicit avoidance of constructing an extra dictionary with keys (thus, with keys into which any thought or planning has gone) means that apart from parsing the prose English version of this log message, there is no reliable way to write a tool that examines logs and treats the result of that computation semantically.
I'm sure this branch will come with copious documentation (and I admit I haven't read any of the code checked in so far (sorry)) so feel free to just point me at that for the answer.
Thanks for your work in this area!
@wsanchez commented |
---|
exarkun: That is true if source.value.computation()
returns some random thing every time. But if emits, for example, a username, then this is totally parseable by a tool. The point is that the fetching username might be expensive for some reason, and best avoided if the log message will be dropped. Perhaps a clearer example:
self.log.debug("{log_source.user()} wrote data to {log_source.destination()}")
Also: this all assumes that you are looking at output from an observer that renders the format into text, a la FileLogObserver
, of course.
glyph: All event keys added by the log system have a log_
prefix, so: log_source
, not source
. Nit, but it took me a sec to figure out what source
was. Other keys provided by the system include: log_level
, log_namespace
, log_format
, log_time
(which is how FileLogObserver
generates its prefix).
@exarkun commented |
---|
exarkun: That is true if source.value.computation() returns some random thing every time. But if emits, for example, a username, then this is totally parseable by a tool. The point is that the fetching username might be expensive for some reason, and best avoided if the log message will be dropped.
Thanks. Examples are indeed helpful. How about this one - how would I find all users who have written data to a path beneath /foo/bar? Is there a solution that doesn't involve rb"([^ ]+) wrote data to (.+)"
?
@wsanchez commented |
---|
Replying to exarkun:
Thanks. Examples are indeed helpful. How about this one - how would I find all users who have written data to a path beneath /foo/bar? Is there a solution that doesn't involve
rb"([^ ]+) wrote data to (.+)"
?
Slightly better would be to use !r
in the format to make it more parse friendly, but you're still playing string parsing games.
A better option is don't use FileLogObserver
, use an observer that serializes the event, and make sure your object (log_source
) serializes usefully. The problem with FileLogObserver
is that it's emitting text meant to be read by people, which is what the format argument is about.
What you can totally do in that case is:
self.log.debug()
Though that will be unfriendly to the observers that do want readable text.
@glyph commented |
---|
Replying to exarkun:
I wonder how the logging use demonstrated by
#!python self.log.info("{source} doing something interesting " "to {source.value.computation()}.")
will interact with structured log analysis tools. At first glance, the use of expressions like "source.value.computation()" inside the log message template and the explicit avoidance of constructing an extra dictionary with keys (thus, with keys into which any thought or planning has gone) means that apart from parsing the prose English version of this log message, there is no reliable way to write a tool that examines logs and treats the result of that computation semantically.
Perhaps I should add yet more code to the branch to demonstrate a workable vision for this :-) but first I'll attempt to describe one.
We can write a serializing log observer that parses the format string far enough to notice that source.value.computation()
is a thing that should be getattr
'd and apply
'd, then store the result of said application in a separate dictionary key, "formatted:source.value.computation()"
. The stored form of this will be a JSON dictionary or something, the actual smashing together of strings should be done post-serialization only when a human is actually reading it, and not when tools are analyzing it. Hence, no need for grep, just do self.logStorage.query(format="{source} doing something interesting to {source.value.computation()}.")
.
Note: no need for !r
here :). Just make sure anything that ends up in a format key is a serialization-friendly value, and not a mutable object with some random structure, and you can get integers, lists, floats and the like back out with type information intact. The serializer will get to exercise some discretion on the level of fidelity it's going to preserve during serialization, but presumably as soon as we have one which does JSON the semantics will just converge on those semantics.
This does mean you'll need to have a little more public API for log analysis tools to either look up that key or compute it as needed, but presumably that'll be part of the log analysis module :).
I'm sure this branch will come with copious documentation (and I admit I haven't read any of the code checked in so far (sorry)) so feel free to just point me at that for the answer.
Thanks for your work in this area!
@wsanchez commented |
---|
Status: The new t.p.logger.py and tests are pretty much done, I think.
I've made a number of changes to t.p.log so that it's implementation uses the logger
module where possible:
log.LogPublisher
:
logger.LegacyLogObserverWrapper
msg()
forwards events to logger.Logger.publisher
_err()
is gonelog.FileLogObserver
:
logger.FileLogObserver
%z
) instead of being tacked onto the result.
getTimezoneOffset()
and formatTime()
go away, as they are unused.log. PythonLoggingObserver
:
logger.PythonLogObserver
that adds emit()
for compatibility.logfile
and logerr
:
logger.LoggingFile
instead of StdioOnnaStick
.Those changes verify that the new logging module implements the functionality of the old one, and does so compatibly, since the existing tests for the old module (and the rest of twisted) have to keep passing even after the old module is implemented in terms of the old module.
But the tests did have to change a bit:
test_erroneousErrors
:
test_getTimezoneOffset
:
test_timeFormatting
:
test_microsecondTimestampFormatting
:
FileObserverTestCase
:
test_startLogging
:
isinstance
checks on sys.stdout
and sys.stderr
now check for logger.LoggingFile
instead of log.StdioOnnaStick
.test_customLevel
:
logging.ERROR
instead of logging.DEBUG
to avoid getting filtered.Also, in test_endpoints
:
test_logStderr
:
event["format"]
here; look at the rendered result message instead.And there are some tests in test_log
still in need of some review:
test_doubleErrorDoesNotRemoveObserver
:
log.LogPublisher._err()
to be a thing and it's not any more. Need to re-read the test and see whether is still applies.test_showwarning
:
setUp()
; not sure what's up with that.test_publisherReportsBrokenObserversPrivately
:
publisher.observers
being a list
.test_errorString
:
self.flushLoggedErrors()
. Unclear why that wasn't needed before.And finally, a few are still failing; haven't figured out why yet:
twisted.trial.test
.For reference, here's the complete diff for the branch from creation to HEAD: https://twistedmatrix.com/trac/changeset?old_path=%2Fbranches%2Flog-booyah-6750&old=40036&new_path=%2Fbranches%2Flog-booyah-6750&new=HEAD
@glyph commented |
---|
I've fixed a couple of epytext issues in the branch and gotten the 'new pydoctor warnings' error down to just
twisted.python.logger.PythonLogObserver: invalid ref to logging
which I think should explicitly remain un-fixed, because it's a result of https://bugs.launchpad.net/pydoctor/+bug/1237284 which I think mwhudson already has a fix for. At least he showed me a one-line patch which seemed to address the issue.
@wsanchez commented |
---|
HOWTO is underway:
branches/log-booyah-6750/doc/core/howto/logger.xhtml
Which is great because, for example, writing it got me a bit stuck on describing how to register observers, which I thought made more sense than I now think it does. (If it’s hard to document clearly, it’s probably not well-designed.)
Talked to Glyph at length about it, and I’m going to try a different approach there.
@glyph commented |
---|
I've been trying to address failing tests. Many builders are passing now, including the documentation builder, so yay for that!
However, I'm still wrestling with some problems; here are some in case anyone is interested and would like to help out.
log_trace
key, which seems to blow up to a really surprising size. It could also be a bug related to the ring buffer though. This also appears to be affecting the wheel-building builder, which is extra weird because I didn't think that even imported any code from Twisted?mktime
on any time longer than a year ago - or maybe it can't do it at all - so we need some other way to test the date formatting.The new pyflakes errors appear to be completely spurious, as they're in a module this branch doesn't even touch.
@wallrj commented |
---|
Replying to glyph:
- The python 3.3 tests are at least running on the branch again now, whereas before they were just blowing up while bootstrapping. They still fail, but the failures all seem to somewhat straightforwardly indicate that we're just getting unicode wrong. Some of the fixes I made need their own tests, and should very definitely be moved onto their own branch to mitigate the giganticness of this already-too-large review.
See #6570 for Python3 compatible constants. The branch for that ticket has been through a couple of reviews already, so should just need a quick final review.
@glyph commented |
---|
To follow up to my earlier comment:
The tests are in much better shape now, but it needs to be merged forward to get rwall's fix.
@glyph commented |
---|
Remaining issues before this is reviewable:
system
-key functionality to see if it's worth reproducing. I'm thinking probably not but it would be good to do a bit of checking around to make sure.@glyph commented |
---|
Merged forward to get rwall's fix, and wsanchez split up the module into a package to make the code easier to review and maintain. (The public API is still a single namespace, but the implementation is factored so it's clearer to see which pieces are core logging functionality, which pieces are integration with the stdlib logging
module, etc.)
I've just forced a build to bring the buildslaves up to date with the latest code.
@wsanchez commented |
---|
The HOWTO doc is now fairly complete:
https://twistedmatrix.com/trac/browser/branches/log-booyah-6750-3/doc/core/howto/logger.xhtml
Our logging system should have certain desirable attributes, so as:
log.debug(“%s” % (somethingExpensive(),))
log.info(“I peed my pants.”)
log.info(“{user} peed pants at {time}”, user=getUser(), time=getTime())
log.info(“{self.user} peed pants at {self.getTime()}”, self=self)
WHAT NO WAY BOOYAHlog.info(obj=self)
; it just won’t necessarily be all pretty in a text file, is all.Current logging does some, but not all of the above. The current API is also kind of a mess, where
twisted.python.log.msg()
takes*message
, notmessage
and OMG why? and global variables and stuff.Attachments:
Searchable metadata
``` trac-id__6750 6750 type__enhancement enhancement reporter__wsanchez wsanchez priority__normal normal milestone__New_Logging_System New Logging System branch__branches_log_booyah_6750_6 branches/log-booyah-6750-6 branch_author__wsanchez__glyph wsanchez, glyph status__closed closed resolution__fixed fixed component__logger logger keywords__BOOYAH_ BOOYAH! time__1379461306000000 1379461306000000 changetime__1481165626637518 1481165626637518 version__None None owner__wsanchez wsanchez cc__glyph cc__radix cc__exarkun cc__posita ```