tensorflow / java

Java bindings for TensorFlow
Apache License 2.0

Wrap C++ logging using Java logging framework #257

Open cowwoc opened 3 years ago

cowwoc commented 3 years ago

By default, TensorFlow's C++ libraries log messages like this:

2021-03-29 15:50:24.014316: W external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2021-03-29 15:50:24.014340: I external/org_tensorflow/tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.

Setting aside the contents of these particular messages for a minute (I know how to resolve this warning), I would like to redirect all logging output from the C++ library into a Java logging framework (such as SLF4J) so that I can control the logging level, redirect logs to a file, specify a rollover policy (e.g. truncate logs at the end of every day), and so on.

saudet commented 3 years ago

It looks like we can already pretty easily do that by calling the already mapped TF_RegisterLogListener() function: https://github.com/tensorflow/java/blob/master/tensorflow-core/tensorflow-core-api/src/gen/java/org/tensorflow/internal/c_api/global/tensorflow.java#L3024

cowwoc commented 3 years ago

@saudet Sorry, I'm not sure how to use this code. It's easy to register listeners from C (e.g. https://github.com/tensorflow/tensorflow/blob/5845bbfacc6819eaa386ab5a9c3d9cb1df6bb075/tensorflow/python/eager/pywrap_tfe_src.cc#L4221) but how would you do the same from Java code?

Are you saying that the maintainers of this project should surface a Java API for this? Or are you saying that end-users can use this today from Java code?

rnett commented 3 years ago

Both, I think. It's public, so you can use it yourself by passing a function pointer (e.g. as in https://github.com/bytedeco/javacpp/blob/master/src/test/java/org/bytedeco/javacpp/PointerTest.java#L84), but it's something we should add an API for.

Craigacp commented 3 years ago

He's saying that if you understand JavaCPP, you can construct a JavaCPP FunctionPointer and pass it to the TF_RegisterLogListener function, which registers it with the C-level function in TF that does the logging. In practice we wouldn't expect users to be able to do that, because using JavaCPP's exposed bindings directly is tricky. So we'll put it on the list of things to do.

Controlling the logging level looks harder, as I can't see an endpoint for that in the C API, so redirecting the output up into a Java logger might be the best we can do without changing the C API (which takes a lot longer). I guess we could manipulate the environment to set a specific value of TF_CPP_MIN_LOG_LEVEL, but that would have to happen before the native library is loaded, which is a little tricky to arrange.
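
One way to arrange that, sketched here under assumptions rather than taken from the project: start the TensorFlow-using JVM as a child process and put TF_CPP_MIN_LOG_LEVEL into the child's environment, so the variable is already set when the native library loads there. The class name `TfChildLauncher`, the helper `withMinLogLevel`, and the main class `com.example.MyTensorFlowApp` are all hypothetical. As far as I know, the variable takes 0 (show everything), 1 (hide INFO), 2 (also hide WARNING), and 3 (also hide ERROR).

```java
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of TensorFlow Java.
class TfChildLauncher {
    // Builds a child process whose environment carries TF_CPP_MIN_LOG_LEVEL.
    // The current JVM's own environment is left untouched.
    static ProcessBuilder withMinLogLevel(int level, List<String> command) {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.environment().put("TF_CPP_MIN_LOG_LEVEL", Integer.toString(level));
        pb.inheritIO(); // forward the child's stdout/stderr to this process
        return pb;
    }

    public static void main(String[] args) {
        String javaBin = Paths.get(System.getProperty("java.home"), "bin", "java").toString();
        // "com.example.MyTensorFlowApp" is a placeholder for the real entry point.
        ProcessBuilder pb = withMinLogLevel(2, Arrays.asList(
                javaBin, "-cp", System.getProperty("java.class.path"),
                "com.example.MyTensorFlowApp"));
        System.out.println("Would run: " + pb.command());
        // To actually launch the child: pb.start().waitFor();
    }
}
```

This only filters messages at the source; it does not route them into a Java logger, and it costs an extra process, so it is a workaround rather than a substitute for a proper API.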

rnett commented 3 years ago

We should be able to parse the messages and extract the level in most cases, and feed it into the Java logger (Slf4j?) w/ that level, I would think.
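
A minimal sketch of that parsing in Java, assuming the line layout of the two sample messages at the top of this issue (timestamp, a one-letter severity I/W/E/F, `file:line]`, then the text). That layout is an internal detail of the C++ logger and is not specified anywhere I know of, so the class `TfLogLine` and its methods are purely illustrative and fall back to INFO for anything they cannot parse. It uses java.util.logging levels to stay dependency-free; mapping to SLF4J levels would follow the same shape.

```java
import java.util.logging.Level;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of TensorFlow Java.
class TfLogLine {
    // Assumed layout: "<date> <time>: <I|W|E|F> <file>:<line>] <message>"
    private static final Pattern LINE = Pattern.compile(
            "^\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}\\.\\d+: ([IWEF]) (\\S+):(\\d+)\\] (.*)$");

    // Maps the one-letter severity to a java.util.logging level.
    // Lines that do not match the assumed layout are treated as INFO.
    static Level levelOf(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            return Level.INFO;
        }
        switch (m.group(1).charAt(0)) {
            case 'W': return Level.WARNING;
            case 'E':
            case 'F': return Level.SEVERE;
            default:  return Level.INFO;
        }
    }

    // Returns the text after "file:line] ", or the whole trimmed line if unparseable.
    static String messageOf(String line) {
        String trimmed = line.trim();
        Matcher m = LINE.matcher(trimmed);
        return m.matches() ? m.group(4) : trimmed;
    }

    public static void main(String[] args) {
        String sample = "2021-03-29 15:50:24.014340: I external/org_tensorflow/tensorflow/"
                + "stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror "
                + "if you do not have a GPU set up on your machine.";
        System.out.println(levelOf(sample) + " " + messageOf(sample));
    }
}
```

Multi-line messages and any change to the prefix would land in the fallback branch, which is the main fragility of this approach.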

rnett commented 3 years ago

Well, I just tested this and it doesn't seem to work. Is it possible that TF_RegisterLogListener is only for TF_Server? I'm registering it like (kotlin):

tensorflow.TF_RegisterLogListener(object : Listener_String() {
    override fun call(arg0: String?) {
        println("Log: $arg0")
    }
})

but it's never called.

Craigacp commented 3 years ago

> We should be able to parse the messages and extract the level in most cases, and feed it into the Java logger (Slf4j?) w/ that level, I would think.

We might be able to do this, but no project I've been involved in has been remotely simple when it started with the phrase "let's parse the log messages". We'd suddenly become dependent on an internal implementation detail that isn't specified, and that's asking for trouble.

saudet commented 3 years ago

> Well, I just tested this and it doesn't seem to work. Is it possible that TF_RegisterLogListener is only for TF_Server? I'm registering it like (kotlin):
>
> tensorflow.TF_RegisterLogListener(object : Listener_String() {
>     override fun call(arg0: String?) {
>         println("Log: $arg0")
>     }
> })

Make sure the callback doesn't get GCed, for example, by keeping a reference in a field somewhere, but if that happened, it would be crashing...

rnett commented 3 years ago

> Make sure the callback doesn't get GCed, for example, by keeping a reference in a field somewhere, but if that happened, it would be crashing...

Yeah, that didn't help (as expected).

rnett commented 3 years ago

This is a known issue apparently: https://github.com/tensorflow/tensorflow/issues/44995