Support accessing container logs

kairsas commented 1 year ago

Currently it's not possible to access container log streams on the Api (stdout or stderr).

It would be handy for users to catch something or log, without a need of executing docker commands manually.

Perhaps there is a simple possibility to expose them on the Container by using Docker::stdout_logs and Docker::stderr_logs.

endertunc commented 1 year ago

https://github.com/testcontainers/testcontainers-rs/pull/444 somewhat related. I believe the reason why they are not exposed because LogStream returned by those functions are marked internal https://github.com/testcontainers/testcontainers-rs/blob/0f2c985160e51a200cfc847097c15b8d85ed7df1/testcontainers/src/core/logs.rs#L39

Just curios, do you need access logs programmatically or you just want to see them somehow? I know that testcontainers-java pipes stoud/stderr to console which you can control with log levels I believe. I also had the need to see the logs from containers as they happened and it would be nice have it but didn't look at the possibilities myself yet.

thomaseizinger commented 1 year ago

It would be handy for users to catch something or log, without a need of executing docker commands manually.

If we add everything that is "handy", our API surface would be so large that the usability would suffer. Please back your feature request with a concrete usecase, thank you! :)

kairsas commented 1 year ago

In "handy", I meant not some narrow case, but a generic log access need, which I don't believe is very exceptional or user specific.

My concrete use case - I needed to see container logs while running the tests. Without this, I just need to do a manual work - run docker logs periodically for checking what's happening.

Another prospective use cases:

Filtering container log streams and displaying only some events in the test stdout/logs
Subscribing to some logs stream events and using them for orchestrating test execution flow.

It looks to me that just giving an access to container log streams, could open many possibilities for the users to achieve whatever they want.

thomaseizinger commented 1 year ago

My concrete use case - I needed to see container logs while running the tests.

I'd accept a feature PR that explores how we can forward a container's logs to stdout so we can resolve your usecase.

Without this, I just need to do a manual work - run docker logs periodically for checking what's happening.

Another prospective use cases:

Filtering container log streams and displaying only some events in the test stdout/logs

Subscribing to some logs stream events and using them for orchestrating test execution flow.

It looks to me that just giving an access to container log streams, could open many possibilities for the users to achieve whatever they want.

It is always a trade-off. If we expose too many knobs and details, then users need to learn more before they can successfully use the library.

endertunc commented 1 year ago

I have been look at some of the popular implementation such as java, golang and node they all have some way of accessing/seeing logs which I believe valuable.

There are several approaches and some of the testcontainers implementations uses multiple of these options;

First obvious approach is to expose container logs as stream. In our case we could simply expose container logs as Read and let the user do whatever they want to do with the logs.
```
fn stdout_logs(&self, container_id: &String) -> Box<dyn Read> 
```
Another approach is to give user ability to attach logger of their choose. This could come in different flavors. For example, testcontainers-java allows you attach SL4J logger to container logs. Another example is testcontainers-node which uses predefined logger to allow you to easily enable/disable container logging.

I have already working version based on https://github.com/testcontainers/testcontainers-rs/pull/503 and it look something like this;

// CLI - sync version

let cid = container_id.clone();
if log::log_enabled!(target: "testcontainers.containers", log::Level::Debug) {
    thread::spawn(move || {
        use std::io::{BufRead, BufReader};
        let log_stream = docker.stdout_logs(&cid);
        let logs = BufReader::new(log_stream.inner);
        for line in logs.lines() {
            log::debug!(target: "testcontainers.containers", "{}-{}", cid, line.unwrap());
        }
    });
}

// HTTP - async version

let cid = container_id.clone();
if log::log_enabled!(target: "testcontainers.containers", log::Level::Debug) {
    tokio::spawn(async move {
        let mut log_stream = docker.stdout_logs(&cid);
        while let Some(line) = log_stream
            .inner
            .next()
            .await
            .transpose()
            .expect("Failed to read next line from container log stream")
        {
            log::debug!(target: "testcontainers.containers", "{} - {}", cid, line);
        }
    });
}

I think we can also support these two approaches. First approach is good general solution for accessing containers logs. On top of that, second approach more DX centering where you can easily enable/disable container logs.

I hope to raise an MR once https://github.com/testcontainers/testcontainers-rs/pull/503 it's merged which is needed for this to work.

thomaseizinger commented 1 year ago

Thank you for writing this up @endertunc !

Exposing a dyn Read or dyn AsyncRead sounds like a good approach. I'd expose them on Container directly though instead of going through the docker client.

RCasatta commented 1 year ago

+1, I would like to access container logs, my use case is that I have an error that happens only in CI but not locally. Running docker logs in CI seems inconvenient

thomaseizinger commented 1 year ago

+1, I would like to access container logs, my use case is that I have an error that happens only in CI but not locally.

What that usecase perhaps be better served by writing all logs to a specified directory? For example, controlled with an environment variable?

thomaseizinger commented 1 year ago

Running docker logs in CI seems inconvenient

Not sure why you say that actually. You can dump all container logs with:

docker ps -aq | while read cid; do docker logs $cid | sed "s/^/$cid: /"; done

Pair that with TESTCONTAINERS=keep to not remove the containers and you have all logs without modifying your test's source code!

t3hmrman commented 1 year ago

So I just ran into this as well, and I'd like to contribute a PR to try and solve this, but want to make sure that we're aligned on the specifics of what might make a good solution!

For context, my particular usecase is similar to others in that I have a test that only fails in CI, and I would like to access the logs of the container to figure out what is happening.

In my case modifying the test source code is the easiest/fastest method -- and there's no easy way to get at the logs, outside of executing docker logs from inside the test. This is doable of course, but not ideal, I think.

Setting TESTCONTAINERS=keep/command: Command::Keep also works, but is a lot of code change to functions that I've since abstracted over that start containers and set up things related to them (creating initial users/data, etc).

To lay out how I think of the solution space:

Run docker logs (i.e. tokio::process::Command) from inside the test upon detecting a failure, w/ the container ID provided by the_container.id()
TESTCONTAINERS=keep requires changing CI scripts to isolate the test (so I only get one container left, since nontrivial CID is not necessarily easy), then add a command to spit out docker logs
Setting command: Command::Keep on the Container at creation time requires that I change my should-be-slow-changing test setup code, and I still have to do one of the above two (either run docker logs inside or outside the test)

To make things more concrete, here's the code I'm trying to debug:

    // Do the thing
    assert_thing_done(&arg, &another_arg, third_arg.as_bytes()).await?;

I run that, see that there's a test failure only in CI, and since I'm on a branch, I pop on a debug(ci): test failure for xxx commit and change the code to this:

    // Write new object
    if let Err(e) = assert_thing_done(&arg, &another_arg, third_arg.as_bytes()).await {
        eprintln!("OUTPUT? {}", ???);
        return Err(e);
    }

Obviously, the test still fails, but as you can see I just want to get a little bit more information, right after the failure (ideally, in the exact same spot). What I want to do is be able to replace that ??? with my_container.stdout_logs().collect() (let's say stdout_logs() returns a Stream or impl Iterator<...>, since I would expect that it's not necessarily a live stream but does end).

This helps me debug the failure pretty quickly (IMO much more quickly than changing CI setup or test setup code) and figure out if the container is doing something wrong or if the problem is elsewhere.

With regards to the previous discussion:

Exposing a dyn Read or dyn AsyncRead sounds like a good approach. I'd expose them on Container directly though instead of going through the docker client.

I agree with this solution -- since containers hold on to the client (i.e. a dyn Docker), and we have functions like stop, start and rm that use that client to do work, I think it really makes sense to have something like logs_stdout() (or stdout_logs to match the Docker trait itself) so that people can easily get at the log content from inside a test.

Of course, trait Docker has some affordances for this, notably LogStream, but that's not publicly exposed. Which I think is the primary friction left -- the average user can't do cli.stdout_logs() because trait Docker is not exposed.

Arguably trait Docker shouldn't be exposed, and neither should LogStream, so there's at least one copy required.

The PR I'd submit is pretty small and looks like this:

diff --git a/testcontainers/src/core/container.rs b/testcontainers/src/core/container.rs
index 53cbd505..f5a31744 100644
--- a/testcontainers/src/core/container.rs
+++ b/testcontainers/src/core/container.rs
@@ -6,6 +6,8 @@ use bollard_stubs::models::ContainerInspectResponse;

 use std::{fmt, marker::PhantomData, net::IpAddr, str::FromStr};

+use super::logs::WaitError;
+
 /// Represents a running docker container.
 ///
 /// Containers have a [`custom destructor`][drop_impl] that removes them as soon as they go out of scope:
@@ -226,6 +228,14 @@ where
         self.docker_client.start(&self.id);
     }

+    pub fn stdout_logs(&self) -> Result<impl Iterator<Item = String>, WaitError> {
+        self.docker_client.stdout_logs(&self.id).to_lines()
+    }
+
+    pub fn stderr_logs(&self) -> Result<impl Iterator<Item = String>, WaitError> {
+        self.docker_client.stderr_logs(&self.id).to_lines()
+    }
+
     pub fn rm(&self) {
         log::debug!("Deleting docker container {}", self.id);

diff --git a/testcontainers/src/core/logs.rs b/testcontainers/src/core/logs.rs
index 575efdca..c0735cf3 100644
--- a/testcontainers/src/core/logs.rs
+++ b/testcontainers/src/core/logs.rs
@@ -1,7 +1,8 @@
 #[cfg(feature = "experimental")]
 use futures::{stream::BoxStream, StreamExt};
 use std::{
-    fmt, io,
+    fmt,
+    io::{self, BufWriter},
     io::{BufRead, BufReader, Read},
 };

@@ -65,6 +66,15 @@ impl LogStream {

         Err(end_of_stream(lines))
     }
+
+    /// Get all current lines of output
+    pub fn to_lines(self) -> Result<impl Iterator<Item = String>, WaitError> {
+        BufReader::new(self.inner)
+            .lines()
+            .collect::<Result<Vec<String>, std::io::Error>>()
+            .map_err(WaitError::from)
+            .map(|v| v.into_iter())
+    }
 }

 fn handle_line(line: String, message: &str, lines: &mut Vec<String>) -> bool {

Does this seem like a reasonable start or PR that could close this out? At this point with how the other code has progressed it's merely a convenience function, and performing a copy of existing lines (open to more efficient ways of doing the copy by the way

thomaseizinger commented 1 year ago

Thank you for the detailed comment @t3hmrman !

I am still curious whether the following would also solve the problem:

My concrete use case - I needed to see container logs while running the tests.

I'd accept a feature PR that explores how we can forward a container's logs to stdout so we can resolve your usecase.

By default, cargo captures stdout for passing tests. Upon failure, it dumps stdout. I am wondering if it wouldn't be an even better solution if we always forward container logs to stdout?

To debug a failure then, you wouldn't even have to make code changes! We can make this conditional on the CI env being set to true if people would find that annoying for local development although given that cargo captures stdout, I don't really see why it would be a problem :)

What do you think? I am also open to adding the stdout function to Container, just wondering if we couldn't build an even better solution.

t3hmrman commented 1 year ago

Absolutely no problem!

By default, cargo captures stdout for passing tests. Upon failure, it dumps stdout. I am wondering if it wouldn't be an even better solution if we always forward container logs to stdout?

Ah this would also certainly work! Let me see if I can think of some pros/cons:

Pro: a general option for redirecting container output to stdout of the process would definitely be useful, and easy to control w/ a simple check CI (from user code above testcontainers-rs)
Con: chatty tests/containers might be hard to distinguish from one-another, multi-container might be hard (this doesn't matter if you're running only one test though)

[EDIT] - Oh also, it might make sense to actually output this information under RUST_LOG=trace w/ something like tracing.

To debug a failure then, you wouldn't even have to make code changes! We can make this conditional on the CI env being set to true if people would find that annoying for local development although given that cargo captures stdout, I don't really see why it would be a problem :)

Yeah this would enable a lack of code changes -- I think it would be the user's job (i.e. code above testcontainers-rs) to do the checking for CI, but once that kind of code was installed it would be really easy to switch on/off.

What do you think? I am also open to adding the stdout function to Container, just wondering if we couldn't build an even better solution.

Yeah I think both of these are beneficial. Being able to better control how stdout was forwarded would definitely be a good general option. Also I noticed that the Keep stuff (to keep containers around after tests) isn't accessible from code, so maybe that should be made easy to program as well.

Another nice use case might be useful is one that checks a particular container's logs after an operation.

Right now, I do exec (via the support in testcontainers-rs) some commands on the container once it's started -- while these commands do give me output I can parse, other software may not be so nice. For example let's say you trigger a re-index on a search engine, but it's a fire-and-forget, and the only way you can know it's done is a log line. Being able to grab the logs and programmatically search them might be nice for that kind of thing.

I wonder if this would work with the stdout approach... If you could tag the lines coming out of the Container before they go into the stdout stream, maybe you could open the stdout stream and detect the messages you want?

thomaseizinger commented 10 months ago

I think it would be the user's job (i.e. code above testcontainers-rs) to do the checking for CI, but once that kind of code was installed it would be really easy to switch on/off.

What do you mean by this? In my experience, setting an env var when running tests is easier than changing code and re-compiling it. And what is the point of a code option if it is "best practise" to configure testcontainers-rs to output logs when CI=true? Shouldn't we just always do that then? :)

A design goal of this library is to have as few configuration options as possible such that the most important usecases "just work"!

t3hmrman commented 10 months ago

What do you mean by this? In my experience, setting an env var when running tests is easier than changing code and re-compiling it.

Well an ENV var is a pretty coarse-grained change that affects the whole test suite, where a code change can be more surgical (i.e. in only the test you need). Modifying the CI environment that tests run in is not always easy to do for people working on codebases. I'd argue that since what you're editing is the code (the testing devloop), the code is the thing is what's in front of you -- editing CI settings is a context-switch.

That aside, I'm not really thinking of this as a configuration option -- I don't think it's this library's job to do the CI check thing -- that's up to the user.

I was hoping what this library should be able to do is at least give access to the logs. It's not always the case that I just want the logs printed, I could want to do something else with them, or process them in some other way.

thomaseizinger commented 9 months ago

To wrap this up, I'd suggest the following:

Always forward a container's stdout and stderr to the parent process

Cargo captures stdout and stderr for successful tests, meaning this isn't noisy by default and only shows the logs when a test fails.

Add stdout and stderr functions to Container and ContainerAsync that return the respective trait object dyn Read and dyn AsyncRead.

The two are actually entirely separate from each other and can / should implement in separate PRs :)

testcontainers / testcontainers-rs

Support accessing container logs #502