Closed: sandyharvie closed this issue 5 months ago.
Hey, sorry for the late response.
How are L' and F' related? The 4th bullet point is a bit confusing. Is it L''s handler or L's handler? If it is the former, it should be for F', but F is mentioned.
We can make the logic in the MR that reopens the deleted file optional, but I would like to understand the problem first.
Would you be able to submit a small code sample that could potentially reproduce this?
I've edited the original comment to be more clear. A user should, in my opinion, be able to safely perform the following sequence of events:
This is currently not guaranteed to work, though. The flush event created in step 7 can end up being processed before the first logger is invalidated, which means the first logger's handler will still be considered "active" and therefore have its `flush()` function invoked as part of processing that flush event. That invocation of `flush()` will see that the file no longer exists and then fail trying to recreate it.
The fundamental issue here, as I see it, is that the set of "active" handlers is not recomputed when processing a flush event. I understand that this is an expensive operation (as documented here), but it seems like it'd be worth the cost given that flushing is already a slow operation.
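The recomputation described above can be sketched independently of quill's internals. All names below (`Handler`, `Logger`, `force_flush`) are illustrative stand-ins, not quill's real types; the point is only that the set of active handlers is derived from the non-invalidated loggers at flush time rather than read from a cached set:

```cpp
#include <memory>
#include <vector>

// Illustrative stand-in for a file handler: flush() just records that it ran.
struct Handler
{
  bool flushed = false;
  void flush() { flushed = true; }
};

// Illustrative stand-in for a logger that may have been invalidated
// (i.e. asynchronously removed) before a flush event is processed.
struct Logger
{
  bool invalidated = false;
  std::shared_ptr<Handler> handler;
};

// Sketch of the proposed behaviour: recompute the active-handler set from the
// live loggers on every flush, so handlers belonging only to invalidated
// loggers are never flushed.
inline void force_flush(const std::vector<Logger>& loggers)
{
  std::vector<std::shared_ptr<Handler>> active;
  for (const auto& l : loggers)
  {
    if (!l.invalidated)
    {
      active.push_back(l.handler);
    }
  }
  for (const auto& h : active)
  {
    h->flush();
  }
}
```

Note that handlers can be shared between loggers, so a real implementation would also need to keep a handler active while any live logger still references it.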
Thanks, I see what you mean.
Handlers can be shared between loggers, so you can have one invalidated logger and a valid logger using the same handler.
Even if we check inside `_force_flush()` for non-invalidated loggers, filter only the active handlers, and flush those, a logger can still be invalidated right after the filtering, just before we call `flush()` on the handler.
I think if you change the sequence in step 5 it should work fine in the current version.
I will try to reproduce it first in the following way:
a. logger_a logs, then flush
b. Manually delete file_a.log
c. Log, then flush from logger_b, which will now also try to flush the deleted file_a.log from logger_a
If the above reproduces it, then maybe we can check whether the file exists before calling `flush()` on the handler. We can also check for invalidated loggers in `_force_flush()`.
There will still be a race condition (e.g. the file is removed after the existence check, just before flushing), but both changes together should mitigate it and make it rarer.
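The existence check being proposed can be sketched in a quill-free form using the standard library. `flush_if_exists` is a hypothetical helper, not quill's actual code; it also shows why the race window only narrows rather than closes, since the file can still disappear between the check and the flush:

```cpp
#include <filesystem>
#include <fstream>

// Sketch of the proposed mitigation: before flushing a file-backed stream,
// verify that the backing file still exists, and skip the flush (rather than
// recreating the file) if it has been deleted. The remaining race: the file
// can still be removed between the exists() check and the flush() call.
inline bool flush_if_exists(std::ofstream& os, const std::filesystem::path& p)
{
  if (!std::filesystem::exists(p))
  {
    return false; // file was deleted; skip flushing instead of recreating it
  }
  os.flush();
  return true;
}
```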
What are your thoughts?
I have tried to reproduce something like this:

```cpp
#include "quill/Quill.h"

#include <chrono>
#include <iostream>
#include <memory>
#include <thread>

/**
 * Logging to two separate files using two loggers
 */
int main()
{
  // Start the backend logging thread
  quill::start();

  // Create a file handler and a logger for file A
  std::shared_ptr<quill::Handler> file_handler_a =
    quill::file_handler("example_trivial_a.log",
                        []()
                        {
                          quill::FileHandlerConfig cfg;
                          cfg.set_open_mode('w');
                          cfg.set_append_to_filename(quill::FilenameAppend::None);
                          return cfg;
                        }());
  quill::Logger* logger_a = quill::create_logger("logger_a", std::move(file_handler_a));

  // Create a file handler and a logger for file B
  std::shared_ptr<quill::Handler> file_handler_b =
    quill::file_handler("example_trivial_b.log",
                        []()
                        {
                          quill::FileHandlerConfig cfg;
                          cfg.set_open_mode('w');
                          cfg.set_append_to_filename(quill::FilenameAppend::None);
                          return cfg;
                        }());
  quill::Logger* logger_b = quill::create_logger("logger_b", std::move(file_handler_b));

  for (size_t i = 0; i < 1000; ++i)
  {
    QUILL_LOG_INFO(logger_a, "log something {}", i);
  }
  quill::flush();

  std::cout << "Delete example_trivial_a.log file now" << std::endl;
  std::this_thread::sleep_for(std::chrono::seconds{10});

  for (size_t i = 0; i < 10; ++i)
  {
    QUILL_LOG_INFO(logger_b, "log something {}", i);
  }
  quill::flush();
}
```
I see the exception; it occurs because the directory doesn't exist anymore. Otherwise, the flush just recreates an empty file.
Yeah, you need to delete the file and the directory in which it was created.
> I think if you change the sequence in step 5 it should work fine in the current version.
> - First flush and invalidate the logger.
> - After the logger is removed, the associated file descriptor and handler will be closed, so you can safely remove the file.
AFAIK, the current API does not allow me to check whether the logger has been removed. I can only trigger an asynchronous removal and expect that it will eventually be removed, so I am not sure how to perform step 2 here reliably.
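The gap described here can be made concrete with a small polling sketch. This is a hypothetical workaround, not part of quill's API: the `removed` flag below stands in for whatever signal the backend might expose when a logger's removal has actually completed, and quill currently exposes no such signal, which is exactly the problem.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical helper: poll an observable "removal completed" signal until it
// fires or a timeout expires. Since logger removal is asynchronous, this is
// the best a caller could do even if such a signal existed.
inline bool wait_until_removed(const std::atomic<bool>& removed,
                               std::chrono::milliseconds timeout)
{
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (!removed.load(std::memory_order_acquire))
  {
    if (std::chrono::steady_clock::now() >= deadline)
    {
      return false; // timed out; removal is still pending
    }
    std::this_thread::sleep_for(std::chrono::milliseconds{1});
  }
  return true;
}
```

A synchronous removal API in the library itself would make this busy-waiting unnecessary.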
> There will still be a race condition (e.g. the file is removed after the existence check, just before flushing), but both changes should mitigate it and make it rarer.
The race condition you describe can only happen if the removal of the file's directory happens after, or concurrently with, the submission of the second flush event. If it happens before the submission of the second flush event, as in my example and your code above, then the file's deletion will necessarily be detected while processing that flush event.
> What are your thoughts?
Making logger removal synchronous would be a clean solution, but would likely be a more arduous addition than simply checking for invalidated loggers in `BackendWorker::_force_flush()`. I also think making recreation of the file optional would be a good addition to the API. I am happy to submit a PR for either, so just let me know which you'd prefer.
Thanks for reporting this. I found a few more things to improve while looking at it; this should not happen again after this PR.
A potential fix for this would be to have `BackendWorker::_force_flush()` check for invalidated loggers and recompute the set of "active" handlers before performing the actual flushing. I'd be happy to submit an MR for this, if you think that sounds reasonable.

Originally posted by @sandyharvie in https://github.com/odygrd/quill/issues/267#issuecomment-1928368172