ilpincy / argos3

A parallel, multi-engine simulator for heterogeneous swarm robotics
http://www.argos-sim.info/

Bugs in ARGoS cmdline #207

Closed jharwell closed 2 years ago

jharwell commented 2 years ago

I've run into two issues with the ARGoS cmdline when using it in an HPC environment. To make it easier to grep for errors/segfaults when I'm running thousands of ARGoS simulations, I need to suppress all ARGoS output to stdout and stderr. I can of course do that by appending > /dev/null 2>&1 to all ARGoS commands, but that will also suppress any failing asserts that get triggered in my code, so I need to use ARGoS cmdline arguments instead.

The first issue I ran into is either a bug in the man page, or a bug in the ARGoS cmdline parser. man argos3 says that you can redirect stdout logging to a file via --log-file=FILE. However, when I do:

argos3 -c myexp.argos --log-file=/dev/null

I get: [FATAL] Unrecognized option "--log-file=/dev/null".

It seems like the cmdline parsing is not quite right. This also happens for --logerr-file=/dev/null. If I remove the = and do --logerr-file /dev/null or --log-file /dev/null then I don't get that error.

However, this leads to the second issue. When I pass --log-file /dev/null or --logerr-file /dev/null, I would expect that ARGoS is 100% silent and does not print anything to the screen. However, I still see the usual diagnostics e.g. "[INFO] Using X parallel threads..."

I looked at argos_command_line_arg_parser.cpp and nothing immediately jumped out at me as being incorrect. Any ideas?

ilpincy commented 2 years ago

I tried the following commands on my machine, and they work as intended:

argos3 -l /dev/null -c experiments/diffusion_1.argos
argos3 --log-file /dev/null -c experiments/diffusion_1.argos

jharwell commented 2 years ago

Found the issue! The specific use case I was targeting is when I get the library name wrong for either the controller or the loop functions, and ARGoS doesn't start. In those cases, I still want ARGoS to only return non-zero, and not print anything to the screen. However, when the library name is wrong, ARGoS throws an uncaught exception. If I do:

LOG << "some text" << std::endl;

anywhere before or after the library fails to load and things ultimately crash, it is silent. However, if I do:

LOGERR << "some text" << std::endl;

then I still see the text when the exception prints. I imagine it's being buffered (and not yet redirected to /dev/null), and when the exception prints, that mechanism takes anything that was queued up for stderr and prints it to the terminal. I don't see the text if I call Flush() before the exception triggers.

I think the easy part of the solution here is to update the man page so that it's clear users can't do --log-file=/some-file or --logerr-file=/other/file (it currently says they can). For the other part, I think it's reasonable as a user to assume that if you set the stderr logfile to something via --logerr-file, everything that would come out on stderr should go to it, including exceptions. I'm not really sure how to implement this though. If this can't be implemented or would require too many changes, then the man page should be updated to say that ARGoS will still print exceptions on startup even when --logerr-file is passed.

ilpincy commented 2 years ago

Are you using a custom main.cpp file?

In the original main.cpp the log redirection is set before the libraries are loaded (see here), so the errors, in theory, should be redirected to /dev/null.

jharwell commented 2 years ago

Yes, I'm currently using the latest ARGoS master. I also tried with a normal file (/tmp/foo.txt) and saw the same result, so it's not an issue with redirecting to a special system file. I'm on Ubuntu 20.04, if that makes a difference.

ilpincy commented 2 years ago

Maybe we can add a Flush(), as you suggested, just after the command line arguments are parsed. Can you try adding the following code:

#ifdef ARGOS_THREADSAFE_LOG
      LOG.Flush();
      LOGERR.Flush();
#endif

at the end of this method?

jharwell commented 2 years ago

That didn't change anything, but it got me to the answer. LOG and LOGERR wrap std::cout and std::cerr, which are distinct from stdout and stderr (the C streams). Redirecting the C++ streams does not redirect the C ones, and when exceptions are printed, they apparently get printed to stderr, not to std::cerr.

Adding the following to argos_command_line_arg_parser.cpp:113 right after the redirection for std::cerr works:

freopen(m_strLogErrFileName.c_str(), "w", stderr);

similarly for stdout. With these and no other changes, ARGoS is indeed silent when the redirection options are passed.

ilpincy commented 2 years ago

Great! Thanks for looking into this. Feel free to make a pull request for this fix when you have the time.

jharwell commented 2 years ago

Opened #209, closing this.