justanhduc / task-spooler

A scheduler for GPU/CPU tasks
https://justanhduc.github.io/2021/02/03/Task-Spooler.html
GNU General Public License v2.0
273 stars 24 forks source link

"munmap_chunk(): invalid pointer" error using "-L <label>" option #13

Closed COshmyan closed 2 years ago

COshmyan commented 2 years ago

Hi! First of all, thank you for a great project: it very helps us to run some jobs in a batch manner. As we don't need a GPU, we use currently a cpu-only version 1.2.1 of Task Spooler on a Linux 64-bit platform (SLES 15 sp3).

However, we encountered an error that looks like a bug with a memory allocation/freeing. When we are trying to use "ts" with an "-L

COshmyan commented 2 years ago

Looking at the code, I found another suspicious place. The file list.c has a function shorten() (line 75). First of all, it has an malloc() call that has no a matching free() calls. I.e. memory allocated here is never returned back. Second, it has the following call of sprinf() (line 81): sprintf(newline, "%s...", newline);

However, documentation about sprinf() tells the following:

If copying takes place between objects that overlap as a result of a call to sprintf() or snprintf(), the results are undefined.

Moreover, another page (link) contains a more detailed explanation:

Notes Some programs imprudently rely on code such as the following sprintf(buf, "%s some further text", buf); to append text to buf. However, the standards explicitly note that the results are undefined if source and destination buffers overlap when calling sprintf(), snprintf(), vsprintf(), and vsnprintf(). Depending on the version of gcc(1) used, and the compiler options employed, calls such as the above will not produce the expected results.

So, probably, this call could be replaced by, for example, a strncat() call.

justanhduc commented 2 years ago

Hi @COshmyan. THanks for your interest in the project and sorry for the inconvenience. However, are you sure you are using v1.2.1-cpu? The stupid bug you mentioned actually has been fixed in e4bd2081, which is before the release of v1.2.1-cpu (and more than 1 month ago). Could you please check again?

Regarding no matching free for the malloc inside shorten, you are right. Would you like to send a PR?

COshmyan commented 2 years ago

Hi @justanhduc , thank you for a reply.

However, are you sure you are using v1.2.1-cpu?

You are right: probably, we use a version 1.2.0. I've downloaded a current version again, it has a small differences (for example, issue with sprintf() is solved). However, still there is a first issue (with freeing memory that has not been allocated by the malloc() call).

justanhduc commented 2 years ago

Hi @COshmyan. You are right the problem with malloc not being freed is not solved yet, but it should not cause any crash (except that the memory really overflows!). Have you experienced any crash so far? Anw I will fix the malloc problem in the next release.

COshmyan commented 2 years ago

Hi @justanhduc ,

You are right the problem with malloc not being freed is not solved yet, but it should not cause any crash

More critical for me was a freeing of memory that has not been allocated - it caused to the real program crash.

I've re-compiled v1.2.1-cpu. I tried to reproduce the problem with a program crash using a labels, and I have no success (at least, up to now). It's a bit strange for me, as I still see the free(command_line.label); line near the end of main.c file. Nevertheless, it works by some magic way :-)

justanhduc commented 2 years ago

Hi @COshmyan. The free(command_line.label); is not necessary but will not cause any error because command_line.label is either NULL or optarg, and freeing either of them causes no error. You can try this example from GNU and add free(optarg) at the end to see for yourself.

Anw, in the next release, I reuse the command_line.label variable for another purpose, and freeing it at the end causes an error, so I decided to remove it. It will completely resolve your issue, so I will close this thread here.

COshmyan commented 2 years ago

Hi!

I've re-compiled v1.2.1-cpu. I tried to reproduce the problem with a program crash using a labels, and I have no success (at least, up to now).

I'm able to reproduce this problem. It reveals itself when "ts" is running with "-n" and "-f" parameters together:

user@prep:~> ts -l
ID   State      Output               E-Level  Time   Command [run=0/3]
user@prep:~> ts -n -f -L 'test 1' sleep 1
munmap_chunk(): invalid pointer
Aborted (core dumped)
user@prep:~> ts -n -f -L 'test 2' sleep 2
munmap_chunk(): invalid pointer
Aborted (core dumped)
user@prep:~> echo $?
134
user@prep:~> ts -l
ID   State      Output               E-Level  Time   Command [run=0/3]
user@prep:~>

It's still v1.2.1-cpu. I see that a bit news version is available now; I'll check it a bit later.

(added) Yes, it works without errors now:

user@prep:~> ts -l
ID   State      Output               E-Level  Time   Command [run=0/3]
user@prep:~> ts -n -f -L 'test 1' sleep 1
user@prep:~> ts -n -f -L 'test 2' sleep 2
user@prep:~> echo $?
0
user@prep:~> ts -l
ID   State      Output               E-Level  Time   Command [run=0/3]
user@prep:~> ts -V
Task Spooler 1.3.0 - a task queue system for the unix user.
Copyright (C) 2007-2020  Duc Nguyen - Lluis Batlle i Rossell
user@prep:~>

Looks like this problem is fixed now. Thanks! 👍 :-)

justanhduc commented 2 years ago

Hi @COshmyan. Thanks for the pointer. I was aware of this problem. It was due to a redundant free in main.c before return. This error will not be triggered until running jobs in foreground, as you experienced. I have fixed this problem in the latest release. Feel free to reopen this thread if you encounter a similar issue.