machinekit / machinekit

http://machinekit.io

Multicore merge problem tracker #1123

Closed: ArcEye closed this issue 6 years ago

ArcEye commented 7 years ago

This is the issue in which any problems related to the merge of the multicore code into the main repo should be reported.

machinekoder commented 7 years ago

https://github.com/machinekit/machinekit/issues/1131

machinekoder commented 7 years ago

https://github.com/machinekit/machinekit/issues/1137

machinekoder commented 7 years ago

Is there any writeup of the changes? I had to modify several HAL configs and components to get the multicore branch working properly.

pmcstone commented 7 years ago

#1145: this happened after the update.

ArcEye commented 7 years ago

@machinekoder wrote: "Is there any writeup of the changes? I had to modify several HAL configs and components to get the multicore branch working properly."

Not yet. It would be helpful to know what you had to change, though.

ArcEye commented 7 years ago

From @pmcstone

This issue has been resolved by manually installing icomps. However, I have now run into another issue with a custom driver/protocol for my IO hardware (it communicates via USB to RS485). Please see the attached files and error messages. Any help would be greatly appreciated, since I am merely a power user. Thanks

starting mklauncher... done
starting configserver... done
starting ./python/pmcsfile_service.py... done
starting machinekit...
MACHINEKIT - 0.1
Machine configuration directory is '/home/pmcs/Downloads/pmcs-rt'
Machine configuration file is 'v6.ini'
Starting Machinekit...
io started
halcmd loadusr io started
done
hal/v6.hal:14: insmod failed, returned -1:
do_load_cmd: dlopen: /usr/lib/linuxcnc/rt-preempt/hal_p260c.so: undefined symbol: hal_exit
rpath=/usr/lib/linuxcnc/rt-preempt
See /var/log/linuxcnc.log for more information.
Shutting down and cleaning up Machinekit...
Traceback (most recent call last):
  File "/home/pmcs/bin/estop.py", line 16, in <module>
Traceback (most recent call last):
  File "/home/pmcs/bin/mtc.py", line 15, in <module>
    time.sleep(2.00)
KeyboardInterrupttime.sleep(2.00)

KeyboardInterrupt
Cleanup done
Machinekit terminated with an error. You can find more information in the log:
/home/pmcs/linuxcnc_debug.txt
and
/home/pmcs/linuxcnc_print.txt
as well as in the output of the shell command 'dmesg' and in the terminal
stopping mklauncher... done
stopping configserver... done
stopping ./python/pmcsfile_service.py... done

Reply from @arceye

It indicates incorrect linkage in the build of the component. Without the component code and knowledge of how it was built, I am unable to guess further.

If hal_exit() did not exist, machinekit would not run; there are about 1230 binaries and libs linked against it.

Running nm -C hal_p260c | grep " U " from the directory it is in will list all the symbols which are undefined (U). I would suspect a great deal more than just hal_exit().

hal_exit() is an inline accessor to halg_exit(), contained in https://github.com/machinekit/machinekit/blob/master/src/hal/lib/hal.h#L379, so you may see references to halg_exit.

ArcEye commented 7 years ago

From @pmcstone

Well, I think I might have broken something very badly. Running the command nm -C hal_p260c | grep " U " gave this:

             U cfsetispeed@@GLIBC_2.2.5
             U cfsetospeed@@GLIBC_2.2.5
             U close@@GLIBC_2.2.5
             U hal_exit
             U hal_export_funct
             U hal_malloc
             U hal_param_bit_newf
             U hal_param_s32_newf
             U hal_pin_bit_newf
             U hal_pin_s32_newf
             U hal_ready
             U hal_xinit
             U ioctl@@GLIBC_2.2.5
             U memset@@GLIBC_2.2.5
             U open@@GLIBC_2.2.5
             U read@@GLIBC_2.2.5
             U rtapi_print_msg
             U rtapi_snprintf
             U rtapi_switch
             U strtok@@GLIBC_2.2.5
             U strtol@@GLIBC_2.2.5
             U tcdrain@@GLIBC_2.2.5
             U tcflush@@GLIBC_2.2.5
             U tcgetattr@@GLIBC_2.2.5
             U tcsetattr@@GLIBC_2.2.5
             U write@@GLIBC_2.2.5

Which seems to show that everything is undefined.
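A note on reading that listing: undefined (U) entries are normal in a shared object, because they are resolved when the object is loaded. The glibc names carry a version suffix, while the hal_ and rtapi_ names carry none and have to be found at load time. The sketch below is only an illustration of how the two groups could be separated; the function name split_undefined and the shortened sample are hypothetical, and it assumes the two-column "U name" layout shown above.

```python
def split_undefined(nm_output):
    """Split nm 'U' entries into versioned (e.g. glibc) and unversioned names."""
    versioned, unversioned = [], []
    for line in nm_output.splitlines():
        fields = line.split()
        # Undefined entries have no address column, so they are exactly: "U", name.
        if len(fields) != 2 or fields[0] != "U":
            continue
        name = fields[1]
        if "@" in name:
            # Strip the version suffix, e.g. "close@@GLIBC_2.2.5" -> "close".
            versioned.append(name.split("@", 1)[0])
        else:
            unversioned.append(name)
    return versioned, unversioned


# Shortened sample taken from the listing above.
SAMPLE = """\
             U close@@GLIBC_2.2.5
             U hal_exit
             U hal_ready
             U rtapi_print_msg
             U write@@GLIBC_2.2.5
"""

libc_names, other_names = split_undefined(SAMPLE)
print("versioned:", libc_names)
print("unversioned:", other_names)
```

The unversioned group is the one worth checking against the running Machinekit build, since hal_exit, the symbol in the original dlopen error, is in it.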

Reply from @arceye

If you would like to 'donate' the driver, I can add it to the repo and it will get built properly and automatically at any rebuild.

Just tested, and it loads:

root@INTEL-i7:/usr/src/machinekit# DEBUG=5 realtime restart
root@INTEL-i7:/usr/src/machinekit# halcmd loadrt hal_p260c
<commandline>:0: Realtime module 'hal_p260c' loaded
root@INTEL-i7:/usr/src/machinekit# halcmd show pin
Component Pins:
Comp Inst Type Dir Value Name Epsilon Flags linked to:
78 bit OUT FALSE hal_p260c.0.pin-01-in --l-
78 bit IN FALSE hal_p260c.0.pin-01-out --l-
78 bit OUT FALSE hal_p260c.0.pin-02-in --l-
78 bit IN FALSE hal_p260c.0.pin-02-out --l-
78 bit OUT FALSE hal_p260c.0.pin-03-in --l-
78 bit IN FALSE hal_p260c.0.pin-03-out --l-
78 bit OUT FALSE hal_p260c.0.pin-04-in --l-
78 bit IN FALSE hal_p260c.0.pin-04-out --l-
78 bit OUT FALSE hal_p260c.0.pin-05-in --l-
78 bit IN FALSE hal_p260c.0.pin-05-out --l-
78 bit OUT FALSE hal_p260c.0.pin-06-in --l-
78 bit IN FALSE hal_p260c.0.pin-06-out --l-
78 bit OUT FALSE hal_p260c.0.pin-07-in --l-
78 bit IN FALSE hal_p260c.0.pin-07-out --l-
78 bit OUT FALSE hal_p260c.0.pin-08-in --l-
78 bit IN FALSE hal_p260c.0.pin-08-out --l-
78 bit OUT FALSE hal_p260c.0.pin-09-in --l-
78 bit IN FALSE hal_p260c.0.pin-09-out --l-
78 bit OUT FALSE hal_p260c.0.pin-10-in --l-
78 bit IN FALSE hal_p260c.0.pin-10-out --l-
78 bit OUT FALSE hal_p260c.0.pin-11-in --l-
78 bit IN FALSE hal_p260c.0.pin-11-out --l-
78 bit OUT FALSE hal_p260c.0.pin-12-in --l-
78 bit IN FALSE hal_p260c.0.pin-12-out --l-
78 bit OUT FALSE hal_p260c.0.pin-13-in --l-
78 bit IN FALSE hal_p260c.0.pin-13-out --l-
78 bit OUT FALSE hal_p260c.0.pin-14-in --l-
78 bit IN FALSE hal_p260c.0.pin-14-out --l-
78 bit OUT FALSE hal_p260c.0.pin-15-in --l-
78 bit IN FALSE hal_p260c.0.pin-15-out --l-
78 bit OUT FALSE hal_p260c.0.pin-16-in --l-
78 bit IN FALSE hal_p260c.0.pin-16-out --l-
78 s32 IN 0 hal_p260c.0.rx_cnt_error --l-
78 bit OUT FALSE hal_p260c.0.rx_comm_error --l-
78 bit OUT FALSE hal_p260c.0.rx_perm_error --l-
78 s32 OUT 0 hal_p260c.refresh.time ----
78 s32 I/O 0 hal_p260c.refresh.tmax ----
78 bit OUT FALSE hal_p260c.refresh.tmax-inc ----
78 bit OUT FALSE hal_p260c.rx_comm_error --l-
78 bit OUT FALSE hal_p260c.rx_perm_error --l-
78 bit IN FALSE hal_p260c.rx_reset_error --l-
78 s32 IN 0 hal_p260c.sys_max_read --l-
78 s32 IN 0 hal_p260c.sys_max_write --l-
78 s32 IN 0 hal_p260c.sys_writecnt --l-

I suspect you may have been trying to use the old module and had not rebuilt it to account for linkage relocations; having it built with the repo will solve that in future.

ArcEye commented 7 years ago

The two entries above were copied here to preserve items from forum posts.

pmcstone commented 7 years ago

Yes I am willing to donate it. Thanks for everything Arc!

machinekoder commented 7 years ago

@ArcEye will you integrate the driver?

ArcEye commented 7 years ago

Just done so at #1150

machinekoder commented 7 years ago

localpincount is now named local_pincount for instcomps. This is an important change especially since one can use pincount as well in the components. However, using pincount results in things not working.

ArcEye commented 7 years ago

@machinekoder wrote: "localpincount is now named local_pincount for instcomps. This is an important change especially since one can use pincount as well in the components. However, using pincount results in things not working."

https://github.com/machinekit/machinekit-docs/blob/master/docs/hal/instcomp.asciidoc#instanceparams

pincount does not work inside the function, because instcomp sets it to -1 at instantiation, so that any value passed to one instance is not then passed to any subsequent instances that don't specify pincount.

The same goes for all instanceparams, and for argc/argv for that matter: they all now have local_xxxx copies which can be used safely.
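The reset-and-copy behaviour described above can be pictured with a toy model. This is not instcomp's generated code: the class ToyModule, its default_pincount value and the dictionary it returns are all hypothetical, and only the "-1 after instantiation" rule and the local_pincount name come from the thread.

```python
class ToyModule:
    """Toy model of a module-level instance parameter with per-instance copies."""

    def __init__(self, default_pincount=1):
        self.default_pincount = default_pincount  # hypothetical default value
        self.pincount = -1  # shared parameter slot; -1 means "not specified"

    def newinst(self, name, pincount=None):
        if pincount is not None:
            self.pincount = pincount
        # Take the per-instance copy while the shared value is still valid.
        if self.pincount != -1:
            local_pincount = self.pincount
        else:
            local_pincount = self.default_pincount
        # Reset the shared slot so the next instance does not inherit this value.
        self.pincount = -1
        return {"name": name, "local_pincount": local_pincount}


mod = ToyModule()
first = mod.newinst("a", pincount=4)
second = mod.newinst("b")  # no pincount given
print(first["local_pincount"], second["local_pincount"], mod.pincount)
```

In this model, anything that reads mod.pincount after instantiation sees -1, which mirrors why code inside the component function has to use the local copy.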

machinekoder commented 7 years ago

Another design change since the multicore merge concerns how array variables are accessed. Previously one could access a variable declared as hal_bit_t sample[TRIGGER] in the component as sample[0]; now one uses sample(0). No problem, but it is a design change that should be noted down.

ArcEye commented 7 years ago

It is not a change to arrays; it is down to the convenience macros used by comp and instcomp. The variables in question are not local ones within the function, but instance ones contained in the *ip instance structure.

The convenience macros #define pins to a dereferenced pointer of the same name, and variables to their address in the struct, so that users can just refer to the name. For operations involving these macros, square brackets are replaced by parentheses: https://github.com/machinekit/machinekit/blob/master/src/hal/utils/instcomp.g#L1001

You can still refer to ip->local_variable[x] or use local_variable(x).

What you can't do is

int int_array[3] = {1,2,3};
int num = int_array(0);

because there is no macro defining int_array(x), so the compiler will expect a function.

ArcEye commented 7 years ago

But that is what I need, so don't stop pointing out things like that.

Over familiarity prevents me from looking at things the same way as others on some occasions :smile:

machinekoder commented 7 years ago

I have two more problems. When I watch halcmd using watch -n 0.1 halcmd show pin foo, rtapi seems to die after some time:

halcmd: hal_init() failed: -12
NOTE: 'rtapi' module must be loaded

The other problem is related to Haltalk. I have one U32 OUT pin on a HAL remote component that is never updated in HAL. I still have to figure out what's happening here.

machinekoder commented 7 years ago

I can verify the second problem on an isolated setup; the problem seems to apply to all U32 pins.

machinekoder commented 7 years ago

I created a tag before the multicore-merge: https://github.com/machinekit/machinekit/tree/before-multicore so users can check out this tag in case there are problems.

ArcEye commented 7 years ago

@machinekoder wrote: "I can verify the second problem on an isolated setup - the problem seems to be applicable for all U32 pins."

Can you attach a link to something I can test? I will look after lunch.

ArcEye commented 7 years ago

@machinekoder wrote: "When I start to watch halcmd using watch -n 0.1 halcmd show pin foo rtapi seems to die after some time: halcmd: hal_init() failed: -12 NOTE: 'rtapi' module must be loaded"

I can reproduce this one.
It appears to be a memory leak from launching halcmd 10x every second. Something is not getting freed and eventually it runs out of memory.

Feb 28 14:23:39 INTEL-i7 rtapi:0: 4:rtapi_app:14901:user pid=14901 flavor=rt-preempt gcc=4.9.2 git=unknown
Feb 28 14:23:39 INTEL-i7 rtapi:0: 4:rtapi_app:14901:user pid=14901 flavor=rt-preempt gcc=4.9.2 git=unknown
Feb 28 14:23:39 INTEL-i7 msgd:0: ulapi:16104:user _ulapi_init(): ulapi rt-preempt unknown loaded
Feb 28 14:23:39 INTEL-i7 msgd:0: ulapi:16104:user halg_xinitfv:271 HAL: singleton component 'hal_lib16104' id=1014 initialized
Feb 28 14:23:39 INTEL-i7 msgd:0: hal_lib:16104:user --halcmd show pin db.funct.time
Feb 28 14:23:39 INTEL-i7 msgd:0: hal_lib:16104:user halg_exit:293 HAL: removing component 1016 'halcmd16104'
Feb 28 14:23:39 INTEL-i7 msgd:0: hal_lib:16104:user ulapi_hal_lib_cleanup:235 HAL: lib_module_id=1014
Feb 28 14:23:39 INTEL-i7 msgd:0: hal_lib:16104:user halg_exit:293 HAL: removing component 1014 'hal_lib16104'
Feb 28 14:23:39 INTEL-i7 msgd:0: hal_lib:16104:user halg_exit:315 HAL: hal_errorcount()=0
Feb 28 14:23:39 INTEL-i7 msgd:0: hal_lib:16104:user halg_exit:316 HAL: _halerrno=0

becomes

Feb 28 14:34:15 INTEL-i7 rtapi:0: 4:rtapi_app:14901:user pid=14901 flavor=rt-preempt gcc=4.9.2 git=unknown
Feb 28 14:34:15 INTEL-i7 rtapi:0: 4:rtapi_app:14901:user pid=14901 flavor=rt-preempt gcc=4.9.2 git=unknown
Feb 28 14:34:15 INTEL-i7 msgd:0: ulapi:13547:user _ulapi_init(): ulapi rt-preempt unknown loaded
Feb 28 14:34:15 INTEL-i7 msgd:0: hal heap:13547:user rtapi_malloc: out of memory (size=96 arena=522560)
Feb 28 14:34:15 INTEL-i7 msgd:0: ulapi:13547:user hal_heap_addmem:58 HAL: extending arena by 512 bytes
Feb 28 14:34:15 INTEL-i7 msgd:0: ulapi:13547:user hal_heap_addmem:61 HAL error: can't extend arena - below minfree: 944
Feb 28 14:34:15 INTEL-i7 msgd:0: hal heap:13547:user rtapi_malloc: out of memory (size=96 arena=522560)
Feb 28 14:34:15 INTEL-i7 msgd:0: ulapi:13547:user shmalloc_desc:85 HAL error: giving up - can't allocate 96 bytes
Feb 28 14:34:15 INTEL-i7 msgd:0: ulapi:13547:user halg_create_objectfv:155 HAL error: insufficient memory for COMPONENT hal_lib13547 size=96
Feb 28 14:34:15 INTEL-i7 msgd:0: ulapi:13547:user ulapi_hal_lib_cleanup:235 HAL: lib_module_id=13693
Feb 28 14:34:15 INTEL-i7 msgd:0: ulapi:13547:user halg_exit:289 HAL error: no such component with id 13693

It is a pretty severe test, 600 loads per minute. Not something that would have shown up under normal use.

ArcEye commented 7 years ago

The problem appears likely to be in here https://github.com/machinekit/machinekit/blob/master/src/hal/lib/hal_comp.c#L279 in halg_exit()

I have it running at present with the debug section reporting memory enabled https://github.com/machinekit/machinekit/blob/master/src/hal/lib/hal_comp.c#L317 to see what happens

ArcEye commented 7 years ago

This is where it failed with the hal_sweep enabled

Feb 28 15:20:00 INTEL-i7 rtapi:0: 4:rtapi_app:15998:user pid=15998 flavor=rt-preempt gcc=4.9.2 git=unknown
Feb 28 15:20:01 INTEL-i7 rtapi:0: 4:rtapi_app:15998:user pid=15998 flavor=rt-preempt gcc=4.9.2 git=unknown
Feb 28 15:20:01 INTEL-i7 msgd:0: ulapi:25051:user _ulapi_init(): ulapi rt-preempt unknown loaded
Feb 28 15:20:01 INTEL-i7 msgd:0: ulapi:25051:user halg_xinitfv:271 HAL: singleton component 'hal_lib25051' id=32762 initialized
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25051:user --halcmd show pin db.funct.time
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25051:user halg_exit:293 HAL: removing component 32764 'halcmd25051'
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25051:user ulapi_hal_lib_cleanup:235 HAL: lib_module_id=32762
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25051:user halg_exit:293 HAL: removing component 32762 'hal_lib25051'
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25051:user halg_exit:315 HAL: hal_errorcount()=0
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25051:user halg_exit:316 HAL: _halerrno=0
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25051:user report_heapstatus:151 HAL: HAL heap heap status
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25051:user report_heapstatus:153 HAL:   arena=262144 totail_avail=260112 fragments=1 largest=260112
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25051:user report_heapstatus:157 HAL:   requested=1569872 allocated=1569872 freed=1568064 waste=0%
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25051:user report_heapstatus:151 HAL: global heap heap status
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25051:user report_heapstatus:153 HAL:   arena=787136 totail_avail=262032 fragments=1 largest=262032
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25051:user report_heapstatus:157 HAL:   requested=724926 allocated=786216 freed=261344 waste=7%
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25051:user report_memory_usage:168 HAL:   strings on global heap: alloc=200446 freed=200163 balance=283
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25051:user report_memory_usage:175 HAL:   RT objects: 464  alignment loss: 7  (1%)
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25051:user report_memory_usage:177 HAL:   hal_malloc():   1
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25051:user report_memory_usage:179 HAL:   unused:   261360
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25051:user halg_exit:320 HAL: hal_sweep: 2 objects freed
Feb 28 15:20:01 INTEL-i7 msgd:0: ulapi:25056:user _ulapi_init(): ulapi rt-preempt unknown loaded
Feb 28 15:20:01 INTEL-i7 msgd:0: ulapi:25056:user halg_xinitfv:271 HAL: singleton component 'hal_lib25056' id=32766 initialized
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25056:user ulapi_hal_lib_cleanup:235 HAL: lib_module_id=32766
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25056:user halg_exit:293 HAL: removing component 32766 'hal_lib25056'
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25056:user halg_exit:315 HAL: hal_errorcount()=0
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25056:user halg_exit:316 HAL: _halerrno=0
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25056:user report_heapstatus:151 HAL: HAL heap heap status
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25056:user report_heapstatus:153 HAL:   arena=262144 totail_avail=260112 fragments=1 largest=260112
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25056:user report_heapstatus:157 HAL:   requested=1570064 allocated=1570064 freed=1568256 waste=0%
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25056:user report_heapstatus:151 HAL: global heap heap status
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25056:user report_heapstatus:153 HAL:   arena=787136 totail_avail=262032 fragments=1 largest=262032
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25056:user report_heapstatus:157 HAL:   requested=724951 allocated=786248 freed=261376 waste=7%
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25056:user report_memory_usage:168 HAL:   strings on global heap: alloc=200471 freed=200188 balance=283
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25056:user report_memory_usage:175 HAL:   RT objects: 464  alignment loss: 7  (1%)
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25056:user report_memory_usage:177 HAL:   hal_malloc():   1
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25056:user report_memory_usage:179 HAL:   unused:   261360
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25056:user halg_exit:320 HAL: hal_sweep: 1 objects freed
Feb 28 15:20:01 INTEL-i7 rtapi:0: 4:rtapi_app:15998:user pid=15998 flavor=rt-preempt gcc=4.9.2 git=unknown
Feb 28 15:20:01 INTEL-i7 msgd:0: ulapi:25061:user _ulapi_init(): ulapi rt-preempt unknown loaded
Feb 28 15:20:01 INTEL-i7 msgd:0: ulapi:25061:user halg_ready:354 HAL error: component 32770 not found
Feb 28 15:20:01 INTEL-i7 msgd:0: ulapi:25061:user halg_xinitfv:265 HAL error: hal_ready(32770) failed rc=-22
Feb 28 15:20:01 INTEL-i7 msgd:0: ulapi:25061:user halg_ready:354 HAL error: component 22 not found
Feb 28 15:20:01 INTEL-i7 msgd:0: ulapi:25061:user --halcmd show pin db.funct.time
Feb 28 15:20:01 INTEL-i7 msgd:0: ulapi:25061:user halg_exit:289 HAL error: no such component with id 22
Feb 28 15:20:01 INTEL-i7 msgd:0: ulapi:25061:user ulapi_hal_lib_cleanup:235 HAL: lib_module_id=32770
Feb 28 15:20:01 INTEL-i7 msgd:0: ulapi:25061:user halg_exit:289 HAL error: no such component with id 32770

Doesn't exactly make it clearer. The crash immediately follows the line "Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25056:user halg_exit:320 HAL: hal_sweep: 1 objects freed", whereas every other print has said "2 objects freed".
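One way to look for a pattern in excerpts like the one above is to group the lines by process and pull out the initialised component id, the hal_sweep count and the number of "HAL error" lines. The sketch below is only an aid for reading the log: the function summarize and its regular expressions are assumptions derived from the lines quoted in this thread, not from the Machinekit sources.

```python
import re

# Patterns inferred from the log lines quoted in this thread (assumptions).
PID_RE = re.compile(r":(\d+):user ")
INIT_RE = re.compile(r"id=(\d+) initialized")
SWEEP_RE = re.compile(r"hal_sweep: (\d+) objects freed")


def summarize(lines):
    """Return {pid: {"init_id", "swept", "errors"}} for msgd log lines."""
    runs = {}
    for line in lines:
        if " msgd:" not in line:
            continue  # skip the rtapi_app banner lines
        pid = PID_RE.search(line)
        if pid is None:
            continue
        run = runs.setdefault(
            pid.group(1), {"init_id": None, "swept": None, "errors": 0}
        )
        init = INIT_RE.search(line)
        if init:
            run["init_id"] = int(init.group(1))
        sweep = SWEEP_RE.search(line)
        if sweep:
            run["swept"] = int(sweep.group(1))
        if "HAL error" in line:
            run["errors"] += 1
    return runs


# A few lines copied from the excerpt above.
SAMPLE = [
    "Feb 28 15:20:01 INTEL-i7 msgd:0: ulapi:25051:user halg_xinitfv:271 HAL: singleton component 'hal_lib25051' id=32762 initialized",
    "Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25051:user halg_exit:320 HAL: hal_sweep: 2 objects freed",
    "Feb 28 15:20:01 INTEL-i7 msgd:0: ulapi:25056:user halg_xinitfv:271 HAL: singleton component 'hal_lib25056' id=32766 initialized",
    "Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25056:user halg_exit:320 HAL: hal_sweep: 1 objects freed",
    "Feb 28 15:20:01 INTEL-i7 msgd:0: ulapi:25061:user halg_ready:354 HAL error: component 32770 not found",
    "Feb 28 15:20:01 INTEL-i7 msgd:0: ulapi:25061:user halg_xinitfv:265 HAL error: hal_ready(32770) failed rc=-22",
]

runs = summarize(SAMPLE)
for pid, info in runs.items():
    print(pid, info)
```

On these lines the last successfully initialised ids are 32762 and 32766, and the first failure is reported for 32770. Those values sit close to 32767 (2^15 - 1), but whether that is significant is not established in this thread.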

ArcEye commented 7 years ago

If you want to 'try this at home'

mick@INTEL-i7:/usr/src/machinekit$ DEBUG=5 realtime restart
mick@INTEL-i7:/usr/src/machinekit$ halcmd newinst debounce db pincount=4
<commandline>:0: Realtime module 'debounce' loaded
mick@INTEL-i7:/usr/src/machinekit$ halcmd newthread servo 1000000 fp
mick@INTEL-i7:/usr/src/machinekit$ halcmd addf db servo
<commandline>:0: Function 'db' added to thread 'servo', rmb=0 wmb=0
mick@INTEL-i7:/usr/src/machinekit$ halcmd start
<commandline>:0: Realtime threads started
mick@INTEL-i7:/usr/src/machinekit$ watch -n0.1 halcmd show pin db.funct.time

ArcEye commented 7 years ago

@machinekoder I created a tag before the multicore-merge: https://github.com/machinekit/machinekit/tree/before-multicore so users can check out this tag in case there are problems.

I was able to check this out and create a branch from it, albeit I think it should go much further back, to https://github.com/machinekit/machinekit/commit/75c06ffc1d693d9660e8910146fe9192aaed5459

However, it contains stuff it shouldn't and fails to build, because the conv macros are in both src/hal/i_components and src/hal/components. Since the move to i_components was in a fairly recent commit by yourself at https://github.com/machinekit/machinekit/commit/dda1f0134214718b37a6bb567991b93f279a9dc3, that is just peculiar.

However git reset --hard 75c06ff works fine and the result builds.

I am just running a sanity test, to ensure that this memory leak did not pre-exist the wholesale changes that @mhaberler made to memory allocation in multicore.

Result: It doesn't, so back to the current HEAD and valgrind or similar, if I can get MK to run in it.

machinekoder commented 7 years ago

@ArcEye you need to run make clean first to get rid of the conv macros

ArcEye commented 7 years ago

I thought I had cleaned it, but no matter. git reset --hard 75c06ff works fine

ArcEye commented 7 years ago

The only thing I can say for sure about this error is that it is directly related to the number of times that halcmd is run and hal_lib is loaded. Changing the frequency of watch to watch -n0.5 will extend the time it takes to run out of memory by 5x.

Next I need to run a command which increments a param but does not display anything. That should hopefully indicate whether it is the loading itself or the searching and display that has the leak.

ArcEye commented 7 years ago

Running halcmd setp db.delay $counter from within a loop which increments and prints $counter produces 2495 iterations before running out of memory.
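A loop of that kind could be driven as follows. This is a minimal sketch rather than the script that was actually used: the function names and the limit value are hypothetical, and it assumes halcmd returns a non-zero exit status once it can no longer initialise. The demonstration at the bottom uses a stand-in runner, so the sketch can be tried without a running HAL session.

```python
import subprocess


def halcmd_setp(counter):
    """Run `halcmd setp db.delay <counter>` and return its exit status."""
    return subprocess.run(["halcmd", "setp", "db.delay", str(counter)]).returncode


def count_until_failure(run=halcmd_setp, limit=10000):
    """Call run(counter) for counter = 1, 2, ... and return the number of successes."""
    for counter in range(1, limit + 1):
        if run(counter) != 0:
            return counter - 1
    return limit


# Stand-in runner that starts failing after 2495 calls, mimicking the result above.
def fake_run(counter):
    return 0 if counter <= 2495 else 1


print("successful iterations:", count_until_failure(fake_run))
```

On a live system, calling count_until_failure() with no arguments would launch halcmd once per iteration, which is the load/unload cycle under suspicion.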

This corresponds exactly with the time that the previous tests ran before the error, e.g. watch -n0.1 ... lasted approximately 4 min 9 s, which is almost exactly 2495 / 600 (4.158 minutes).

So it is nothing within the print_pin_info() display routine and looks like being the same amount of memory lost per load / unload.
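The arithmetic behind that comparison, written out as a small sketch. It assumes every load costs the same amount of memory and that the run time of halcmd itself is negligible next to the watch interval; the function names are hypothetical.

```python
def loads_per_minute(interval_s):
    """Number of halcmd launches per minute for a given watch interval."""
    return 60.0 / interval_s


def minutes_to_failure(loads, interval_s):
    """Minutes until `loads` launches have happened at the given interval."""
    return loads / loads_per_minute(interval_s)


minutes = minutes_to_failure(2495, 0.1)
whole_minutes, seconds = divmod(minutes * 60, 60)
print(f"{minutes:.3f} min = {int(whole_minutes)} min {seconds:.1f} s")

# Slowing watch from -n0.1 to -n0.5 stretches the time by the same factor.
ratio = minutes_to_failure(2495, 0.5) / minutes
print(f"-n0.5 takes {ratio:.1f}x as long")
```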

cdsteinkuehler commented 7 years ago

On 3/1/2017 10:16 AM, ArcEye wrote:

Running |halcmd setp db.delay $counter| from within a loop which increments and prints |$counter| produces 2495 iterations before running out of memory.

This corresponds exactly with the time that the previous tests ran before error eg. |watch -n0.1..........| lasted 4 mins 9 secs approx, which is almost exactly 2495 / 600 (4.158)

So it is nothing within the print_pin_info() display routine and looks like being the same amount of memory lost per load / unload.

Are we sure this is an issue with halcmd, or do other programs that access HAL and RTAPI see similar behavior? Can someone test a python script that just does something simple (like show a pin value) and see if it also leaks memory? I'd test myself, but I don't have a test system handy and given my lack of Python-foo, this is probably an hour task for me (and it seems like it ought to be 5-10 minutes).

I recently had a memory leak problem with HAL, but that was caused by launching a bunch of user-space threads incorrectly (I wasn't creating detached threads, and I wasn't joining to them), as explained here:

https://www.ibm.com/developerworks/library/l-memory-leaks/

...but that's probably not what's going on in this case.

-- Charles Steinkuehler charles@steinkuehler.net

ArcEye commented 7 years ago

I had nothing to do with the new memory routines in multicore and know little about their operation, but I am speculating towards a problem with memory allocation / retrieval.

If memory is being aligned for atomic operations, is that causing wastage at the boundaries which is just being orphaned?

Turning on all the debugging prints in hal_memory.c / rtapi_heap.c etc produces the below.

Whilst a cumulative increase in errors and waste is apparent, nothing reported seems to account for the failure immediately after the last print.

Also, my first run of the day after a fresh start, lasted 3x longer than the test runs yesterday.

Is that simply because the system allocated memory much closer to an alignment boundary, which resulted in less waste aligning and thus it went longer before running out?
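For readers unfamiliar with the effect being asked about, this is how rounding a request up to an alignment boundary produces padding. It is a generic sketch, not the rtapi heap code: the 64-byte alignment is a hypothetical value, and 96 is used only because it is the request size that appears in the out-of-memory log above.

```python
def align_up(size, alignment):
    """Round size up to the next multiple of alignment (a power of two)."""
    return (size + alignment - 1) & ~(alignment - 1)


def padding(size, alignment):
    """Bytes added to size by rounding it up to the alignment."""
    return align_up(size, alignment) - size


ALIGNMENT = 64  # hypothetical alignment, for illustration only

for size in (64, 96, 100):
    print(size, align_up(size, ALIGNMENT), padding(size, ALIGNMENT))
```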

early

Mar  2 08:46:39 INTEL-i7 msgd:0: ulapi:9203:user _ulapi_init(): ulapi rt-preempt unknown loaded
Mar  2 08:46:39 INTEL-i7 msgd:0: ulapi:9203:user halg_xinitfv:271 HAL: singleton component 'hal_lib9203' id=106 initialized
Mar  2 08:46:39 INTEL-i7 msgd:0: hal_lib:9203:user --halcmd addf db servo
Mar  2 08:46:39 INTEL-i7 msgd:0: hal_lib:9203:user hal_add_funct_to_thread:214 HAL: adding function 'db' to thread 'servo'
Mar  2 08:46:39 INTEL-i7 msgd:0: hal_lib:9203:user hal_add_funct_to_thread:233 HAL WARNING: 'db' should be added to thread as 'db.funct'
Mar  2 08:46:39 INTEL-i7 msgd:0: hal_lib:9203:user halg_exit:293 HAL: removing component 108 'halcmd9203'
Mar  2 08:46:39 INTEL-i7 msgd:0: hal_lib:9203:user ulapi_hal_lib_cleanup:235 HAL: lib_module_id=106
Mar  2 08:46:39 INTEL-i7 msgd:0: hal_lib:9203:user halg_exit:293 HAL: removing component 106 'hal_lib9203'
Mar  2 08:46:39 INTEL-i7 msgd:0: hal_lib:9203:user halg_exit:315 HAL: hal_errorcount()=0
Mar  2 08:46:39 INTEL-i7 msgd:0: hal_lib:9203:user halg_exit:316 HAL: _halerrno=0
Mar  2 08:46:39 INTEL-i7 msgd:0: hal_lib:9203:user report_heapstatus:151 HAL: HAL heap heap status
Mar  2 08:46:39 INTEL-i7 msgd:0: hal_lib:9203:user report_heapstatus:153 HAL:   arena=262144 totail_avail=260112 fragments=1 largest=260112
Mar  2 08:46:39 INTEL-i7 msgd:0: hal_lib:9203:user report_heapstatus:157 HAL:   requested=2384 allocated=2384 freed=576 waste=0%
Mar  2 08:46:39 INTEL-i7 msgd:0: hal_lib:9203:user report_heapstatus:151 HAL: global heap heap status
Mar  2 08:46:39 INTEL-i7 msgd:0: hal_lib:9203:user report_heapstatus:153 HAL:   arena=787136 totail_avail=262032 fragments=1 largest=262032
Mar  2 08:46:39 INTEL-i7 msgd:0: hal_lib:9203:user report_heapstatus:157 HAL:   requested=524830 allocated=524968 freed=96 waste=0%
Mar  2 08:46:39 INTEL-i7 msgd:0: hal_lib:9203:user report_memory_usage:168 HAL:   strings on global heap: alloc=350 freed=69 balance=281
Mar  2 08:46:39 INTEL-i7 msgd:0: hal_lib:9203:user report_memory_usage:175 HAL:   RT objects: 464  alignment loss: 7  (1%)
Mar  2 08:46:39 INTEL-i7 msgd:0: hal_lib:9203:user report_memory_usage:177 HAL:   hal_malloc():   1
Mar  2 08:46:39 INTEL-i7 msgd:0: hal_lib:9203:user report_memory_usage:179 HAL:   unused:   261360
Mar  2 08:46:39 INTEL-i7 msgd:0: hal_lib:9203:user halg_exit:320 HAL: hal_sweep: 2 objects freed
Mar  2 08:46:45 INTEL-i7 rtapi:0: 4:rtapi_app:9183:user pid=9183 flavor=rt-preempt gcc=4.9.2 git=unknown
Mar  2 08:46:45 INTEL-i7 msgd:0: ulapi:9208:user _ulapi_init(): ulapi rt-preempt unknown loaded
Mar  2 08:46:45 INTEL-i7 msgd:0: ulapi:9208:user halg_xinitfv:271 HAL: singleton component 'hal_lib9208' id=110 initialized
Mar  2 08:46:45 INTEL-i7 msgd:0: hal_lib:9208:user --halcmd start
Mar  2 08:46:45 INTEL-i7 msgd:0: hal_lib:9208:user hal_start_threads:343 HAL: starting threads
Mar  2 08:46:45 INTEL-i7 msgd:0: hal_lib:9208:user halg_exit:293 HAL: removing component 112 'halcmd9208'
Mar  2 08:46:45 INTEL-i7 msgd:0: hal_lib:9208:user ulapi_hal_lib_cleanup:235 HAL: lib_module_id=110
Mar  2 08:46:45 INTEL-i7 msgd:0: hal_lib:9208:user halg_exit:293 HAL: removing component 110 'hal_lib9208'
Mar  2 08:46:45 INTEL-i7 msgd:0: hal_lib:9208:user halg_exit:315 HAL: hal_errorcount()=0
Mar  2 08:46:45 INTEL-i7 msgd:0: hal_lib:9208:user halg_exit:316 HAL: _halerrno=0
Mar  2 08:46:45 INTEL-i7 msgd:0: hal_lib:9208:user report_heapstatus:151 HAL: HAL heap heap status
Mar  2 08:46:45 INTEL-i7 msgd:0: hal_lib:9208:user report_heapstatus:153 HAL:   arena=262144 totail_avail=260112 fragments=1 largest=260112
Mar  2 08:46:45 INTEL-i7 msgd:0: hal_lib:9208:user report_heapstatus:157 HAL:   requested=2576 allocated=2576 freed=768 waste=0%
Mar  2 08:46:45 INTEL-i7 msgd:0: hal_lib:9208:user report_heapstatus:151 HAL: global heap heap status
Mar  2 08:46:45 INTEL-i7 msgd:0: hal_lib:9208:user report_heapstatus:153 HAL:   arena=787136 totail_avail=262032 fragments=1 largest=262032
Mar  2 08:46:45 INTEL-i7 msgd:0: hal_lib:9208:user report_heapstatus:157 HAL:   requested=524853 allocated=525000 freed=128 waste=0%
Mar  2 08:46:45 INTEL-i7 msgd:0: hal_lib:9208:user report_memory_usage:168 HAL:   strings on global heap: alloc=373 freed=92 balance=281
Mar  2 08:46:45 INTEL-i7 msgd:0: hal_lib:9208:user report_memory_usage:175 HAL:   RT objects: 464  alignment loss: 7  (1%)
Mar  2 08:46:45 INTEL-i7 msgd:0: hal_lib:9208:user report_memory_usage:177 HAL:   hal_malloc():   1
Mar  2 08:46:45 INTEL-i7 msgd:0: hal_lib:9208:user report_memory_usage:179 HAL:   unused:   261360
Mar  2 08:46:45 INTEL-i7 msgd:0: hal_lib:9208:user halg_exit:320 HAL: hal_sweep: 2 objects freed

close to failure

Mar  2 09:01:51 INTEL-i7 msgd:0: ulapi:18481:user _ulapi_init(): ulapi rt-preempt unknown loaded
Mar  2 09:01:51 INTEL-i7 msgd:0: ulapi:18481:user halg_xinitfv:271 HAL: singleton component 'hal_lib18481' id=32762 initialized
Mar  2 09:01:51 INTEL-i7 msgd:0: hal_lib:18481:user --halcmd show pin db.funct.time
Mar  2 09:01:51 INTEL-i7 msgd:0: hal_lib:18481:user halg_exit:293 HAL: removing component 32764 'halcmd18481'
Mar  2 09:01:51 INTEL-i7 msgd:0: hal_lib:18481:user ulapi_hal_lib_cleanup:235 HAL: lib_module_id=32762
Mar  2 09:01:51 INTEL-i7 msgd:0: hal_lib:18481:user halg_exit:293 HAL: removing component 32762 'hal_lib18481'
Mar  2 09:01:51 INTEL-i7 msgd:0: hal_lib:18481:user halg_exit:315 HAL: hal_errorcount()=0
Mar  2 09:01:51 INTEL-i7 msgd:0: hal_lib:18481:user halg_exit:316 HAL: _halerrno=0
Mar  2 09:01:51 INTEL-i7 msgd:0: hal_lib:18481:user report_heapstatus:151 HAL: HAL heap heap status
Mar  2 09:01:51 INTEL-i7 msgd:0: hal_lib:18481:user report_heapstatus:153 HAL:   arena=262144 totail_avail=260112 fragments=1 largest=260112
Mar  2 09:01:51 INTEL-i7 msgd:0: hal_lib:18481:user report_heapstatus:157 HAL:   requested=1569872 allocated=1569872 freed=1568064 waste=0%
Mar  2 09:01:51 INTEL-i7 msgd:0: hal_lib:18481:user report_heapstatus:151 HAL: global heap heap status
Mar  2 09:01:51 INTEL-i7 msgd:0: hal_lib:18481:user report_heapstatus:153 HAL:   arena=787136 totail_avail=262032 fragments=1 largest=262032
Mar  2 09:01:51 INTEL-i7 msgd:0: hal_lib:18481:user report_heapstatus:157 HAL:   requested=724640 allocated=786216 freed=261344 waste=7%
Mar  2 09:01:51 INTEL-i7 msgd:0: hal_lib:18481:user report_memory_usage:168 HAL:   strings on global heap: alloc=200160 freed=199877 balance=283
Mar  2 09:01:51 INTEL-i7 msgd:0: hal_lib:18481:user report_memory_usage:175 HAL:   RT objects: 464  alignment loss: 7  (1%)
Mar  2 09:01:51 INTEL-i7 msgd:0: hal_lib:18481:user report_memory_usage:177 HAL:   hal_malloc():   1
Mar  2 09:01:51 INTEL-i7 msgd:0: hal_lib:18481:user report_memory_usage:179 HAL:   unused:   261360
Mar  2 09:01:51 INTEL-i7 msgd:0: hal_lib:18481:user halg_exit:320 HAL: hal_sweep: 2 objects freed
Mar  2 09:01:51 INTEL-i7 rtapi:0: 4:rtapi_app:9183:user pid=9183 flavor=rt-preempt gcc=4.9.2 git=unknown
Mar  2 09:01:51 INTEL-i7 rtapi:0: 4:rtapi_app:9183:user pid=9183 flavor=rt-preempt gcc=4.9.2 git=unknown
Mar  2 09:01:51 INTEL-i7 msgd:0: ulapi:18486:user _ulapi_init(): ulapi rt-preempt unknown loaded
Mar  2 09:01:51 INTEL-i7 msgd:0: ulapi:18486:user halg_xinitfv:271 HAL: singleton component 'hal_lib18486' id=32766 initialized
Mar  2 09:01:51 INTEL-i7 msgd:0: hal_lib:18486:user ulapi_hal_lib_cleanup:235 HAL: lib_module_id=32766
Mar  2 09:01:51 INTEL-i7 msgd:0: hal_lib:18486:user halg_exit:293 HAL: removing component 32766 'hal_lib18486'
Mar  2 09:01:51 INTEL-i7 msgd:0: hal_lib:18486:user halg_exit:315 HAL: hal_errorcount()=0
Mar  2 09:01:51 INTEL-i7 msgd:0: hal_lib:18486:user halg_exit:316 HAL: _halerrno=0
Mar  2 09:01:51 INTEL-i7 msgd:0: hal_lib:18486:user report_heapstatus:151 HAL: HAL heap heap status
Mar  2 09:01:51 INTEL-i7 msgd:0: hal_lib:18486:user report_heapstatus:153 HAL:   arena=262144 totail_avail=260112 fragments=1 largest=260112
Mar  2 09:01:51 INTEL-i7 msgd:0: hal_lib:18486:user report_heapstatus:157 HAL:   requested=1570064 allocated=1570064 freed=1568256 waste=0%
Mar  2 09:01:51 INTEL-i7 msgd:0: hal_lib:18486:user report_heapstatus:151 HAL: global heap heap status
Mar  2 09:01:51 INTEL-i7 msgd:0: hal_lib:18486:user report_heapstatus:153 HAL:   arena=787136 totail_avail=262032 fragments=1 largest=262032
Mar  2 09:01:51 INTEL-i7 msgd:0: hal_lib:18486:user report_heapstatus:157 HAL:   requested=724665 allocated=786248 freed=261376 waste=7%
Mar  2 09:01:51 INTEL-i7 msgd:0: hal_lib:18486:user report_memory_usage:168 HAL:   strings on global heap: alloc=200185 freed=199902 balance=283
Mar  2 09:01:51 INTEL-i7 msgd:0: hal_lib:18486:user report_memory_usage:175 HAL:   RT objects: 464  alignment loss: 7  (1%)
Mar  2 09:01:51 INTEL-i7 msgd:0: hal_lib:18486:user report_memory_usage:177 HAL:   hal_malloc():   1
Mar  2 09:01:51 INTEL-i7 msgd:0: hal_lib:18486:user report_memory_usage:179 HAL:   unused:   261360
Mar  2 09:01:51 INTEL-i7 msgd:0: hal_lib:18486:user halg_exit:320 HAL: hal_sweep: 1 objects freed

error from here
ArcEye commented 7 years ago

Are we sure this is an issue with halcmd, or do other programs that access HAL and RTAPI see similar behavior? Can someone test a python script that just does something simple (like show a pin value) and see if it also leaks memory? I'd test myself, but I don't have a test system handy and given my lack of Python-foo, this is probably an hour task for me (and it seems like it ought to be 5-10 minutes).

Amen to the python thoughts, it always takes me 10 times longer to figure out how to do stuff in python.

I will try to cobble together a test which reads a pin and prints it at a programmatic level.

ArcEye commented 7 years ago

I finally had the time to write something.

I ran a program, testmem, via halcmd loadusr -W testmem 10000; it does 10,000 iterations of halpr_find_pin_by_name(), gets the value and prints the result. It ran to the end without any issues.

Tomorrow I will run it in a script that launches it inside a 10K loop via halcmd loadusr -W testmem 1. I anticipate that it is the loading and unloading of halcmd and hal_lib that will give the problems, and that it will fail long before the loop completes.
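
The repeated-launch test described above could be driven by a small harness like the one below. This is a minimal sketch, not the script that was actually used: run_until_failure is a hypothetical helper, testmem is the ad-hoc test program mentioned above (it is not part of the repository), and treating a non-zero exit status as the failure signal is an assumption.

```python
import subprocess
import time


def run_until_failure(cmd, max_iterations=10000):
    """Run cmd repeatedly; return (failed_iteration, elapsed_seconds).

    failed_iteration is None if every run exited with status 0.
    Assumption: a failing launch shows up as a non-zero exit status.
    """
    start = time.monotonic()
    for iteration in range(1, max_iterations + 1):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode != 0:
            return iteration, time.monotonic() - start
    return None, time.monotonic() - start


# Example invocation on a machine with a running HAL session
# (hypothetical; needs halcmd and the testmem program to be installed):
#   failed_at, seconds = run_until_failure(
#       ["halcmd", "loadusr", "-W", "testmem", "1"])
#   print(failed_at, seconds)
```

Recording the iteration count alongside the elapsed time makes runs comparable between machines, since the time to failure depends on how fast each launch completes.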

ArcEye commented 7 years ago

Tomorrow I will run it in a script that launches it inside a 10K loop via halcmd loadusr -W testmem 1. I anticipate that it is the loading and unloading of halcmd and hal_lib that will give the problems, and that it will fail long before the loop completes.

Which is exactly what happened: it failed after only about 2 minutes.

The failure looks familiar: 6% 'reported' waste in the global heap status, and it all starts going to pot.

Mar  3 09:03:35 INTEL-i7 msgd:0: ulapi:23173:user _ulapi_init(): ulapi rt-preempt unknown loaded
Mar  3 09:03:35 INTEL-i7 msgd:0: ulapi:23173:user halg_xinitfv:271 HAL: singleton component 'hal_lib23173' id=32581 initialized
Mar  3 09:03:35 INTEL-i7 msgd:0: hal_lib:23173:user --halcmd loadusr -W testmem 1
Mar  3 09:03:35 INTEL-i7 msgd:0: hal_lib:23173:user testmem
Mar  3 09:03:35 INTEL-i7 msgd:0: hal_lib:23173:user 1
Mar  3 09:03:35 INTEL-i7 msgd:0: hal_lib:23173:user
Mar  3 09:03:35 INTEL-i7 msgd:0: ulapi:23176:user _ulapi_init(): ulapi rt-preempt unknown loaded
Mar  3 09:03:35 INTEL-i7 msgd:0: ulapi:23176:user halg_xinitfv:271 HAL: singleton component 'hal_lib23176' id=32585 initialized
Mar  3 09:03:35 INTEL-i7 msgd:0: hal_lib:23176:user halg_xinitfv:199 HAL error: duplicate component name 'testmem'
Mar  3 09:03:35 INTEL-i7 msgd:0: hal_lib:23176:user ulapi_hal_lib_cleanup:235 HAL: lib_module_id=32585
Mar  3 09:03:35 INTEL-i7 msgd:0: hal_lib:23176:user halg_exit:293 HAL: removing component 32585 'hal_lib23176'
Mar  3 09:03:35 INTEL-i7 msgd:0: hal_lib:23176:user halg_exit:315 HAL: hal_errorcount()=2
Mar  3 09:03:35 INTEL-i7 msgd:0: hal_lib:23176:user halg_exit:316 HAL: _halerrno=-16
Mar  3 09:03:35 INTEL-i7 msgd:0: hal_lib:23176:user report_heapstatus:151 HAL: HAL heap heap status
Mar  3 09:03:35 INTEL-i7 msgd:0: hal_lib:23176:user report_heapstatus:153 HAL:   arena=262144 totail_avail=259904 fragments=2 largest=259800
Mar  3 09:03:35 INTEL-i7 msgd:0: hal_lib:23176:user report_heapstatus:157 HAL:   requested=1338704 allocated=1338704 freed=1336704 waste=0%
Mar  3 09:03:35 INTEL-i7 msgd:0: hal_lib:23176:user report_heapstatus:151 HAL: global heap heap status
Mar  3 09:03:35 INTEL-i7 msgd:0: hal_lib:23176:user report_heapstatus:153 HAL:   arena=787136 totail_avail=261992 fragments=2 largest=261968
Mar  3 09:03:35 INTEL-i7 msgd:0: hal_lib:23176:user report_heapstatus:157 HAL:   requested=697097 allocated=747680 freed=222784 waste=6%
Mar  3 09:03:35 INTEL-i7 msgd:0: hal_lib:23176:user report_memory_usage:168 HAL:   strings on global heap: alloc=172617 freed=172313 balance=304
Mar  3 09:03:35 INTEL-i7 msgd:0: hal_lib:23176:user report_memory_usage:175 HAL:   RT objects: 464  alignment loss: 7  (1%)
Mar  3 09:03:35 INTEL-i7 msgd:0: hal_lib:23176:user report_memory_usage:177 HAL:   hal_malloc():   1
Mar  3 09:03:35 INTEL-i7 msgd:0: hal_lib:23176:user report_memory_usage:179 HAL:   unused:   261360
Mar  3 09:03:35 INTEL-i7 msgd:0: hal_lib:23176:user halg_exit:320 HAL: hal_sweep: 1 objects freed
Mar  3 09:03:35 INTEL-i7 msgd:0: hal_lib:23173:user halg_exit:293 HAL: removing component 32583 'halcmd23173'
Mar  3 09:03:35 INTEL-i7 msgd:0: hal_lib:23173:user ulapi_hal_lib_cleanup:235 HAL: lib_module_id=32581
Mar  3 09:03:35 INTEL-i7 msgd:0: hal_lib:23173:user halg_exit:293 HAL: removing component 32581 'hal_lib23173'
Mar  3 09:03:35 INTEL-i7 msgd:0: hal_lib:23173:user halg_exit:315 HAL: hal_errorcount()=0
Mar  3 09:03:35 INTEL-i7 msgd:0: hal_lib:23173:user halg_exit:316 HAL: _halerrno=0
Mar  3 09:03:35 INTEL-i7 msgd:0: hal_lib:23173:user report_heapstatus:151 HAL: HAL heap heap status
Mar  3 09:03:35 INTEL-i7 msgd:0: hal_lib:23173:user report_heapstatus:153 HAL:   arena=262144 totail_avail=260008 fragments=2 largest=259904
Mar  3 09:03:35 INTEL-i7 msgd:0: hal_lib:23173:user report_heapstatus:157 HAL:   requested=1338704 allocated=1338704 freed=1336800 waste=0%
Mar  3 09:03:35 INTEL-i7 msgd:0: hal_lib:23173:user report_heapstatus:151 HAL: global heap heap status
Mar  3 09:03:35 INTEL-i7 msgd:0: hal_lib:23173:user report_heapstatus:153 HAL:   arena=787136 totail_avail=262016 fragments=2 largest=261992
Mar  3 09:03:35 INTEL-i7 msgd:0: hal_lib:23173:user report_heapstatus:157 HAL:   requested=697097 allocated=747680 freed=222800 waste=6%
Mar  3 09:03:35 INTEL-i7 rtapi:0: 4:rtapi_app:4583:user pid=4583 flavor=rt-preempt gcc=4.9.2 git=unknown
Mar  3 09:03:35 INTEL-i7 msgd:0: hal_lib:23173:user report_memory_usage:168 HAL:   strings on global heap: alloc=172617 freed=172326 balance=291
Mar  3 09:03:35 INTEL-i7 msgd:0: hal_lib:23173:user report_memory_usage:175 HAL:   RT objects: 464  alignment loss: 7  (1%)
Mar  3 09:03:35 INTEL-i7 msgd:0: hal_lib:23173:user report_memory_usage:177 HAL:   hal_malloc():   1
Mar  3 09:03:35 INTEL-i7 msgd:0: hal_lib:23173:user report_memory_usage:179 HAL:   unused:   261360
Mar  3 09:03:35 INTEL-i7 msgd:0: hal_lib:23173:user halg_exit:320 HAL: hal_sweep: 2 objects freed
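
For anyone comparing runs, the heap counters in the report_heapstatus:157 lines above can be pulled out of the log with a few lines of Python. This is a sketch under assumptions: parse_heap_stats and estimated_waste_percent are hypothetical names, and the waste formula (over-allocation as a truncated percentage of allocated) is a guess that happens to reproduce the 6% and 7% figures in the logs in this thread; it was not taken from rtapi_heap.c.

```python
import re

# Matches the counters printed on the report_heapstatus:157 log lines.
HEAP_STATS = re.compile(
    r"requested=(\d+) allocated=(\d+) freed=(\d+) waste=(\d+)%"
)


def parse_heap_stats(line):
    """Return the four counters from a heap status line, or None."""
    match = HEAP_STATS.search(line)
    if match is None:
        return None
    requested, allocated, freed, waste = (int(g) for g in match.groups())
    return {"requested": requested, "allocated": allocated,
            "freed": freed, "reported_waste": waste}


def estimated_waste_percent(requested, allocated):
    """Assumed formula: over-allocation as a truncated share of allocated."""
    if allocated == 0:
        return 0
    return (allocated - requested) * 100 // allocated


line = ("hal_lib:23176:user report_heapstatus:157 HAL:   "
        "requested=697097 allocated=747680 freed=222784 waste=6%")
stats = parse_heap_stats(line)
# Bytes allocated beyond what was requested.
print(stats["allocated"] - stats["requested"])
# Estimate to compare against the reported waste figure.
print(estimated_waste_percent(stats["requested"], stats["allocated"]))
```

Feeding every matching line of /var/log/linuxcnc.log through parse_heap_stats would show whether allocated minus requested grows steadily with each halcmd launch.
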
ArcEye commented 7 years ago

I had nothing to do with the new memory routines in multicore and know little about their operation, but I am speculating towards a problem with memory allocation / retrieval. If memory is being aligned for atomic operations, is that causing wastage at the boundaries which is just being orphaned?

A completely unscientific test, but it has some persuasive results.

The standard duration for running watch -n0.1 halcmd show pin db.funct.time before it runs out of memory on my system is about 4 mins.

I hacked https://github.com/machinekit/machinekit/blob/master/src/hal/lib/hal_memory.c#L115 and https://github.com/machinekit/machinekit/blob/master/src/rtapi/rtapi_heap.c#L73 to force 2 byte alignment irrespective of what was requested (the max and possibly default is 8 byte).

I ran the test again and it lasted 16 mins before running out of memory.

AFAIK the default system memory allocation is aligned to a multiple of 8 bytes on 32-bit and 16 bytes on 64-bit. So why there are 2, 4 and 8 byte options and no 16 byte option for the allocation within the local (c-m)alloc code, I don't know.
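
To illustrate why a smaller forced alignment might stretch the heap further, the padding cost of rounding each request up to an alignment boundary can be estimated as below. This is only arithmetic on hypothetical request sizes; align_up and total_padding are illustrative names, and the real allocator in rtapi_heap.c also carries per-block bookkeeping that is not modelled here.

```python
def align_up(size, alignment):
    """Round size up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (size + alignment - 1) & ~(alignment - 1)


def total_padding(sizes, alignment):
    """Bytes added by rounding every request up to the alignment."""
    return sum(align_up(size, alignment) - size for size in sizes)


# Hypothetical request sizes, e.g. short strings of 1 to 8 bytes.
sizes = list(range(1, 9))
for alignment in (2, 4, 8):
    print(alignment, total_padding(sizes, alignment))
```

For small, odd-sized requests the padding grows quickly with the alignment, which would be consistent with the longer run at 2 byte alignment, although it does not show that padding is the actual cause.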

Think I am running out of ideas, it is a situation unlikely to arise often (loading halcmd 2750 times in 4 mins before it fails, that is).

machinekoder commented 7 years ago

When I tried to compile the watchdog.icomp on a machinekit-xenomai package install, I encountered the following problem:

sudo instcomp -i watchdog.icomp 
Compiling realtime watchdog.c
In file included from /usr/include/linuxcnc/hal_accessor.h:6:0,
                 from /usr/include/linuxcnc/hal_priv.h:677,
                 from watchdog.c:11:
/usr/include/linuxcnc/rtapi_atomics.h: In function ‘rtapi_load_u64’:
/usr/include/linuxcnc/rtapi_atomics.h:153:31: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/rtapi_atomics.h:153:31: note: each undeclared identifier is reported only once for each function it appears in
/usr/include/linuxcnc/rtapi_atomics.h: In function ‘rtapi_load_s64’:
/usr/include/linuxcnc/rtapi_atomics.h:160:31: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/rtapi_atomics.h: In function ‘rtapi_store_u64’:
/usr/include/linuxcnc/rtapi_atomics.h:168:36: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/rtapi_atomics.h: In function ‘rtapi_store_s64’:
/usr/include/linuxcnc/rtapi_atomics.h:173:36: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/rtapi_atomics.h: In function ‘rtapi_inc_u64’:
/usr/include/linuxcnc/rtapi_atomics.h:180:35: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/rtapi_atomics.h: In function ‘rtapi_cas_u64’:
/usr/include/linuxcnc/rtapi_atomics.h:188:6: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/rtapi_atomics.h: In function ‘rtapi_load_u8’:
/usr/include/linuxcnc/rtapi_atomics.h:198:31: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/rtapi_atomics.h: In function ‘rtapi_load_u32’:
/usr/include/linuxcnc/rtapi_atomics.h:205:31: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/rtapi_atomics.h: In function ‘rtapi_load_s32’:
/usr/include/linuxcnc/rtapi_atomics.h:212:31: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/rtapi_atomics.h: In function ‘rtapi_load_ptr’:
/usr/include/linuxcnc/rtapi_atomics.h:219:50: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/rtapi_atomics.h: In function ‘rtapi_store_u8’:
/usr/include/linuxcnc/rtapi_atomics.h:225:36: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/rtapi_atomics.h: In function ‘rtapi_store_u32’:
/usr/include/linuxcnc/rtapi_atomics.h:230:36: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/rtapi_atomics.h: In function ‘rtapi_store_s32’:
/usr/include/linuxcnc/rtapi_atomics.h:235:36: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/rtapi_atomics.h: In function ‘rtapi_store_ptr’:
/usr/include/linuxcnc/rtapi_atomics.h:240:55: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/rtapi_atomics.h: In function ‘rtapi_add_s32’:
/usr/include/linuxcnc/rtapi_atomics.h:246:40: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/rtapi_atomics.h: In function ‘rtapi_add_u32’:
/usr/include/linuxcnc/rtapi_atomics.h:251:40: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/rtapi_atomics.h: In function ‘rtapi_cas_u8’:
/usr/include/linuxcnc/rtapi_atomics.h:257:6: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/rtapi_atomics.h: In function ‘rtapi_cas_u32’:
/usr/include/linuxcnc/rtapi_atomics.h:263:6: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/rtapi_atomics.h: In function ‘rtapi_cas_s32’:
/usr/include/linuxcnc/rtapi_atomics.h:269:6: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
In file included from /usr/include/linuxcnc/hal_priv.h:677:0,
                 from watchdog.c:11:
/usr/include/linuxcnc/hal_accessor.h: In function ‘_set_bit_pin’:
/usr/include/linuxcnc/hal_accessor.h:175:1: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/hal_accessor.h: In function ‘_set_s32_pin’:
/usr/include/linuxcnc/hal_accessor.h:176:1: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/hal_accessor.h: In function ‘_set_u32_pin’:
/usr/include/linuxcnc/hal_accessor.h:177:1: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/hal_accessor.h: In function ‘_set_u64_pin’:
/usr/include/linuxcnc/hal_accessor.h:178:1: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/hal_accessor.h: In function ‘_set_s64_pin’:
/usr/include/linuxcnc/hal_accessor.h:179:1: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/hal_accessor.h: In function ‘_set_float_pin’:
/usr/include/linuxcnc/hal_accessor.h:180:1: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/hal_accessor.h: In function ‘_get_bit_pin’:
/usr/include/linuxcnc/hal_accessor.h:198:1: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/hal_accessor.h: In function ‘_get_s32_pin’:
/usr/include/linuxcnc/hal_accessor.h:199:1: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/hal_accessor.h: In function ‘_get_u32_pin’:
/usr/include/linuxcnc/hal_accessor.h:200:1: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/hal_accessor.h: In function ‘_get_u64_pin’:
/usr/include/linuxcnc/hal_accessor.h:201:1: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/hal_accessor.h: In function ‘_get_s64_pin’:
/usr/include/linuxcnc/hal_accessor.h:202:1: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/hal_accessor.h: In function ‘_get_float_pin’:
/usr/include/linuxcnc/hal_accessor.h:203:1: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/hal_accessor.h: In function ‘incr_s32_pin’:
/usr/include/linuxcnc/hal_accessor.h:232:1: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/hal_accessor.h: In function ‘_incr_s32_pin’:
/usr/include/linuxcnc/hal_accessor.h:232:1: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/hal_accessor.h: In function ‘incr_u32_pin’:
/usr/include/linuxcnc/hal_accessor.h:233:1: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/hal_accessor.h: In function ‘_incr_u32_pin’:
/usr/include/linuxcnc/hal_accessor.h:233:1: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/hal_accessor.h: In function ‘_get_bit_sig’:
/usr/include/linuxcnc/hal_accessor.h:253:1: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/hal_accessor.h: In function ‘_get_s32_sig’:
/usr/include/linuxcnc/hal_accessor.h:254:1: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/hal_accessor.h: In function ‘_get_u32_sig’:
/usr/include/linuxcnc/hal_accessor.h:255:1: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/hal_accessor.h: In function ‘_get_u64_sig’:
/usr/include/linuxcnc/hal_accessor.h:256:1: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/hal_accessor.h: In function ‘_get_s64_sig’:
/usr/include/linuxcnc/hal_accessor.h:257:1: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/hal_accessor.h: In function ‘_get_float_sig’:
/usr/include/linuxcnc/hal_accessor.h:258:1: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/hal_accessor.h: In function ‘_set_bit_sig’:
/usr/include/linuxcnc/hal_accessor.h:281:1: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/hal_accessor.h: In function ‘_set_s32_sig’:
/usr/include/linuxcnc/hal_accessor.h:282:1: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/hal_accessor.h: In function ‘_set_u32_sig’:
/usr/include/linuxcnc/hal_accessor.h:283:1: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/hal_accessor.h: In function ‘_set_u64_sig’:
/usr/include/linuxcnc/hal_accessor.h:284:1: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/hal_accessor.h: In function ‘_set_s64_sig’:
/usr/include/linuxcnc/hal_accessor.h:285:1: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
/usr/include/linuxcnc/hal_accessor.h: In function ‘_set_float_sig’:
/usr/include/linuxcnc/hal_accessor.h:286:1: error: ‘RTAPI_MEMORY_MODEL’ undeclared (first use in this function)
make: *** [watchdog.o] Error 1

Any ideas?

ArcEye commented 7 years ago

It is probably to do with this https://github.com/machinekit/machinekit/blob/master/src/rtapi/rtapi_bitops.h#L35

Are you using a compiler that doesn't properly support C11? What platform exactly are you building on?

You appear to have some strange mix of flags, where RTAPI_MEMORY_MODEL has not been set but you still meet the criteria for using __ATOMIC_SEQ_CST to enforce total ordering.

Could probably be avoided by checking #ifdef RTAPI_MEMORY_MODEL before all functions that use it, albeit it would have to be at an early level so there is an alternative path if not defined.

machinekoder commented 7 years ago

@ArcEye This happens on BBB with Debian Wheezy. So maybe the compiler does not have C11 enabled and instcomp is not activating it?

ArcEye commented 7 years ago

It is nothing to do with instcomp, it is the built-in support for atomics in the compiler. What version are you using?

You need 4.7.2 really as a minimum and preferably 4.9.2 https://gcc.gnu.org/wiki/C11Status

I suspect this means that anything below 4.7 will not get RTAPI_USE_ATOMIC set https://github.com/machinekit/machinekit/blob/master/src/rtapi/rtapi_bitops.h#L40
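
A quick way to check a build machine against the 4.7 threshold mentioned above is sketched below. It is an illustration only: parse_gcc_version and has_atomic_builtins are hypothetical helpers, and the (4, 7) minimum simply encodes the version suggested in this comment.

```python
import re


def parse_gcc_version(text):
    """Extract (major, minor) from a version string such as '4.9.2'."""
    match = re.search(r"(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    return int(match.group(1)), int(match.group(2) or 0)


def has_atomic_builtins(version_text, minimum=(4, 7)):
    """True if the gcc version is at least the minimum suggested above."""
    return parse_gcc_version(version_text) >= minimum


for version in ("4.6.3", "4.7.2", "4.9.2"):
    print(version, has_atomic_builtins(version))
```

The version string can be obtained on the target with gcc -dumpversion.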

You have obviously 'fallen between two stools', with RTAPI_USE_ATOMIC not being defined and thus RTAPI_MEMORY_MODEL not set to __ATOMIC_SEQ_CST in https://github.com/machinekit/machinekit/blob/master/src/rtapi/rtapi_bitops.h#L77, yet hal_accessor.h is using RTAPI_MEMORY_MODEL whether you have HAVE_CK defined (https://github.com/machinekit/machinekit/blob/master/src/hal/lib/hal_accessor.h#L54) or not (https://github.com/machinekit/machinekit/blob/master/src/hal/lib/hal_accessor.h#L122).

You could try making #define RTAPI_MEMORY_MODEL __ATOMIC_SEQ_CST in https://github.com/machinekit/machinekit/blob/master/src/rtapi/rtapi_bitops.h#L77 an unconditional define.

Either it will build, or your compiler will barf because it does not have inbuilt support for __ATOMIC_SEQ_CST. However, it is defined by default to 5 in https://github.com/machinekit/machinekit/blob/master/src/rtapi/rtapi_bitops.h#L37, so I would expect it might work.

machinekoder commented 7 years ago

Thank you, I do not necessarily need to build on this machine anyway. Just wanted to clarify if this problem is already known.

machinekoder commented 7 years ago

I still have to track this down, but I noticed some problems with Haltalk today. The problems sound similar to a QtQuickVcp issue reported today: https://github.com/qtquickvcp/QtQuickVcp/issues/151

ArcEye commented 7 years ago

Thank you, I do not necessarily need to build on this machine anyway. Just wanted to clarify if this problem is already known.

It came up when Charles was trying to build on ARM with gcc 4.6 also https://github.com/machinekit/machinekit-multicore/issues/10#issuecomment-275948245

ArcEye commented 7 years ago

I still have to track this down, but I noticed some problems with Haltalk today. The problems sound similar to a QtQuickVcp issue reported today: qtquickvcp/QtQuickVcp#151

I don't use haltalk, but I did borrow a sim from ArisRobo to test QtVcp / haltalk etc. and it all worked fine at time of testing https://github.com/machinekit/machinekit/tree/master/configs/sim/qqvsim

However I built my own QtVCP / Cetus / Machineface etc so didn't use your binaries, which is one difference.

machinekoder commented 7 years ago

@ArcEye Did you test it with an ARM Debian Wheezy installation? This seems to be the common denominator here.

ArcEye commented 7 years ago

No, I don't have anything ARM that uses Wheezy. That is certainly what @cdsteinkuehler had problems with.

My main ARM use is the DE0-NANO-Soc and that is Jessie based with gcc-4.9.2 and works without any problems.

Now that Robert Nelson's images for BBB are based upon Jessie too, there is no real need for anyone else to use Wheezy either. Wheezy is already only receiving security fixes and next year will be dropped completely by Debian.

einstine909 commented 7 years ago

@machinekoder Do you want me to image my BBB to Jessie to see if that fixes what I reported over on QtQuickVcp?

machinekoder commented 7 years ago

@einstine909 No, that does not fix the problem. However, if you have time and a spare SD card you can try it of course.

machinekoder commented 7 years ago

@ArcEye qqvsim does work here. The problem seems to happen only in some cases. Maybe a HAL component with a memory leak?

cdsteinkuehler commented 7 years ago

On 3/10/2017 11:25 AM, ArcEye wrote:

That is certainly what @cdsteinkuehler had problems with

IIRC, the problems I was having related to using the wrong compiler (4.6 instead of 4.7 or newer) and not having run "make clean", but I didn't keep notes.

If your compiler version is OK, do a make clean and re-run autogen.sh and configure and see if the issue persists.

-- Charles Steinkuehler charles@steinkuehler.net

machinekoder commented 7 years ago

@cdsteinkuehler I'm experiencing the issues with a package install.

I found this problem to happen when I increase the number of connected remote components in the configurations. At some point haltalk (debug with haltalk -d 1) seems to delay answers to MT_PING messages for too long, and the timeouts start to occur. I will try to reproduce something similar with qqvsim.