solettaproject / soletta

Soletta Project is a framework for making IoT devices. With Soletta Project's libraries developers can easily write software for devices that control actuators/sensors and communicate using standard technologies. It enables adding smartness even on the smallest edge devices.
http://solettaproject.org
Apache License 2.0
225 stars 108 forks source link

Intermittently failed to run fbp at the 1st trial on IIO sensors with "create" command in json file #2072

Closed fulong82 closed 8 years ago

fulong82 commented 8 years ago

Issue Description

Intermittently observed a failed running error at the 1st try but error gone while doing a 2nd try running with IIO Sensor and using "create" command to create device and load the sensor driver.

The failed warning/error may happen in a random location for example: case 1: WRN: ./src/lib/io/sol-iio.c:916 sol_iio_add_channel() Could not find channel [in_magn_x] for device0 case 2: WRN: ./src/lib/io/sol-iio.c:252 set_current_trigger() Could not write to device current_trigger file

Suspect this is a timing issue where the sol flow node is trying to access the iio sysfs interface while iio framework is still creating the sensor interfaces.

Sample json & fbp code

====sol-flow.json==== { "$schema": "http://solettaproject.github.io/soletta/schemas/config.schema", "nodetypes": [ { "name": "MyGyro", "type": "iio/gyroscope", "options": { "iio_device": "l3gd20 create,i2c,pci0000:00/0000:00:17.2/i2c_designware.6,0x6B,l3gd20 0", "buffer_size": 0 } }, { "name": "MyMag", "type": "iio/magnet", "options": { "iio_device": "ak09911 create,i2c,pci0000:00/0000:00:17.2/i2c_designware.6,0x0C,ak09911 0", "buffer_size": -1 } } ] }

====testGyro.fbp====

!/usr/bin/env sol-fbp-runner

gyro(MyGyro) timer(timer:interval=1000)

timer OUT -> TICK gyro OUT -> IN _(console)

====testMagneto.fbp====

!/usr/bin/env sol-fbp-runner

magneto(MyMag) timer(timer:interval=1000)

timer OUT -> TICK magneto OUT -> IN _(console)

Steps to reproduce

Take the log 1 below as an reproduce steps example:

  1. Cold boot the system with no iio sensor device & driver loaded
  2. Run the fbp with the "create" command in json config file to create sensor device and load the sensor driver (take gyro sensor as an example)
  3. Run another fbp with the "create" command in json config file to create sensor device and load the sensor driver (take magneto sensor as an example)
  4. Error observed: Failed to run flow
  5. Run the same fbp again (take magneto sensor as an example) and no failing observed

    Logs

    • Soletta version (top commit hash): tag v1_beta19
    • Soletta log 1: Log 1
    • Soletta log 2 (SOL_LOG_LEVEL = 5): Log 2
    • Soletta log 3 (SOL_LOG_LEVEL = 5): Log 3
fulong82 commented 8 years ago

Hi @lblim, @yongli3 & @barbieri,

This is another issue i faced during my testing of soletta on our APL-i Yocto Linux platform.

Thanks & Regards, Wilson

barbieri commented 8 years ago

For some problem in sol_iio_address_device(), the id is being always "0" and thus it finds the incorrect device. It is a bug on our side, but the provided sol-flow.json is also incorrect since all the iio_device ends with "0", that would be the IIO device to use. Maybe if you use "1" in there for the second module you load, it would "work".

As I said, it's a bug in our code, likely it wasn't tested with 2 IIO at the same time and resorting to "0" was always hitting the correct path.

Since the module should be already loaded, the problem seems to be in sol_iio_address_device() -> resolve_device_address() -> resolve_name_path() -> resolve_name_path_cb(). It should be reading /sys/bus/iio/devices/iio:device${ID}/name and if that matches the expected driver (ie: l3gd20), then it should extract ${ID} from the path and stop iterating the directory, then returning that number. The failed log says it's failing:

DBG: ./src/lib/io/sol-iio.c:1371 sol_iio_address_device() IIO device creation/resolving dispatching command: l3gd20
DBG: ./src/lib/io/sol-iio.c:140 craft_filename_path() available=4096, used=37, path='/sys/bus/iio/devices/iio:device0/name'
DBG: ./src/lib/io/sol-iio.c:1371 sol_iio_address_device() IIO device creation/resolving dispatching command: create,i2c,pci0000:00/0000:00:17.2/i2c_designware.6,0x6B,l3gd20
DBG: ./src/lib/io/sol-iio.c:1371 sol_iio_address_device() IIO device creation/resolving dispatching command: 0

That is, it tried to search for /sys/bus/iio/devices/iio:device0/name, it wasn't l3gd20, then it tried to create I2C pci0000:00/0000:00:17.2/i2c_designware.6,0x6B,l3gd20 and it failed (because was already loaded), then it uses the IIO device ID given in the sol-flow.json as last parameter: "0".

It may be worth to investigate https://github.com/solettaproject/soletta/blob/master/src/lib/io/sol-iio.c#L1155 to see what is happening there. I don't have the hardware with me, but those that have can check it out with a simple gdb breakbpoint and see what is happening with that loop.

@yongli3 can you check this while @edersondisouza is out?

lblim commented 8 years ago

@bdilly May I know when will be the target release date of v1? Please keep me posted as we will need to update our BSP to pull in v1.

lblim commented 8 years ago

@barbieri @bdilly whether this will be v1 or v2?

cabelitos commented 8 years ago

2229

cabelitos commented 8 years ago

fixed and merged

fulong82 commented 8 years ago

Hi @barbieri, @edersondisouza, @cabelitos and @bdilly

I have tried to build with latest commit (3cb8f47703975d1fafb89a57f5184fdf7bf5842b) and tried with the reproduce steps above and i still see the same issue:

root@intel-corei7-64-cavs-hda:/opt/soletta-dev-app/repos/singlesession/testfbp# ./runfbp.sh testWebGyro.fbp sol-flow2.json
[  344.200001] i2c i2c-6: new_device: Instantiated device l3gd20 at 0x6b
[  344.338517] iio iio:device0: registered gyroscope l3gd20
WRN: ./src/lib/io/sol-iio.c:559 iio_set_channel_scale() Could not set scale to 0.000000 on channel [in_anglvel_x] of device0
WRN: ./src/lib/io/sol-iio.c:559 iio_set_channel_scale() Could not set scale to 0.000000 on channel [in_anglvel_y] of device0
WRN: ./src/lib/io/sol-iio.c:559 iio_set_channel_scale() Could not set scale to 0.000000 on channel [in_anglvel_z] of device0
#anon:6:34 (0.000000, 0.000000, -0.000000) (direction-vector)
#anon:6:34 (0.000000, 0.000000, -0.000000) (direction-vector)
#anon:6:34 (0.000000, 0.000000, -0.000000) (direction-vector)
#anon:6:34 (0.000000, 0.000000, -0.000000) (direction-vector)
#anon:6:34 (0.000000, 0.000000, -0.000000) (direction-vector)
#anon:6:34 (0.000000, 0.000000, -0.000000) (direction-vector)
#anon:6:34 (0.000000, 0.000000, -0.000000) (direction-vector)
#anon:6:34 (0.000000, 0.000000, -0.000000) (direction-vector)
#anon:6:34 (0.000000, 0.000000, -0.000000) (direction-vector)
#anon:6:34 (0.000000, 0.000000, -0.000000) (direction-vector)
#anon:6:34 (0.000000, 0.000000, -0.000000) (direction-vector)
#anon:6:34 (0.000000, 0.000000, -0.000000) (direction-vector)
#anon:6:34 (0.000000, 0.000000, -0.000000) (direction-vector)
#anon:6:34 (0.000000, 0.000000, -0.000000) (direction-vector)
#anon:6:34 (0.000000, 0.000000, -0.000000) (direction-vector)

root@intel-corei7-64-cavs-hda:/opt/soletta-dev-app/repos/singlesession/testfbp# ./runfbp.sh testWebMagneto.fbp sol-flow2.json
[  381.020034] i2c i2c-6: new_device: Instantiated device ak09911 at 0x0c
WRN: ./src/lib/io/sol-iio.c:930 sol_iio_add_channel() Could not find channel [in_magn_x] for device0
WRN:sol-flow-iio ./src/modules/flow/iio/nodes.c:308 magnet_create_channels() mdata->channel_x == NULL
WRN:sol-flow-iio ./src/modules/flow/iio/nodes.c:322 magnet_create_channels() Could not create iio/magnet node. Failed to open IIO device 0
WRN:sol-flow ./src/lib/flow/sol-flow.c:128 sol_flow_node_init() failed to create node of type=0x7fe40e336c40: Invalid argument
WRN:sol-flow ./src/lib/flow/sol-flow-static.c:463 flow_node_open() failed to init node #0, type=0x7fe40e336c40, opts=0x562bcfcd1450: Invalid argument
WRN:sol-flow ./src/lib/flow/sol-flow.c:128 sol_flow_node_init() failed to create node of type=0x562bcfcd1750: Invalid argument
Error: Failed to run flow

root@intel-corei7-64-cavs-hda:/opt/soletta-dev-app/repos/singlesession/testfbp# ./runfbp.sh testWebMagneto.fbp sol-flow2.json
WRN: ./src/lib/io/sol-iio.c:559 iio_set_channel_scale() Could not set scale to 0.000000 on channel [in_magn_x] of device1
WRN: ./src/lib/io/sol-iio.c:559 iio_set_channel_scale() Could not set scale to 0.000000 on channel [in_magn_y] of device1
WRN: ./src/lib/io/sol-iio.c:559 iio_set_channel_scale() Could not set scale to 0.000000 on channel [in_magn_z] of device1
#anon:6:37 (-0.000000, -0.000000, 0.000000) (direction-vector)
#anon:6:37 (-0.000000, -0.000000, 0.000000) (direction-vector)
#anon:6:37 (-0.000000, -0.000000, 0.000000) (direction-vector)
#anon:6:37 (-0.000000, -0.000000, 0.000000) (direction-vector)
#anon:6:37 (-0.000000, -0.000000, 0.000000) (direction-vector)
#anon:6:37 (-0.000000, -0.000000, 0.000000) (direction-vector)
#anon:6:37 (-0.000000, -0.000000, 0.000000) (direction-vector)
^C
root@intel-corei7-64-cavs-hda:/opt/soletta-dev-app/repos/singlesession/testfbp#

Regards, Wilson

edersondisouza commented 8 years ago

Hi @fulong82,

Could you please share a tree or ls -R output from directory /sys/bus/iio/devices after first try to run testWebMagneto.fbp? Also, are you sure that on a newly booted device, testWebMagneto.fbp runs smoothly? Issue that @cabelitos found out would appear independently of running any other fbp.

cabelitos commented 8 years ago

Hi @fulong82 Could you also provide more detailed logs, please? With SOL_LOG_LEVEL="DBG"

fulong82 commented 8 years ago

Hi @edersondisouza & @cabelitos,

Sorry, i do not have my platform with me today and cannot recapture the new log according to your request. May do it tomorrow. But you could refer back to my 1st post Log files (let take Log3 which has SOL_LOG_LEVEL=5 for analyzing).

My observation to the issue happened in this manner: 1.) The json iio node type must fill the iio_device field in this manner: (1) sensor name, (2) create (3) iio device id for example:

"iio_device": "l3gd20 create,i2c,pci0000:00/0000:00:17.2/i2c_designware.6,0x6B,l3gd20 0",
"iio_device": "ak09911 create,i2c,pci0000:00/0000:00:17.2/i2c_designware.6,0x0C,ak09911 0",

2.) Run on clean system without any iio device being created 3.) Get the 1st iio device being created through the 1st run on 1st fbp file (for example gyro fbp) 4.) Get a 2nd iio device (this must be a different sensor type, same type will not get the issue) being created by running 2nd fbp file (for example magneto fbp) 5.) Then you will see the issue. 6.) Run the 2nd fbp again and the issue gone.

By analyzing the Log3, i believe the root cause of the issue is some code has run the "create" command just before the sol_iio_address_device() do and race condition happened. You could observer it through Log3 file Line 29 - Line 36 and Line 108 - Line 117 where they need to go through 3 resolving dispatching command and when the 2nd run, one resolving dispatching command settle: Line 164.

edersondisouza commented 8 years ago

Hi @fulong82,

Yes, we guess that log won't be different, but it's always good to have an updated one. But the tree or ls -R should provide new insights. We also would like to ensure that magnet device has nothing special on itself, that's why we would like to know if it works properly if it is first device created. Regarding point 4), that's why using '0' as last instruction is bad: it will try to use a device that certainly is not of the expected type. Of course, main bug here is Soletta failing to realise that device was created properly (second instruction).

cabelitos commented 8 years ago

Hi @fulong82,

I created a test branch so that gives us more information about what's happening. Could you, please, run your examples with this debug branch and provide us the output (with SOL_LOG_LEVEL="DBG") ?

If possible it would be nice to run the sol-fbp-runner with strace as well, however I don't know if strace is available at your environment. If it's available, please run the examples with it. Remember to run the example using the following commands:

strace -ff -oPathToStraceOutputFile sol-fbp-runner ArgsToSol-Fbp-Runner

The debug branch: https://github.com/cabelitos/soletta/tree/iio_debug

fulong82 commented 8 years ago

Hi @edersondisouza,

Here I captured the log that you requested. 1.) Perform ls -R before running the fbp: Log 2.) If I run the Magneto fbp 1st, the issue is intermittent. The 1st post Log 1 Line 75 has showed the failed case. Here i captured a succeeded case Log.

Regards, Wilson

fulong82 commented 8 years ago

Hi @cabelitos,

Here is the log files you requested: 1.) Log with SOL_LOG_LEVEL="DBG": Log 2.) Strace output (only captured on the failed run and 2 files generated): strace_log

Regards, Wilson

cabelitos commented 8 years ago

Hey @fulong82,

I merged a new fix for your problem. It's already in our master branch. Can you test it again please?

This commit 1b3a9c0 introduces the fix.

Thank you

fulong82 commented 8 years ago

Hi @cabelitos,

I have tested and it is working now. Thx.

Regards, Wilson