zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
9.91k stars 6.1k forks source link

New timeout API #17155

Closed andyross closed 4 years ago

andyross commented 4 years ago

This is a proof of concept (right now it builds and runs tests/kernel/common on native_posix only) of a new timeout API that allows us to get beyond the millisecond world. As of #16782, it's possible to push the kernel timing resolution essentially arbitrarily close to the hardware capabilities. But we still have two problems:

  1. The only API that lets an app actually achieve that precision is k_usleep(). All the other timeout parameters in the kernel (e.g. the delay for k_thread_create(), timeouts for k_sem_get() and other IPC primitives, etc...) are still specified in milliseconds.

  2. With a 32 bit timeout, we can't go much farther than 32 kHz anyway. At that frequency, the clock will roll over in ~2.5 days, which is dangerously close to values that real world applications are likely to see.

So what this series does is change all those timeout parameters into an opaque "k_timeout_t" type. This is still configurable to be a 32 bit millisecond count for compatibility (obviously at the expense of new features) so we don't break any promises for a version bump. The new API allows setting that timeout in any of "ticks", "cycles", "milliseconds" or "microseconds" ("nanoseconds" would be easily doable in the same framework, but frankly we don't have hardware that can exercise that yet) as the user desires, with appropriate rounding happening internally so the user doesn't need to muck with it.

Along the way, it introduces a whole new time unit conversion API, with 48 (!!) optimized conversion functions generated automatically for all (useful) combinations of unit, rounding and precision.

Then finally it shows off the new framework by adding a whole new timeout class: "uptime" timeouts, where instead of telling the kernel how long you want to wait, you tell it the moment you want to wake up (in your choice of units). The kernel handles the delta computation for you atomically. And it works in like three lines of code.

Obviously this isn't done yet. In addition to the need for documentation of the new APIs, fleshing out of missing bits and the inevitable parade of undiscovered new bugs, as implemented this is currently incompatible with userspace. The new k_timeout_t argument may be a 64 bit value, and our existing syscall implementation only handles arguments which fit in machine registers. This isn't hard to fix the tedious way (by splitting off a separate set of system calls to back each one of the timeout-taking functions), but ideally we should be able to make this work in the syscall generator. Either way, it's a decent chunk of work and none of it has been done.

zephyrbot commented 4 years ago

Found the following issues, please fix and resubmit:

License issues

In most cases you do not need to do anything here, especially if the files reported below are going into ext/ and if license was approved for inclusion into ext/ already. Fix any missing license/copyright issues. The license exception if a JFYI for the maintainers and can be overriden when merging the pull request.

Documentation issues

/var/lib/shippable/build/IN/main_repo/zephyr/include/sys_clock.h:67: warning: documentation for unknown define K_TIMEOUT_UPTIME_TICKS found.

/var/lib/shippable/build/IN/main_repo/zephyr/include/sys_clock.h:76: warning: documentation for unknown define K_TIMEOUT_UPTIME_MS found.

/var/lib/shippable/build/IN/main_repo/zephyr/include/sys_clock.h:85: warning: documentation for unknown define K_TIMEOUT_UPTIME_US found.

/var/lib/shippable/build/IN/main_repo/zephyr/include/sys_clock.h:94: warning: documentation for unknown define K_TIMEOUT_UPTIME_CYC found.

/var/lib/shippable/build/IN/main_repo/zephyr/include/kernel.h:2965: warning: argument 'delayed_work' of command @param is not found in the argument list of k_delayed_work_end_ticks(struct k_delayed_work *dw)
/var/lib/shippable/build/IN/main_repo/zephyr/include/kernel.h:2975: warning: The following parameters of k_delayed_work_end_ticks(struct k_delayed_work *dw) are not documented:
  parameter 'dw'
/var/lib/shippable/build/IN/main_repo/zephyr/include/kernel.h:2984: warning: argument 'delayed_work' of command @param is not found in the argument list of k_delayed_work_remaining_ticks(struct k_delayed_work *dw)
/var/lib/shippable/build/IN/main_repo/zephyr/include/kernel.h:2995: warning: The following parameters of k_delayed_work_remaining_ticks(struct k_delayed_work *dw) are not documented:
  parameter 'dw'
/var/lib/shippable/build/IN/main_repo/zephyr/include/kernel.h:2965: warning: argument 'delayed_work' of command @param is not found in the argument list of k_delayed_work_end_ticks(struct k_delayed_work *dw)
/var/lib/shippable/build/IN/main_repo/zephyr/include/kernel.h:2975: warning: The following parameters of k_delayed_work_end_ticks(struct k_delayed_work *dw) are not documented:
  parameter 'dw'
/var/lib/shippable/build/IN/main_repo/zephyr/include/kernel.h:2984: warning: argument 'delayed_work' of command @param is not found in the argument list of k_delayed_work_remaining_ticks(struct k_delayed_work *dw)
/var/lib/shippable/build/IN/main_repo/zephyr/include/kernel.h:2995: warning: The following parameters of k_delayed_work_remaining_ticks(struct k_delayed_work *dw) are not documented:
  parameter 'dw'
/var/lib/shippable/build/IN/main_repo/zephyr/include/kernel.h:2965: warning: argument 'delayed_work' of command @param is not found in the argument list of k_delayed_work_end_ticks(struct k_delayed_work *dw)
/var/lib/shippable/build/IN/main_repo/zephyr/include/kernel.h:2975: warning: The following parameters of k_delayed_work_end_ticks(struct k_delayed_work *dw) are not documented:
  parameter 'dw'
/var/lib/shippable/build/IN/main_repo/zephyr/include/kernel.h:2984: warning: argument 'delayed_work' of command @param is not found in the argument list of k_delayed_work_remaining_ticks(struct k_delayed_work *dw)
/var/lib/shippable/build/IN/main_repo/zephyr/include/kernel.h:2995: warning: The following parameters of k_delayed_work_remaining_ticks(struct k_delayed_work *dw) are not documented:
  parameter 'dw'
/var/lib/shippable/build/IN/main_repo/zephyr/include/kernel.h:2965: warning: argument 'delayed_work' of command @param is not found in the argument list of k_delayed_work_end_ticks(struct k_delayed_work *dw)
/var/lib/shippable/build/IN/main_repo/zephyr/include/kernel.h:2975: warning: The following parameters of k_delayed_work_end_ticks(struct k_delayed_work *dw) are not documented:
  parameter 'dw'
/var/lib/shippable/build/IN/main_repo/zephyr/include/kernel.h:2984: warning: argument 'delayed_work' of command @param is not found in the argument list of k_delayed_work_remaining_ticks(struct k_delayed_work *dw)
/var/lib/shippable/build/IN/main_repo/zephyr/include/kernel.h:2995: warning: The following parameters of k_delayed_work_remaining_ticks(struct k_delayed_work *dw) are not documented:
  parameter 'dw'
andyross commented 4 years ago

Just to note: it's not as big as it looks. About 700 lines are the auto-generated conversion utilities in time_units.h. And all the API churn is pretty much just boilerplate. Look at sys_clock.h and the first 10% of time_units.h for the meat of the PR.

Also note that almost nothing was changed (just the change in type of k_timeout_t and the few lines added for absolute timeouts) in the guts of the timeout implementation. This is all happening at the API level.

andyross commented 4 years ago

Also also note the perl script I snuck into the comments. :)

pabigot commented 4 years ago

Thanks for putting this together. In short I think the technical approach of k_timeout_t has great potential, but the details are premature.

I had hoped we would have an opportunity to discuss the goals and expectations before jumping straight to an implementation that codifies specific expectations, even though it is clearly marked as a prototype. I've opened #17162 for this, containing material that I drafted before looking at this PR, but refined based on what you've done here. Some concerns that should be mooted there are:

Some of this might not be achievable, and if it is some might be an independent change, but if we're going to touch the delay/timeout API we should reach consensus on what we'd like to see it support so we don't have to do it again for a while.

carlescufi commented 4 years ago

Additional requirements as per kernel meeting:

pabigot commented 4 years ago

As one of the folks most interested in absolute timeouts: I don't consider that feature to be the highest priority. As long as I can read the full-resolution tick counter, and express delays in ticks, that'd meet my desires for 2.0. If absolute timeouts adds new API, I'd rather wait until we have a chance to think carefully about it in the context of other requirements in #17162.

nordic-krch commented 4 years ago

I've tried to compile bluetooth/central_hr sample on nrf52840 with 64 bit and 32 bit. Had to do tweaks in multiple places to compile it but finally got following results:

32 bits 64bit diff
rom 124591 125231 650
ram 19448 19680 232

looks like it isn't that much.

nordic-krch commented 4 years ago

for absolute timeouts API i was thinking about adding:

I would stick to calling it absolute time rather than uptime to not close ourselves for 32 bit k_ticks_t.

What about deprecation of k_uptime_get_32 and k_uptime_get?

andyross commented 4 years ago

OK, here's the current version. All new features are present and apparently working, all tests in tests/kernel pass on all CI architectures in both new and "legacy" mode. This should be good for others to start validation and review.

But it's not done yet:

nordic-krch commented 4 years ago

@andyross didn't found time to evaluate it today. Will attack it tomorrow.

andyross commented 4 years ago

No problem. And I keep finding stuff that got missed. This morning I realized that I pushed the last batch of changes without having gone back to test 64 bit mode, which of course had been broken by a mistyped declaration (k_delayed_work_remaining_ticks was u32_t in one place where it should have been k_ticks_t), which reminded me that I'd pushed this without moving some of the tests to be 64bit and it had no coverage. :)

Will be a few days before this stabilizes I'm sure. But I'm reasonably hopeful it'll work for you without too much surgery.

andyross commented 4 years ago

Pushed a current version. All tests now pass on qemu_{x86_nommu,xtensa,riscv32,cortex_m3,nios2} and native_posix{,64}. All that's left is to fix the userspace stuff (which I'm thinking I'm going to put in a separate PR tomorrow to simplify review, as it doesn't involve anything timeout-specific) and add the tests.

This will be ready for merge very soon, please review and let me know what I missed, in particular with APIs and function naming, etc... (bugs we can deal with during stablization, but we have to be happy enough with the API before it merges -- no late breaking bikesheds please!).

andyross commented 4 years ago

Also I was expecting API to read the system (uptime) clock at full resolution without converting to milliseconds (i.e. in either cycles or ticks). I can't find it.

Good grief, yeah, it got dropped. It's folded into a test commit blob in my tree. Will get this fixed ASAP (the implementation is just a tiny wrapper).

andyross commented 4 years ago

Ugh. To probably no one's surprise, my getting this ready involved a detour into a rework of the system call handler and dispatch code generation first.

But regardless, I think this is done now. All features present. Everything rebased. Everything passes. Sorry for the delays.

It does include the needed syscall changes from #18064 at the base of this series, though. Don't review those here, do it in the original PR. But we can merge from either.

ioannisg commented 4 years ago

Moving to next release since the review was not finalized for v.2.0.0

andyross commented 4 years ago

Rebase over current syscall split implementation so this can go in as soon as we lift the gate.

nashif commented 4 years ago

@andyross please rebase

carlescufi commented 4 years ago

FYI @andyross @pizi-nordic @pabigot @nashif Marked the PR for the dev-review meeting FYI

zephyrbot commented 4 years ago

All checks are passing now.

checkpatch (informational only, not a failure)

-:1308: WARNING:LONG_LINE: line over 80 characters
#1308: FILE: include/kernel.h:2700:
+__syscall int k_stack_pop(struct k_stack *stack, stack_data_t *data, k_timeout_t timeout);

-:1477: WARNING:LONG_LINE: line over 80 characters
#1477: FILE: include/kernel.h:3866:
+                struct k_mem_block *block, k_timeout_t timeout);

-:1828: WARNING:NEW_TYPEDEFS: do not add new typedefs
#1828: FILE: include/sys_clock.h:99:
+typedef k_ticks_t k_timeout_t;

-:1840: WARNING:NEW_TYPEDEFS: do not add new typedefs
#1840: FILE: include/sys_clock.h:109:
+typedef struct { k_ticks_t ticks; } k_timeout_t;

-:1917: WARNING:LONG_LINE: line over 80 characters
#1917: FILE: include/sys_clock.h:159:
+#define Z_TIMEOUT_ABSOLUTE_MS(t) Z_TIMEOUT_ABSOLUTE_TICKS(k_ms_to_ticks_ceil64(t))

-:1932: WARNING:LONG_LINE: line over 80 characters
#1932: FILE: include/sys_clock.h:169:
+#define Z_TIMEOUT_ABSOLUTE_US(t) Z_TIMEOUT_ABSOLUTE_TICKS(k_us_to_ticks_ceil64(t))

-:1951: WARNING:LONG_LINE: line over 80 characters
#1951: FILE: include/sys_clock.h:179:
+#define Z_TIMEOUT_ABSOLUTE_CYC(t) Z_TIMEOUT_ABSOLUTE_TICKS(k_cyc_to_ticks_ceil64(t))

-:2072: WARNING:LINE_SPACING: Missing a blank line after declarations
#2072: FILE: include/time_units.h:69:
+   u64_t off = 0;
+   if (!mul_ratio) {

-:2163: WARNING:LONG_LINE_COMMENT: line over 80 characters
#2163: FILE: include/time_units.h:160:
+ *                 print "/", "* Generated.  Do not edit.  See above. *", "/\n\t";

-:3037: WARNING:LONG_LINE: line over 80 characters
#3037: FILE: kernel/mailbox.c:306:
+int k_mbox_put(struct k_mbox *mbox, struct k_mbox_msg *tx_msg, k_timeout_t timeout)

-:3239: WARNING:LONG_LINE: line over 80 characters
#3239: FILE: kernel/mutex.c:198:
+static inline int z_vrfy_k_mutex_lock(struct k_mutex *mutex, k_timeout_t timeout)

-:3313: WARNING:LONG_LINE: line over 80 characters
#3313: FILE: kernel/pipes.c:730:
+            size_t *bytes_written, size_t min_xfer, k_timeout_t timeout)

-:3322: WARNING:LONG_LINE: line over 80 characters
#3322: FILE: kernel/pipes.c:742:
+            size_t *bytes_written, size_t min_xfer, k_timeout_t timeout)

-:3893: WARNING:LONG_LINE: line over 80 characters
#3893: FILE: kernel/timeout.c:204:
+   k_ticks_t rem = z_timeout_end(timeout) - ((k_ticks_t)curr_tick + elapsed());

-:12577: WARNING:LONG_LINE: line over 80 characters
#12577: FILE: tests/kernel/workq/work_queue_api/src/main.c:79:
+   k_delayed_work_submit_to_queue(work_q, &new_work, K_TIMEOUT_MS(TIMEOUT));

-:12608: WARNING:LONG_LINE: line over 80 characters
#12608: FILE: tests/kernel/workq/work_queue_api/src/main.c:128:
+                                   &delayed_work[i], K_TIMEOUT_MS(TIMEOUT)) == 0, NULL);

- total: 0 errors, 16 warnings, 10206 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

Your patch has style problems, please review.

NOTE: Ignored message types: AVOID_EXTERNS BRACES CONFIG_EXPERIMENTAL CONST_STRUCT DATE_TIME FILE_PATH_CHANGES MINMAX NETWORKING_BLOCK_COMMENT_STYLE PRINTK_WITHOUT_KERN_LEVEL SPLIT_STRING VOLATILE

NOTE: If any of the errors are false positives, please report
      them to the maintainers.

Tip: The bot edits this comment instead of posting a new one, so you can check the comment's history to see earlier messages.

andyross commented 4 years ago

OK, think I have most things addressed now, and everything passing. The absolute timeout issues caught in comments are fixed (as is another issue with locking) and I added the test it should have had originally. Will reply to comments independently.

andyross commented 4 years ago

The API assumes it is possible to convert between arbitrary cycle and tick values. This is at best true only for tickless kernels.

No, this is true always. We promise there is a cycle counter returned from k_cycle_get_32() and that its period is regular and inspectable via sys_clock_hw_cycles_per_sec() (which is almost always just CONFIG_SYS_CLOCK_HW_CYCLES_PER_SEC, hpet is the weird driver because it's technically probed from hardware).

Code is allowed and encouraged to do things like spin on the cycle count to achieve high precision timing. Not allowing the timer subsystem to work in the same space seems like an unfortunate misfeature.

pabigot commented 4 years ago

The API assumes it is possible to convert between arbitrary cycle and tick values. This is at best true only for tickless kernels.

No, this is true always. We promise there is a cycle counter returned from k_cycle_get_32() and that its period is regular and inspectable via sys_clock_hw_cycles_per_sec()

The cycle counter is regular, but my recollection was that the tick counter (in non-tickless mode) would advance by 1 regardless of how much the underlying cycle counter had advanced, and that as a consequence in some situations this makes the tick counter have a varying rate relative to the cycle counter.

Another concern is that k_cycle_get_32() can wrap fast enough that naive conversion between its values and ticks will use values from different cycle "epochs" which many users will not understand or notice.

Code is allowed and encouraged to do things like spin on the cycle count to achieve high precision timing.

I'm not sure what you refer to here. Certainly using the cycle clock to measure durations is common; but where outside of test code it is used to implement a delay?

Not allowing the timer subsystem to work in the same space seems like an unfortunate misfeature.

Cycles is not a standardized time system like seconds, milliseconds, and microseconds are. I don't see where K_TIMEOUT_CYC adds value that compensates for the risk of its being mis-used.

pabigot commented 4 years ago

I've marked resolved in https://github.com/zephyrproject-rtos/zephyr/pull/17155#pullrequestreview-284322496 the comments that have been addressed to my satisfaction that don't involve another reviewer. Some others were missed; some we have a disagreement in goals that should have been discussed before implementation.

A couple that should be non-controversial:

galak commented 4 years ago

dev-review:

pabigot commented 4 years ago

https://github.com/pabigot/zephyr/tree/analyze/17155 has this PR plus a commit that replaces hello_world with an application that initializes a set of timers at 1 ms intervals, lets them all run for nominally 3 s, then stops them and measures the evenness of load, elapsed time, and callback rate.

The initial conclusions using nrf52840_pca10056 are that this does not significantly affect code size, but memory usage will increase. I am most concerned about hints of instability.

Below are some data on different configurations.

At 50 on upstream/master
   text    data     bss     dec     hex filename
  12052     672    6357   19081    4a89 build/zephyr/zephyr.elf
***** Booting Zephyr OS build zephyr-v2.0.0-543-g296f92276fd1 *****
Running with 50 timers, 32768 Hz
148851 incrs in [2975, 2979] over 98384 cycles = 3002441 us
49576 incrs/s

At 50 on pr/17155
# CONFIG_SYS_TIMEOUT_64BIT is not set
# CONFIG_SYS_TIMEOUT_LEGACY_API is not set
   text    data     bss     dec     hex filename
  11868     672    6358   18898    49d2 build/zephyr/zephyr.elf
***** Booting Zephyr OS build zephyr-v2.0.0-525-gb680b1e66773 *****
Running with 50 timers, 32768 Hz
146260 incrs in [2917, 2933] over 98364 cycles = 3001831 us
48723 incrs/s

At 50 on pr/17155
# CONFIG_SYS_TIMEOUT_64BIT is not set
CONFIG_SYS_TIMEOUT_LEGACY_API=y
   text    data     bss     dec     hex filename
  11876     672    6358   18906    49da build/zephyr/zephyr.elf
***** Booting Zephyr OS build zephyr-v2.0.0-525-gb680b1e66773 *****
Running with 50 timers, 32768 Hz
145739 incrs in [2897, 2927] over 98517 cycles = 3006500 us
48474 incrs/s

At 50 on pr/17155
CONFIG_SYS_TIMEOUT_64BIT=y
   text    data     bss     dec     hex filename
  12100     672    6982   19754    4d2a build/zephyr/zephyr.elf
***** Booting Zephyr OS build zephyr-v2.0.0-525-gb680b1e66773 *****
Running with 50 timers, 32768 Hz
336748 incrs in [5607, 6865] over 257262 cycles = 7851013 us
42892 incrs/s
andyross commented 4 years ago

New version hides the absolute timeouts behind a Z_ API. Cleaned up the new doc warnings that appeared. Fixed the stale symbol in the perl script.

@pabigot: Can you file that hang issue as a real bug if it's being seen on master? And for reference: the new code is hitting your hang earlier when in 64 bit mode? Or in both 64 and 32 bit? With 32 bit and with LEGACY enabled, we really should be seeing identical behavior.

pabigot commented 4 years ago

I've updated the test application in https://github.com/pabigot/zephyr/commits/analyze/17155 to allow finer control over a few features that affect timeout behavior. Most of the anomalies seen in the previous test were due to extreme loads with multiple timers and interleaved interrupts.

That situation also reveals some unnecessary nondeterminism with respect to timer behavior which I'd like to see addressed, but it's not new, so there's no specific issue I intend to open for presence of that behavior on master.

The change requests listed in https://github.com/zephyrproject-rtos/zephyr/pull/17155#issuecomment-533083434 have yet to be addressed.

This, rebased on current master and with #19314 cherry-picked on top, passes sanitycheck for me.

@andyross However, to pass shippable it also needs fixes for at least these problems:

git grep -E '== K_FOREVER\b'
git grep 'k_timer_start.*, 0'
andyross commented 4 years ago

Rebased again. Fixed the macro parenthisation error @pabigot pointed out (the other complaint was already addressed, it's the k_uptime_ticks() API).

andyross commented 4 years ago

Push once more, this time to include a quick fix to correct a k_sleep() argument that crossed in just-merged commit 2dfbf214100

andyross commented 4 years ago

To be clear: this passes for me with a default sanitycheck, modulo one unrelated pre-existing build failure (an unused variable warning) in tests/lib/gui/lvgl on native_posix.

pabigot commented 4 years ago

To be clear: this passes for me with a default sanitycheck, modulo one unrelated pre-existing build failure (an unused variable warning) in tests/lib/gui/lvgl on native_posix.

A full CI run locally passes except for failures that should be resolved by #19330.

However: the mass addition of CONFIG_SYS_TIMEOUT_LEGACY_API=y to a large number of samples and tests isn't going to work. It does avoid having to rework the tests; it also misses unconverted use in driver and subsystems.

For example, this branch fails to work on any application that uses ADC because there are integer parameters passed to k_timer_start() in the shared ADC implementation header drivers/adc/adc_context.h.

So I think we have to continue with patches like #19281 and #19314 to ensure that non-test code is compatible with the additions in this PR. I'll keep working that.

pabigot commented 4 years ago

@andyross Discussion in API meeting resolved to changing this PR so the default is the legacy API. Drop the overrides in all the samples and tests; modify the tests that do support the new API to opt into it for testing.

The preponderance of opinion (but not unanimous) is to switch k_sleep() (but not k_usleep() or k_busywait()) to take a timeout. That does not have to be done here as the legacy behavior works.

andyross commented 4 years ago

Apparently I didn't have an API meeting on my calendar. OK... can someone give me a list of exactly what needs to be fixed here to get it merged? Sounds like change the kconfig default for legacy mode?

Can I get a formal-ish description of what's desired for k_sleep()? Having it take a timeout seems somewhat undesirable to me, given the way these APIs work everywhere else in the world. Are we sure we don't want to add an API instead like k_timed_delay(), or just recommend the use of a k_timer?

I genuinely thought we had consensus on this last July, but I suppose none of that is outrageously difficult.

And @pabigot: can you please point me to some of the specific tests that are failing to build so I can address them here? We really shouldn't need to be mucking with arguments like that when the API is intended to be 100% compatible.

pabigot commented 4 years ago

The big one is change CONFIG_SYS_TIMEOUT_LEGACY_API to default to enabled.

I proposed k_delay() as an option; it got little to no support. The preferred solution is k_sleep(), taking either milliseconds (minority position) or a timeout so things like K_SECONDS(5) can still be used (majority position).

Per https://github.com/zephyrproject-rtos/zephyr/pull/17155#issuecomment-534539936 my latest results show that the tests do build and pass, but only because they're not testing the new API. There are failures that are in driver code, not in test code, as evidenced below:

tirzah[861]$ pwd
/mnt/nordic/zp/zephyr
tirzah[862]$ cd tests/drivers/adc/adc_api/
tirzah[863]$ git one | head -1
* c6441f469d (HEAD -> pr/17155) [Mon Sep 23 14:14:41 2019 -0700] ztest: Correct k_sleep() argument
tirzah[864]$ sed -i '/SYS_TIMEOUT_LEGACY/d' prj.conf 
tirzah[865]$ git diff | cat
diff --git a/tests/drivers/adc/adc_api/prj.conf b/tests/drivers/adc/adc_api/prj.conf
index beea391af3..c307b998b9 100644
--- a/tests/drivers/adc/adc_api/prj.conf
+++ b/tests/drivers/adc/adc_api/prj.conf
@@ -8,4 +8,3 @@ CONFIG_ADC_LOG_LEVEL_INF=y
 CONFIG_LOG_IMMEDIATE=y
 CONFIG_HEAP_MEM_POOL_SIZE=1024
 CONFIG_TEST_USERSPACE=y
-CONFIG_SYS_TIMEOUT_LEGACY_API=y
tirzah[866]$ west build -b nrf52840_pca10056
-- west build: build configuration:
       source directory: /mnt/nordic/zp/zephyr/tests/drivers/adc/adc_api
       build directory: /mnt/nordic/zp/zephyr/tests/drivers/adc/adc_api/build
       BOARD: nrf52840_pca10056 (origin: CMakeCache.txt)
-- west build: building application
[0/1] Re-running CMake...
Zephyr version: 2.0.99
-- Selected BOARD nrf52840_pca10056
-- Found west: /home/pab/.local/bin/west (found suitable version "0.6.2", minimum required is "0.6.0")
-- Loading /mnt/nordic/zp/zephyr/boards/arm/nrf52840_pca10056/nrf52840_pca10056.dts as base
-- Overlaying /mnt/nordic/zp/zephyr/dts/common/common.dts
Device tree configuration written to /mnt/nordic/zp/zephyr/tests/drivers/adc/adc_api/build/zephyr/include/generated/generated_dts_board.conf
Parsing Kconfig tree in /mnt/nordic/zp/zephyr/Kconfig
Loaded configuration '/mnt/nordic/zp/zephyr/boards/arm/nrf52840_pca10056/nrf52840_pca10056_defconfig'
Merged configuration '/mnt/nordic/zp/zephyr/tests/drivers/adc/adc_api/prj.conf'
No change to '/mnt/nordic/zp/zephyr/tests/drivers/adc/adc_api/build/zephyr/.config'
-- Cache files will be written to: /home/pab/.cache/zephyr
-- Configuring done
-- Generating done
-- Build files have been written to: /mnt/nordic/zp/zephyr/tests/drivers/adc/adc_api/build
[101/130] Building C object zephyr/drivers/adc/CMakeFiles/drivers__adc.dir/adc_nrfx_saadc.c.obj
FAILED: zephyr/drivers/adc/CMakeFiles/drivers__adc.dir/adc_nrfx_saadc.c.obj 
ccache /usr/local/zephyr-sdk/arm-zephyr-eabi/bin/arm-zephyr-eabi-gcc -DBUILD_VERSION=zephyr-v2.0.0-656-gc6441f469d5f -DKERNEL -DNRF52840_XXAA -D_FORTIFY_SOURCE=2 -D__ZEPHYR_SUPERVISOR__ -D__ZEPHYR__=1 -I../../../../../kernel/include -I../../../../../arch/arm/include -I../../../../../include -I../../../../../include/drivers -Izephyr/include/generated -I../../../../../soc/arm/nordic_nrf/nrf52 -I../../../../../soc/arm/nordic_nrf/include -I../../../../../ext/hal/cmsis/Include -I../../../../../subsys/testsuite/include -I../../../../../subsys/testsuite/ztest/include -I/mnt/nordic/zp/modules/hal/nordic/nrfx -I/mnt/nordic/zp/modules/hal/nordic/nrfx/drivers/include -I/mnt/nordic/zp/modules/hal/nordic/nrfx/hal -I/mnt/nordic/zp/modules/hal/nordic/nrfx/mdk -I/mnt/nordic/zp/modules/hal/nordic/. -isystem ../../../../../lib/libc/minimal/include -isystem /usr/local/zephyr-sdk-0.10.3/arm-zephyr-eabi/bin/../lib/gcc/arm-zephyr-eabi/8.3.0/include -isystem /usr/local/zephyr-sdk-0.10.3/arm-zephyr-eabi/bin/../lib/gcc/arm-zephyr-eabi/8.3.0/include-fixed -Os -imacros/mnt/nordic/zp/zephyr/tests/drivers/adc/adc_api/build/zephyr/include/generated/autoconf.h -ffreestanding -fno-common -g -mthumb -mcpu=cortex-m4 -imacros/mnt/nordic/zp/zephyr/include/toolchain/zephyr_stdint.h -Wall -Wformat -Wformat-security -Wno-format-zero-length -Wno-main -Wno-pointer-sign -Wpointer-arith -Wno-unused-but-set-variable -Werror=implicit-int -fno-asynchronous-unwind-tables -fno-pie -fno-pic -fno-strict-overflow -fno-reorder-functions -fno-defer-pop -fmacro-prefix-map=/mnt/nordic/zp/zephyr/tests/drivers/adc/adc_api=CMAKE_SOURCE_DIR -fmacro-prefix-map=/mnt/nordic/zp/zephyr=ZEPHYR_BASE -ffunction-sections -fdata-sections -mabi=aapcs -march=armv7e-m -std=c99 -nostdinc -MD -MT zephyr/drivers/adc/CMakeFiles/drivers__adc.dir/adc_nrfx_saadc.c.obj -MF zephyr/drivers/adc/CMakeFiles/drivers__adc.dir/adc_nrfx_saadc.c.obj.d -o zephyr/drivers/adc/CMakeFiles/drivers__adc.dir/adc_nrfx_saadc.c.obj   -c /mnt/nordic/zp/zephyr/drivers/adc/adc_nrfx_saadc.c
In file included from /mnt/nordic/zp/zephyr/drivers/adc/adc_nrfx_saadc.c:8:
/mnt/nordic/zp/zephyr/drivers/adc/adc_context.h: In function 'adc_context_enable_timer':
/mnt/nordic/zp/zephyr/drivers/adc/adc_context.h:98:29: error: incompatible type for argument 2 of 'k_timer_start'
  k_timer_start(&ctx->timer, 0, interval_ms);
                             ^
In file included from ../../../../../include/kernel.h:5133,
                 from ../../../../../include/device.h:11,
                 from ../../../../../include/drivers/adc.h:16,
                 from /mnt/nordic/zp/zephyr/drivers/adc/adc_context.h:11,
                 from /mnt/nordic/zp/zephyr/drivers/adc/adc_nrfx_saadc.c:8:
zephyr/include/generated/syscalls/kernel.h:351:70: note: expected 'k_timeout_t' {aka 'struct <anonymous>'} but argument is of type 'int'
 static inline void k_timer_start(struct k_timer * timer, k_timeout_t duration, k_timeout_t period)
                                                          ~~~~~~~~~~~~^~~~~~~~
In file included from /mnt/nordic/zp/zephyr/drivers/adc/adc_nrfx_saadc.c:8:
/mnt/nordic/zp/zephyr/drivers/adc/adc_context.h:98:32: error: incompatible type for argument 3 of 'k_timer_start'
  k_timer_start(&ctx->timer, 0, interval_ms);
                                ^~~~~~~~~~~
In file included from ../../../../../include/kernel.h:5133,
                 from ../../../../../include/device.h:11,
                 from ../../../../../include/drivers/adc.h:16,
                 from /mnt/nordic/zp/zephyr/drivers/adc/adc_context.h:11,
                 from /mnt/nordic/zp/zephyr/drivers/adc/adc_nrfx_saadc.c:8:
zephyr/include/generated/syscalls/kernel.h:351:92: note: expected 'k_timeout_t' {aka 'struct <anonymous>'} but argument is of type 'u32_t' {aka 'unsigned int'}
 static inline void k_timer_start(struct k_timer * timer, k_timeout_t duration, k_timeout_t period)
                                                                                ~~~~~~~~~~~~^~~~~~
ninja: build stopped: subcommand failed.
ERROR: command exited with status 1: /usr/bin/cmake --build /mnt/nordic/zp/zephyr/tests/drivers/adc/adc_api/build

By changing the default to use legacy mode incomplete conversions like this should not break existing applications, so we can take a risk of there being some left over at the point the merge is done: they can be fixed later as they're discovered.

jhedberg commented 4 years ago

Can I get a formal-ish description of what's desired for k_sleep()? Having it take a timeout seems somewhat undesirable to me, given the way these APIs work everywhere else in the world. Are we sure we don't want to add an API instead like k_timed_delay(), or just recommend the use of a k_timer

I proposed k_delay() as an option; it got little to no support. The preferred solution is k_sleep(), taking either milliseconds (minority position) or a timeout so things like K_SECONDS(5) can still be used (majority position).

If both (k_sleep and a new "delay" API) do essentially the same thing, I think it'd needlessly complicate the Zephyr public API - we should strive to have it as small and simple as possible. I think I understand the argument that "everyone else has a sleep API taking an integer" (@nashif made the same point in the API meeting), but if we're making all our other kernel APIs take a k_timeout_t then doing the same for k_sleep would IMO just make the public Zephyr API look more consistent. Additional benefits are that the code stays more readable (you immediately see the time units with the help of the macros) and any code that was using the macros will keep working as before. I do realise we have lots of places passing hardcoded integer values to k_sleep but IMO those should be fixed regardless of this PR (hopefully with some degree of automation - there are quite many of these occurrences).

nashif commented 4 years ago

@andyross Discussion in API meeting resolved to changing this PR so the default is the legacy API. Drop the overrides in all the samples and tests; modify the tests that do support the new API to opt into it for testing.

The preponderance of opinion (but not unanimous) is to switch k_sleep() (but not k_usleep() or k_busywait()) to take a timeout. That does not have to be done here as the legacy behavior works.

This PR is being brought up in every single meeting we have in the project the last few weeks and we have different opinions depending on who attends the meetings, none of those opinions is binding and not everyone can join all of those meetings.

To avoid confusion and the too many opinions in too many "channels", this PR should be used to express any opinions. Use the review feature to +1 or -1 any changes. Lets not do review by proxy please.

pabigot commented 4 years ago

Thumbs up this comment if you want k_sleep() to take an integral parameter measuring milliseconds.

pabigot commented 4 years ago

Thumbs up this comment if you want k_sleep() to take its timeout in the same form as timeouts to k_poll, k_sem_take, and other kernel APIs.

pabigot commented 4 years ago

To avoid confusion and the too many opinions in too many "channels", this PR should be used to express any opinions. ~Use the review feature to +1 or -1 any changes.~ Lets not do review by proxy please.

Please do not use the review feature for this. Reviews should be over the as-implemented PR, not over design decisions. The design decision should precede the implementation.

There are distinct comments here. Please add a thumbs-up notation to:

nashif commented 4 years ago

Please do not use the review feature for this. Reviews should be over the as-implemented PR, not over design decisions. The design decision should precede the implementation.

And where is this design decision being discussed? What is the RFC for that? Looking at your last 2 comments, you continue to discuss same design issues in this PR, so I am lost. So if you want some design change, please open an issue for that and mark it as an RFC and have the discussion there.

Vudentz commented 4 years ago

Please do not use the review feature for this. Reviews should be over the as-implemented PR, not over design decisions. The design decision should precede the implementation.

And where is this design decision being discussed? What is the RFC for that? Looking at your last 2 comments, you continue to discuss same design issues in this PR, so I am lost. So if you want some design change, please open an issue for that and mark it as an RFC and have the discussion there.

Perhaps an issue would be better to discuss the design choices of this PR, specially regarding k_sleep and k_usleep which seems to be the most controvertial ones.

pabigot commented 4 years ago

And where is this design decision being discussed? What is the RFC for that?

Heh. That would be: https://github.com/zephyrproject-rtos/zephyr/issues/17162

It was raised in TSC 2019-07-10, and again in TSC 2019-09-11.

If I'm asking for input on design decisions here, it's because all my efforts to get it elsewhere have been ineffective, and I want to get something settled so we can move forward.

nashif commented 4 years ago

Heh. That would be: #17162

good, then you should consider taking the adhoc poll you started here to the issue above, I do not think it belongs here.

carlescufi commented 4 years ago

Heh. That would be: #17162

good, then you should consider taking the adhoc poll you started here to the issue above, I do not think it belongs here.

Might be a bit late for that since 6 people have already voted here. While I agree it would've been better to do it on the issue, I think we can consider that most people agree that a timeout is the better option now. If we need more data we can always send an email or discuss it in a meeting. I am open to both.

andyross commented 4 years ago

Regarding the argument type of k_sleep(), this exact subject came up on at least one call, way back in July, and I genuinely thought there was consensus at the time. I'll make the change, it's not hard at all. But the argument goes like this:

Using the opaque type means that (absent the legacy API support) you can't just do "k_sleep(10);", the argument is a type that needs to be initialized specially, which is a hurdle for new developers to get over that they don't have to have on other OSes. And for those who do want the "sleep for a proper timeout" support, we already provide that in the k_timer API (at the cost of two function calls instead of one: k_timer_start() + k_timer_status_sync()). The only reason k_sleep() (and k_usleep()) exists at all, really, is that it's simpler. And this isn't simpler.

But I'll make the change.

andyross commented 4 years ago

Though I suppose I could just add a k_sleep_ms() API which is a trivial wrapper to preserve those semantics and even better clarify the units (which is also confusing, as in POSIX "sleep" takes seconds, not milliseconds). That's what I think I'm going to push, as it turns out there are almost 600 uses of k_sleep() in the tree that would otherwise need to have their arguments macro-ized.

pabigot commented 4 years ago

~@andyross I really don't think that's going to fly. The community has spoken, and wants it to take a timeout.~ never mind

jfischer-no commented 4 years ago

Though I suppose I could just add a k_sleep_ms() API which is a trivial wrapper to preserve those semantics and even better clarify the units (which is also confusing, as in POSIX "sleep" takes seconds, not milliseconds)

Please do it like this! :+1:

carlescufi commented 4 years ago

@andyross I really don't think that's going to fly. The community has spoken, and wants it to take a timeout.

Well, those who looked at this PR have commented for it, but as I mentioned, if people disagree strongly we might need to take it to another forum to confirm this is what people want.

If you don't want the hassle of doing the conversion, I'm willing to step up.

What's the issue of having a k_sleep_ms() in addition to k_sleep()? I think the proposal makes sense, it gives us ease of use for beginners and flexibility for advanced users.

jhedberg commented 4 years ago

What's the issue of having a k_sleep_ms() in addition to k_sleep() I think the proposal makes sense

I'm not opposed to having it, though the difference between k_sleep(K_MSEC(10)) and k_sleep_ms(10) is quite minor, IMO. That said, if an effort will be made to convert these existing places in the tree, I suppose it should be more or less the same effort to convert them to use k_sleep and the macro?