zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0

Hard real-time interrupt support #2543

Closed zephyrbot closed 7 years ago

zephyrbot commented 7 years ago

Reported by Carles Cufi:

Currently the ISR wrapper that envelops all ISR execution code, and that is automatically called when using the IRQ_CONNECT() macro, adds a certain amount of latency from the moment the hardware interrupt is triggered to the first instruction executed inside the actual ISR (i.e. the one whose address is passed as a parameter to IRQ_CONNECT()).

We would therefore benefit from a framework that allows interrupts that have very strict timing requirements due to external constraints (commonly referred to as hard real-time interrupts) to skip the wrapping and execute directly as soon as the hardware allows it.

A short wish list for the framework:

Please note that this is not a replacement for the Zero-Latency Interrupt (ZLI) framework already in place, but is in fact complementing it.

(Imported from Jira ZEP-1038)

zephyrbot commented 7 years ago

by Anas Nashif:

Andrew, can you please take a look at this?

zephyrbot commented 7 years ago

by Carles Cufi:

As part of this work we would like to investigate whether it's possible to reduce the amount of time that the kernel spends with interrupts disabled. I.e. is it possible to shorten the windows within irq_lock() / irq_unlock() ?

zephyrbot commented 7 years ago

by Carles Cufi:

We had a first draft of how things could look here: https://gerrit.zephyrproject.org/r/#/c/7746/

zephyrbot commented 7 years ago

by Johan Hedberg:

All nRF51-based boards are essentially broken now without this (there are at least 4 of them, if I counted right). Luckily the patch that broke them completely (i.e. not even the Bluetooth Controller-only build works) didn't make it to 1.6; however, the master branch is broken. It'd be nice if fixing this could be prioritized higher than late January (preferably in the next few days/weeks).

zephyrbot commented 7 years ago

by Andrew Boie:

Carles Cufi can you create another JIRA about irq_lock/unlock time? It's not clear what you are asking for and it's probably best to track it with another JIRA ticket.

Johan Hedberg what patch in Master branch broke everything? We should probably revert that meanwhile..

zephyrbot commented 7 years ago

by Carles Cufi:

Andrew Boie : I created GH-2897 to track the irq_lock()/unlock() time.

Andrew Boie : The commit that broke the nRF51 builds is this one: a100ada86639333dd0583ebf67cb9c1ff31d30e4. The reason it broke the builds is that it introduced a system clock driver (the nRF51 and nRF52 do not have a SysTick), and with it running the interrupt latencies exceed the maximum that the BLE controller can tolerate.

As discussed with Johan Hedberg and Vinayak Kariappa Chettimada , the issue with reverting that commit is that then the nRF5x ICs do not have a system clock driver, so applications relying on calls like k_sleep() do not work at all.

zephyrbot commented 7 years ago

by Andrew Boie:

OK thanks. Will try to have something to test in early January. It won't be before that unfortunately.

Question: do we need to support ISRs written in assembly, or is it sufficient to just support C ISRs declared with __attribute__((interrupt))? This has implications for the prologue/epilogue calls that must be in each ISR (perform EOI, PM hooks, etc).

zephyrbot commented 7 years ago

by Andrew Boie:

Johan Hedberg looping Benjamin Walsh are you sure that the issue you are seeing with the NRF timer driver isn't some issue with priorities that can be fixed? It is surprising that the introduction of a timer driver would suddenly break everything.

zephyrbot commented 7 years ago

by Carles Cufi:

Andrew Boie : From our perspective we do not require support for ISRs written in assembly, all of ours are written in C. Andrew Boie : The RTC0, which is the "ticker" for the BLE controller, runs at priority 0. The RTC1, which is the ticker for the system clock driver, runs at priority 1. Those are of course Zephyr priorities and not "real" ones. Also remember that the Cortex-M0 doesn't have a BASEPRI register and hence can only disable all interrupts in one instruction, and not those below a certain priority. Vinayak Kariappa Chettimada and myself have run extensive tests, although of course there could be something we have missed.
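To illustrate the BASEPRI point above, here is a minimal host-side sketch of the two masking schemes. It is a toy model and not Zephyr code: the struct and function names are hypothetical, and priorities follow the ARM convention that a lower number is more urgent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of Cortex-M interrupt masking state. */
struct mask_state {
    bool primask;    /* true: all configurable-priority IRQs are masked */
    uint8_t basepri; /* 0: disabled; N: IRQs with priority >= N are masked */
};

/* Returns true if an IRQ of the given priority can be taken right now. */
static bool irq_can_fire(const struct mask_state *m, uint8_t prio)
{
    if (m->primask) {
        return false;
    }
    if (m->basepri != 0 && prio >= m->basepri) {
        return false;
    }
    return true;
}

/* Cortex-M0 style lock: PRIMASK is the only tool, and it masks everything. */
static struct mask_state lock_with_primask(void)
{
    return (struct mask_state){ .primask = true, .basepri = 0 };
}

/* Cores with BASEPRI: mask only priorities at or below the threshold's urgency. */
static struct mask_state lock_with_basepri(uint8_t threshold)
{
    assert(threshold != 0); /* writing 0 to BASEPRI disables masking */
    return (struct mask_state){ .primask = false, .basepri = threshold };
}
```

With the priorities mentioned above (radio ticker at 0, system clock at 1), a PRIMASK-based lock blocks the radio interrupt too, whereas a BASEPRI threshold of 1 would leave it deliverable.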

zephyrbot commented 7 years ago

by Andrew Boie:

Carles Cufi have you done any profiling to figure out what part of the interrupt handling code is causing your problems? Benjamin Walsh and I are looking at the interrupt code and your patch 7746. The only difference we are seeing in your custom _isr_wrapper_ll is that your code does not get out of power saving mode. The stack switch is automatically done by the CPU. You are still fetching stuff out of the sw_isr_table.

It seems to us that the bottleneck here is the PM code which you deleted. Devices may not be woken up, there may be timekeeping issues wrt tickless idle, and we are wondering how your code can function.

The problem here is that in the implementation I am working on, to ensure correctness the power saving code will still be executed before the ISR runs. For example, in the current implementation of the systick handler we still call power management. See drivers/timer/cortex_m_systick.c. We could perhaps delay the PM stuff until after the ISR and not before, but then this is no longer a generic realtime interrupt feature but something specific to ISRs that do not care about timekeeping.

Let me know your thoughts, maybe we should schedule a call. It really seems like what is messing you up is the power saving.

zephyrbot commented 7 years ago

by Vinayak Kariappa Chettimada:

Andrew Boie as far as the root cause goes, it's the irq_lock/unlock use, which is O(n) in (at least) the system clock announce. Regarding the interrupt handling, the power management can potentially be O(n) too, depending on whether its design enumerates through device drivers.
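As a rough illustration of why an O(n) walk under a single lock hurts latency, the host-side sketch below counts how many list entries are handled per masked window. It is a hypothetical model, not the actual clock-announce code; all names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: tracks the longest run of entries handled while
 * "interrupts" are masked. */
struct lock_model {
    int locked;            /* nonzero while interrupts are masked */
    size_t current_window; /* entries handled in the current masked window */
    size_t worst_window;   /* longest masked window seen so far */
};

static void model_lock(struct lock_model *m)
{
    assert(!m->locked);
    m->locked = 1;
    m->current_window = 0;
}

static void model_unlock(struct lock_model *m)
{
    assert(m->locked);
    if (m->current_window > m->worst_window) {
        m->worst_window = m->current_window;
    }
    m->locked = 0;
}

static void handle_entry(struct lock_model *m)
{
    if (m->locked) {
        m->current_window++;
    }
}

/* One lock around the whole walk: the masked window grows with n. */
static size_t announce_single_lock(size_t n)
{
    struct lock_model m = {0};

    model_lock(&m);
    for (size_t i = 0; i < n; i++) {
        handle_entry(&m);
    }
    model_unlock(&m);
    return m.worst_window;
}

/* Lock released between entries: the masked window stays at one entry. */
static size_t announce_lock_per_entry(size_t n)
{
    struct lock_model m = {0};

    for (size_t i = 0; i < n; i++) {
        model_lock(&m);
        handle_entry(&m);
        model_unlock(&m);
    }
    return m.worst_window;
}
```

Releasing the lock between entries only works if the list tolerates modification in the gaps, which this model does not attempt to show.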

Use of sw_isr_table has deterministic latency, which is acceptable. No need to change that design.

I am thinking that the PM stuff should be runtime conditional: say, sw_isr_table has a flag to select the PM stuff. This flag can be selected in IRQ_CONNECT, and ISRs can call PM APIs before PM use by the waking-up IRQ.
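A minimal sketch of what such a per-entry flag could look like, as a host-side model. The flag name, the table layout and the PM hook are all assumptions made for illustration; none of them is an existing Zephyr definition.

```c
#include <assert.h>
#include <stddef.h>

#define ISR_FLAG_SKIP_PM 0x1u /* hypothetical flag, not an existing define */

/* Hypothetical table entry: ISR, its argument, and per-IRQ flags. */
struct isr_entry {
    void (*isr)(void *arg);
    void *arg;
    unsigned int flags;
};

static int pm_hook_calls; /* counts how often the stand-in PM hook ran */

/* Stand-in for the power management prologue. */
static void pm_exit_idle_hook(void)
{
    pm_hook_calls++;
}

/* Common wrapper: runs the PM prologue unless the entry opted out. */
static void dispatch(const struct isr_entry *table, size_t n, size_t irq)
{
    assert(irq < n && table[irq].isr != NULL);
    if (!(table[irq].flags & ISR_FLAG_SKIP_PM)) {
        pm_exit_idle_hook();
    }
    table[irq].isr(table[irq].arg);
}

/* Example ISR: bumps a counter passed through the argument pointer. */
static void count_isr(void *arg)
{
    (*(int *)arg)++;
}
```

The cost of this design is one extra load and branch per interrupt in the common wrapper, which is itself a (small) addition to latency.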

The BLE controller has (for now) its own soft real-time scheduler, and it is race-to-idle, which means that as far as the kernel is concerned the CPU has not woken up (the idle task gets interrupted but should go back to sleep). Only when the BLE controller has something meaningful for the application will it use the PM stuff to wake up the kernel. This way CPU utilization, and hence power consumption (the CPU is almost the highest power consumer), will be low.

zephyrbot commented 7 years ago

by Carles Cufi:

Andrew Boie , Benjamin Walsh : As Vinayak Kariappa Chettimada mentioned, although the prologue in the ISR wrapper that handles power management does delay the execution of the actual radio ISR, the irq_lock/unlock described in GH-2897 is our most pressing problem today. This is proven by the fact that the issues started to appear when we added a system clock driver, and we had been running fine with the ISR wrapper in the radio interrupt until then.

That said, everything adds up: there is a hard limit in microseconds that we need to respect between the moment the radio ISR triggers and the moment we reprogram the radio registers for the next transaction, and both factors (irq lock/unlock in announce() and the ISR wrapper overhead) contribute to us not meeting the deadline. To what extent each contributes we do not know exactly. When the issue first appeared after introducing drivers/timer/nrf_rtc_timer.c we started looking for reasons and came up with the ISR wrapper prologue and later the irq lock/unlock in announce(), although only the latter was actually new; the prologue had always been executing before without trouble.

You wonder how things can function without the power management code in the ISR wrapper, and the answer is that although things worked, we were only using the radio and the Bluetooth stack in our tests, so potentially there might have been other functionality broken that we did not detect. I've seen the power management code in cortex_m_systick.c and we don't have it at all in nrf_rtc_timer.c. That's another question to raise: whether we need to have that at all. The thing with the nRF5x series is that peripherals all go to sleep automatically when not used, so I'm not quite sure we need that power management code in the nRF5x case.

Regarding this Jira issue in particular, as Vinayak says, the PM prologue could be optional and selectable per ISR, as our BLE controller runs in the background with no interaction with the kernel except for a k_sem_give() in certain instances. We'd be happy to schedule a call; we are in the Central European time zone. You can find us on IRC as vich and carlesc.

zephyrbot commented 7 years ago

by Tomasz Bursztyka:

As a side note, hard real-time support will be needed more and more as we move towards implementing time-constrained subsystems. In net we have a few showing up on the mid-term horizon.

zephyrbot commented 7 years ago

by Carles Cufi:

Tomasz Bursztyka In fact I can already think of Thread + BLE running both concurrently on the new nRF52840 with the IPv6 stack bridging both interfaces.

zephyrbot commented 7 years ago

by Tomasz Bursztyka:

Carles Cufi Ah, so nRF52840 is sharing its radio-core and antenna for 15.4 and BLE? I was wondering about it just an hour ago when Johan Hedberg told me about this chip.

zephyrbot commented 7 years ago

by Carles Cufi:

Tomasz Bursztyka yes, that's right. The scheduler inside the current BLE controller already supports multiple clients. A new client, the 15.4 MAC will need to run alongside the BLE controller and both of them will have the hard real-time requirements we are discussing here.

zephyrbot commented 7 years ago

by Carles Cufi:

Andrew Boie , Benjamin Walsh : Any news on this? Would you like to schedule a call?

zephyrbot commented 7 years ago

by Andrew Boie:

If you want to schedule a call I am available next week. Ben and I work in US TZ (Pacific and Eastern respectively). We are talking about at least 3 different things in this ticket and really need to discuss in a call.

zephyrbot commented 7 years ago

by Carles Cufi:

We're available Monday and Tuesday; does either of the two days suit you?

zephyrbot commented 7 years ago

by Chuck Jordan:

It should also be pointed out that if the ISR only uses certain registers, and is custom-written in assembly language, it need only save/restore the registers it uses. It therefore might be on the application developer to write a custom ISR and to take great care over which registers are used. It might be BETTER if no RTOS services are callable from such an ISR, because calling C places a burden on you to save a lot more.

zephyrbot commented 7 years ago

by Andrew Boie:

Chuck Jordan whether the ISR itself is written in C or assembly really doesn't have any implications on the design. If they call into C from their ISR, they have to respect the calling convention. If they do it in C with __attribute__((interrupt)) the compiler will ensure the right thing is done with respect to registers.

zephyrbot commented 7 years ago

by Chuck Jordan:

Asm handlers wouldn't need to respect the C-ABI. They can save/restore ONLY what is needed. C handlers, on the other hand, would need to save all scratch registers, and any callee-saved that need to be spilled. There is also the design choice of whether all interrupts should use interrupt stacks -- implying sp must be switched too if this is wanted.

Meanwhile, another idea here is that SOME interrupt handlers might NOT need to do the POST-PROCESSING for swap decision. So called non-preempting interrupt category. An example might be an interrupt that does data-movement for you but never needs to awaken a thread. So in addition to low-latency upon ENTRY to a handler, it is sometimes useful to have low-latency at the return from interrupt too (as an optional feature). Swap decision and other RTOS bookkeeping might be avoided for some special interrupts.

zephyrbot commented 7 years ago

by Jon Medhurst:

Agree that it should be possible to avoid RTOS book-keeping. This is currently possible on ARM CPUs if the hardware exception vectors points directly to the ISR and that ISR manually calls _IntExit only if required. E.g. the example below is for stuffing a hardware FIFO with data and signalling a task when all data has been transferred. If there was a special version of k_sem_give that also jumped to _IntExit then the ISRs like this that just do processing then optionally signal completion via a semaphore wouldn't even need to preserve the link register on the stack, and the resultant code would be pretty much optimal from a latency and throughput point of view.

struct my_hardware_regs {
    volatile uint32_t   fifo_data;
    volatile uint32_t   fifo_full_flag;
} driver0_regs;

struct my_driver_data {
    struct my_hardware_regs *regs;
    const uint8_t       *buf_ptr;
    const uint8_t       *buf_end;
    struct k_sem        tx_done_sem;
} driver0;

void driver0_isr(void)
{
    struct my_driver_data *drv = &driver0;
    struct my_hardware_regs *regs = drv->regs;
    const uint8_t *ptr = drv->buf_ptr;

    while (!regs->fifo_full_flag) {
        regs->fifo_data = *ptr++;
        if (ptr < drv->buf_end)
            continue;
        k_sem_give(&drv->tx_done_sem);
        _IntExit();
        return;
    }

    drv->buf_ptr = ptr;
}
zephyrbot commented 7 years ago

by Andrew Boie:

> Asm handlers wouldn't need to respect the C-ABI. They can save/restore ONLY what is needed.

They do if the ISR needs to call into other C functions, which is what I was saying. If you write an ISR in C, declare it with __attribute__((interrupt)), and don't call into any other C functions, I expect the compiler will also only push the registers that it uses and nothing else. (going to verify this thought)

> There is also the design choice of whether all interrupts should use interrupt stacks

We are not going to switch stacks for these ISRs.

> Meanwhile, another idea here is that SOME interrupt handlers might NOT need to do the POST-PROCESSING for swap decision.

Skipping the scheduler at the end is already part of the plan.

zephyrbot commented 7 years ago

by Chuck Jordan:

Another consideration in the design is the notion of whether or not interrupt-entry should be a common-handler or could be different variants of common handlers based on some criterion. For ARC CPUs, for example, we have two variants that are common, RIRQ and FIRQ. The FIRQ handler doesn't need to do save/restore since it operates from its own separate register bank and has its own separate stack pointer -- switched by hw as interrupt is taken. The "criterion" for FIRQ is that the priority is zero. Other priorities are RIRQ. priority can be changed at runtime. So you could imagine that as a "favor" for when the priority is changed, the vector could be altered to point to the "common handler" for FIRQ or RIRQ, depending upon the priority. This allows for the entry code to not have to consider priority (and thus be less code, less cycles).

As a generalization, the notion of changing the vector for a different common-handler depending upon when something ELSE changes, might be a useful thing. Another example might be a CPU that supports NMI. When handler is NMI, its entry is different from when the handler is not-NMI. Non-maskable interrupt handlers don't need to worry about nested interrupts (since they are highest).
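The vector-repointing idea could be sketched roughly as follows. This is a host-side model with hypothetical names; it only shows the selection step, not real ARC entry code, and the two entry functions simply report which path was taken.

```c
#include <assert.h>

#define NUM_IRQS 4u
#define FIRQ_PRIORITY 0u /* per the comment above: priority zero selects FIRQ */

enum entry_kind { ENTRY_FIRQ, ENTRY_RIRQ };

typedef enum entry_kind (*entry_fn)(void);

/* Stand-ins for the two common entry paths. */
static enum entry_kind firq_common_entry(void) { return ENTRY_FIRQ; }
static enum entry_kind rirq_common_entry(void) { return ENTRY_RIRQ; }

static entry_fn vector_table[NUM_IRQS];

/* Changing the priority also repoints the vector, so the entry code
 * never has to test the priority at interrupt time. */
static void model_irq_priority_set(unsigned int irq, unsigned int prio)
{
    assert(irq < NUM_IRQS);
    vector_table[irq] = (prio == FIRQ_PRIORITY) ? firq_common_entry
                                                : rirq_common_entry;
}
```

The decision is paid for once, when the priority changes, rather than on every interrupt entry.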

zephyrbot commented 7 years ago

by Benjamin Walsh:

> > There is also the design choice of whether all interrupts should use interrupt stacks
>
> We are not going to switch stacks for these ISRs.

Don't forget that ARM Cortex-M processors can be programmed to switch stacks automatically, and that is a global setting. So if the regular interrupts switch stacks, so will these realtime interrupts.

zephyrbot commented 7 years ago

by Andrew Boie:

just to get people up to speed, the plan we talked about on the call is to define macros that people drop into their realtime ISR to accomplish stuff since we won't be going through common code any more.

For example:

- IRQ_HEADER() - stuff that must be done before the body of the ISR; could expand to nothing.
- IRQ_FOOTER() - stuff that must be done at the end of the ISR, EOI for example.
- IRQ_PM_HOOK() - do power management. Place either at the beginning or the end as latency requires, or omit entirely if unnecessary.

example


void __attribute__((interrupt)) my_isr(void)
{
    IRQ_HEADER();

    /* ... do things .... */

    IRQ_PM_HOOK(); /* do PM after we service the interrupt */
    IRQ_FOOTER();
}

the plan was to omit the stuff that calls _Swap(), but if we really did want to invoke the scheduler at the end of one of these ISRs we could define IRQ_SWAP() that does it that could be used optionally

> So if the regular interrupts switch stacks, so will these realtime interrupts.

Understood. I was thinking about arches that do this manually, and omitting it to save a few cycles. Is this something we need to worry about?

> Another consideration in the design is the notion of whether or not interrupt-entry should be a common-handler or could be different variants of common handlers based on some criterion

These ISRs were going to be put directly in the vector table and not go through any common code first, aside from the macros I stated above. Do you have some notion on how this would look, given what I've stated above?

> Non-maskable interrupt handlers don't need to worry about nested interrupts (since they are highest).

Whether to re-enable IRQs when servicing the interrupt to allow for nested interrupts would be up to the ISR's implementation I think.

zephyrbot commented 7 years ago

by Benjamin Walsh:

> > So if the regular interrupts switch stacks, so will these realtime interrupts.
>
> Understood. I was thinking about arches that do this manually, and omitting it to save a few cycles. Is this something we need to worry about?

The difference will be that all stacks on arches that do not automatically switch stacks have to be able to handle the realtime ISRs, and not on the arches that do automatic stack switching. That's the only thing I can think about.

zephyrbot commented 7 years ago

by Chuck Jordan:

> Understood. I was thinking about arches that do this manually, and omitting it to save a few cycles. Is this something we need to worry about?

Without an interrupt stack, each thread might need a slightly bigger stack size to account for the interrupt overhead that might appear on its personal stack. If there are N threads, this is N*overhead more bytes taken in the system.
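The N*overhead estimate can be written down as a small helper. This is a simplified sketch: it assumes every thread stack must absorb the same worst-case ISR overhead, and it ignores any hardware exception frame that may land on the thread stack even when a dedicated interrupt stack exists. Function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Extra RAM when ISRs run on whichever thread stack was interrupted. */
static size_t extra_ram_without_irq_stack(size_t n_threads, size_t isr_overhead)
{
    /* Guard against overflow in the multiplication. */
    assert(isr_overhead == 0 || n_threads <= SIZE_MAX / isr_overhead);
    return n_threads * isr_overhead;
}

/* Extra RAM with one dedicated interrupt stack: the overhead is paid once. */
static size_t extra_ram_with_irq_stack(size_t isr_overhead)
{
    return isr_overhead;
}
```

For instance, with purely illustrative figures of 8 threads and 256 bytes of ISR overhead, the first function gives 2048 bytes and the second 256.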

zephyrbot commented 7 years ago

by Chuck Jordan:

> Whether to re-enable IRQs when servicing the interrupt to allow for nested interrupts would be up to the ISR's implementation I think.

Some targets may need more than one common handler. The macros you come up with might be used in different ways in these handlers: IRQ_HEADER_FAST vs IRQ_HEADER_SLOW, etc.

zephyrbot commented 7 years ago

by Andrew Boie:

Chuck Jordan I am becoming very concerned that the number of macros required to satisfy your requirements is becoming very large. So far I have:

This is getting really complex. The real issue that was causing Carles Cufi problems was that the power management hook was taking too long to execute before the ISR ran. I feel that we should cover the common cases and if someone wants to do something really weird they can do it themselves in assembly. That would mean a reduced set of features for these so-called 'direct' IRQs if written in C:

There will still be the capability to put any ISR you like in the vector table so if this doesn't cover some corner case, it's always possible to do exactly what's needed in ASM. If this is insufficient please feel free to counter with some use-cases that we think people will run into often.

Carles Cufi , Tomasz Bursztyka , Benjamin Walsh feel free to weigh in as well.

zephyrbot commented 7 years ago

by Chuck Jordan:

Thinking ahead to multi-core, so-called IPI (inter-processor) interrupts may have requirements different from those of an I/O interrupt. IPI might be used for message passing, in as efficient a way as possible, where the receiver of the IPI doesn't have to do a full save/restore but does need to awaken some thread as efficiently as possible so that the thread can come alive and consume the message. One way to do this is to write the semaphore POST routine in assembly too, and account for all registers to avoid excess save/restore. Low-end IoT probably needs this "faster" way of doing IPI when it's multi-core.

zephyrbot commented 7 years ago

by Andrew Boie:

Chuck Jordan is what I have proposed sufficient for the initial implementation and we can open additional user stories later for the extra features you are describing? Need to control the scope of this task.

zephyrbot commented 7 years ago

by Chuck Jordan:

yes, by all means proceed. Sorry about the noise, just passing along some thoughts for consideration.

zephyrbot commented 7 years ago

by Carles Cufi:

Andrew Boie : The feature set mentioned in your earlier comment more than addresses our original requirements. Anything else is a bonus, but we'd rather have a simpler version earlier than a complex one later.

zephyrbot commented 7 years ago

by Andrew Boie:

Chuck Jordan these are good ideas :) I am going to open some additional user stories for them, just wanted to make sure we didn't need them for the first iteration.

About slow/fast header/footers: It seems to me that this type of interrupt would be used only for those ISRs which must run with the lowest latency. So on arches that have normal and "fast" interrupts, my expectation is that people would only use these Direct IRQs with "fast" interrupts. When I get around to the ARC port, this initial implementation will be built assuming that all direct interrupts are at FIRQ priority, and the header/footer macros will be implemented assuming FIRQ context. Is that a reasonable assumption for common usage?

zephyrbot commented 7 years ago

by Chuck Jordan:

See GH-2230. ARC doesn't yet support nested interrupts with RIRQ. You can only nest a FIRQ on a RIRQ at this time. Perhaps you should implement your change first, and not worry about nested RIRQ. I can take a look and try to resolve support for that. It affects em_starterkit but not Arduino 101, since Arduino 101 has only 2 priorities. Yeah, I don't think IPI interrupts are needed yet -- future projects.

zephyrbot commented 7 years ago

by Andrew Boie:

Here's the interface specification: https://gerrit.zephyrproject.org/r/#/c/10183/

zephyrbot commented 7 years ago

by Andrew Boie:

Carles Cufi had a really good idea in code review. Instead of forcing all these ISRs to be declared with __attribute__((interrupt)), include a bunch of boilerplate macros, and not take parameters, why not instead implement an alternate version of _isr_wrapper() to handle these direct interrupts?

For example

void __attribute__((interrupt)) _isr_direct_wrapper(void)
{
    IRQ_DIRECT_HEADER();

    check_reschedule = (*irq_direct_table[irq])();

    IRQ_DIRECT_FOOTER(check_reschedule);
}

irq_direct_table wouldn't even need to be a separate table, we could just re-use sw_isr_table for it! In addition, this wrapper function could just be implemented in assembly, no need to mess around with __attribute__((interrupt)) any more. I'm going to prototype this approach, I think it is simpler to deal with, and I don't think the function pointer indirection will cost us enough performance to care about it.

zephyrbot commented 7 years ago

by Andrew Boie:

Carles Cufi Jon Medhurst Chuck Jordan Thinking about this some more:

It's hard to come up with a one-size-fits all solution that makes everyone happy. What if we did the following:

1) Add a flag to the existing IRQ_CONNECT() code which will indicate to the system that the interrupt should not automatically call power management. The ISR can either do it itself or omit it entirely. This should satisfy Carles Cufi 's need I think.
2) Add a cross-platform interface to install ISRs directly in the vector table. This will cover Jon Medhurst 's scenario, you can put anything you like in there.

Chuck Jordan brought up a feature to skip post-processing (i.e. the _Swap() call at the end of the interrupt):

> Meanwhile, another idea here is that SOME interrupt handlers might NOT need to do the POST-PROCESSING for swap decision. So called non-preempting interrupt category. An example might be an interrupt that does data-movement for you but never needs to awaken a thread. So in addition to low-latency upon ENTRY to a handler, it is sometimes useful to have low-latency at the return from interrupt too (as an optional feature). Swap decision and other RTOS bookkeeping might be avoided for some special interrupts.

We might not need to do anything for this. If an interrupt does data movement but doesn't do anything which will pend another thread, the check of the ready queue cache in the existing code will not cause _Swap() to be invoked. So this scenario should already be pretty fast.
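The ready-queue-cache check described above amounts to a single pointer comparison on the exit path. The host-side sketch below models that idea only; the struct fields and function names are hypothetical and do not mirror the kernel's real data structures.

```c
#include <assert.h>
#include <stddef.h>

struct thread { int id; };

/* Hypothetical kernel state relevant to the interrupt exit decision. */
struct kernel_model {
    struct thread *current;     /* thread that was interrupted */
    struct thread *ready_cache; /* cached highest-priority ready thread */
    int swap_count;             /* how many times the stand-in swap ran */
};

/* Stand-in for the context switch. */
static void model_swap(struct kernel_model *k)
{
    k->current = k->ready_cache;
    k->swap_count++;
}

/* Interrupt exit path: swap only if the ISR made a different thread
 * the best candidate to run. */
static void model_int_exit(struct kernel_model *k)
{
    assert(k->current != NULL && k->ready_cache != NULL);
    if (k->ready_cache != k->current) {
        model_swap(k);
    }
}
```

An ISR that only moves data leaves the cache untouched, so the exit path costs one load and one compare in this model.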

zephyrbot commented 7 years ago

by Chuck Jordan:

> the check of the ready queue cache in the existing code will not cause

For those systems with d-cache, just reading a single word for an if-statement can cause a rather large STALL if there is a miss. So any avoidance of memory ops reduces latency. Also, larger ISR code affects I-cache miss ratios too.

zephyrbot commented 7 years ago

by Andrew Boie:

Alright. I've scheduled a call for tomorrow for additional discussion. For now I'm inclined to stick to the current interface plan so that the user has a lot of control over what the ISR will do. It doesn't seem that adding a runtime check to the existing interrupt code to skip PM will work out very well, and we need more direct control on whether to _Swap(). The current interface I posted will let you skip _Swap() with a build time constant, so no runtime checks at all.
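To show what skipping the scheduler with a build-time constant can mean in practice, here is a small host-side sketch. The macro and function names are hypothetical and are not the interface from the review; the point is only that a literal argument lets the compiler resolve the branch at build time.

```c
#include <assert.h>

static int reschedule_calls; /* counts calls to the stand-in scheduler hook */

static void model_reschedule(void)
{
    reschedule_calls++;
}

/* Hypothetical footer: with a literal 0 the branch is dead code that an
 * optimizing compiler can remove, so no check remains at run time. */
#define MODEL_DIRECT_FOOTER(check)  \
    do {                            \
        if (check) {                \
            model_reschedule();     \
        }                           \
    } while (0)

static void data_mover_isr(void)
{
    /* ... service the hardware, never wakes a thread ... */
    MODEL_DIRECT_FOOTER(0);
}

static void signalling_isr(void)
{
    /* ... may wake a thread, so let the scheduler decide ... */
    MODEL_DIRECT_FOOTER(1);
}

/* Usage: only the signalling ISR reaches the scheduler stand-in. */
static void footer_demo(void)
{
    int before = reschedule_calls;

    data_mover_isr();
    assert(reschedule_calls == before);
    signalling_isr();
    assert(reschedule_calls == before + 1);
}
```

Compared with a per-entry runtime flag, this avoids the extra memory read on the exit path that was raised as a concern above.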

zephyrbot commented 7 years ago

by Chuck Jordan:

for ARC: Our compiler guy replied with this information: "Regarding automatic register saving when entering an IRQ, gcc has two options:

zephyrbot commented 7 years ago

by Andrew Boie:

Thanks Chuck Jordan ! I opened GH-3074 for further investigation.

zephyrbot commented 7 years ago

by Ramesh Thomas:

> Reduction of the amount of time the kernel spends with interrupts disabled. This is critical for hard real-time interrupts and currently it is O(n) with threads/timeouts: tracked in GH-2897

This means we would also need to reduce post-processing time, besides servicing the interrupt in real time. We do the following in post-processing:

  1. _sys_soc_resume - PM notification of wake from idle
  2. _timer_idle_exit() - adjusts timer, handles timeouts and time slicing. (GH-1719 does not help here because interrupts are disabled)
  3. Swap - switch thread if current one has been preempted.

Does the ISR have enough information to make the decision to skip any of these? The following knowledge and assumptions would be necessary for the decision to skip each of them.

#1 Know if the interrupt will never happen while idling. Or know that nothing important is done in _sys_soc_resume and can be skipped.
#2 Assume there would be other events to take care of timeouts and time slicing considering we go tickless in future. Or that there will be no timeouts or time slicing.
#3 Should not swap if handling a nested interrupt because of preempted interrupts using interrupt stack. If it is idle thread, then know whether it will need to exit due to timeout expiry. Or, like #2, assume there will be no timeouts or time slicing and everything happens in ISRs.

zephyrbot commented 7 years ago

by Mark Linkmeyer:

Per email update from Andrew Boie on Monday (2/6/17), here's the status at that time: "The core interface is merged, as well as the x86 implementation. But the ARM port needs a couple days more work on my part before it will be ready for code review."

zephyrbot commented 7 years ago

by Andrew Boie:

Carles Cufi this is now merged in the tree, can you tell me if this is working for you so we can close?

zephyrbot commented 7 years ago

by Carles Cufi:

Andrew Boie yes, I've started today and will try to continue tomorrow.

zephyrbot commented 7 years ago

by Carles Cufi:

Andrew Boie : Tested today on current Bluetooth branch with good results. Haven't been able to measure the latency but since functionally there seem to be no regressions after the transition to direct interrupts, I consider this done. Also, thanks for all the work that went into including this feature.

zephyrbot commented 7 years ago

by Andrew Boie:

Fixed based on feedback from reporter. Please feel free to open bugs & assign to me if you run into any problems.

Separate JIRA opened for enabling this on ARC: GH-3180