zephyrproject-rtos / zephyr

legacy/kernel/test_early_sleep/ fails on EMSK #2821

Closed: zephyrbot closed this issue 7 years ago

zephyrbot commented 7 years ago

Reported by Inaky Perez-Gonzalez:

The failures are random; however, I was able to catch the following:

$ git checkout 850877b95dc81d3578ba0c37c2007a1b82e96474

***********************************
Exception vector: 0x00000002, cause code: 0x00000000, parameter 0x00000000
Address 0x80001570
Fatal fault in essential thread! Spinning...

Also

$ git checkout 796a6bb4d88db3e4f1e5a047dcd59b61a43297d4

Firmware   Jan 12 2016, v2.2

Bootloader Dec 29 2015, v1.1

Exception vector: 0x00000002, cause code: 0x00000000, parameter 0x00000000
Address 0x80001570
Fatal fault in essential thread! Spinning...

And

$ git checkout 0cddc4b6653e9aef956e39ab169c241287ce0dc1

***********************************
Exception vector: 0x00000002, cause code: 0x00000000, parameter 0x00000000
Address 0x80001570
Fatal fault in essential thread! Spinning...

(Imported from Jira ZEP-1342)

zephyrbot commented 7 years ago

by Chuck Jordan:

I was able to reproduce a memErr exception at startup, but not all the time.

One thing I see changed was reset.S. It is now making a bunch of function calls to memset stack areas ... BUT the code wasn't supposed to do ANYTHING other than set up the initial SP, because the I-cache is LIVE and might have older CRUFT in it. So I think what can happen is this: you run one program, leaving CRUFT in the I-cache, then run another program, and I-cache hits cause you to fetch the wrong code. The code in _PrepC turns off the I-cache IMMEDIATELY, and it is to remain off until _icache_setup(). But now, with this change in reset.S, the cache is live and has garbage in it.

zephyrbot commented 7 years ago

by Chuck Jordan:

Well I should clarify that a new program running doesn't perform a RESET ... so the I-cache is ON from the previous run.

zephyrbot commented 7 years ago

by Chuck Jordan:

Also, when building for EM7D, which doesn't have _firq_stack, I now get an undefined symbol due to the change in reset.S.

zephyrbot commented 7 years ago

by Inaky Perez-Gonzalez:

My setup powers off the whole thing in between test cases--it power cycles the board, then loads the firmware, issues a reset halt and then a resume. So the i-cache should be wiped? Is there a way I can confirm that, for example, looking at openocd?

Your results confirm that what I see is random too. Good :)

zephyrbot commented 7 years ago

by Benjamin Walsh:

Chuck Jordan W.R.T. EM7D: fast_irq.S is always compiled and the _firq_stack symbol is always created. I verified that when I set CONFIG_FIRQ_STACK_SIZE and/or CONFIG_ISR_STACK_SIZE to 0, the symbols were created, the stack init code in reset.S compiled correctly, and memset was called with a size of 0. I built for arduino_101_sss though. How do I build for EM7D?

zephyrbot commented 7 years ago

by Sharron LIU:

Ramesh Thomas, I'd like to make you aware of the discussion above. For power state transitions across sleep/wakeup on ARC (where the reset routine is executed), we assume SRAM and I-cache are both preserved. Can you confirm? Thanks.

zephyrbot commented 7 years ago

by Chuck Jordan:

To build for EM7D, set CONFIG_SOC_EM7D=y, or select it with "make menuconfig". EM7D doesn't have 2 banks for FIRQ, so _firq_stack isn't needed. Notice in fast_irq.S that _firq_stack appears only if CONFIG_RGF_NUM_BANKS != 1.
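A minimal C sketch of the guard described above, for illustration only. The real logic lives in assembly (reset.S and fast_irq.S); the buffer names, the paint byte, and `paint_boot_stacks()` are hypothetical, and the `CONFIG_*` values are hard-coded stand-ins for Kconfig output, chosen to match the EM7D .config quoted later in this thread. The point is that any code touching the FIRQ stack has to sit under the same `CONFIG_RGF_NUM_BANKS` condition that creates the symbol; otherwise a single-bank build fails to link.

```c
#include <stddef.h>
#include <string.h>

/* Stand-ins for Kconfig output (assumed values, EM7D-like). */
#define CONFIG_RGF_NUM_BANKS   1
#define CONFIG_FIRQ_STACK_SIZE 1024
#define CONFIG_ISR_STACK_SIZE  2048

#define DEMO_PAINT_BYTE 0xaa /* hypothetical fill pattern */

static unsigned char demo_isr_stack[CONFIG_ISR_STACK_SIZE];

/* The FIRQ stack only exists when there is more than one register bank. */
#if CONFIG_RGF_NUM_BANKS != 1
static unsigned char demo_firq_stack[CONFIG_FIRQ_STACK_SIZE];
#endif

/* Paint every stack that exists in this configuration and
 * return the number of bytes painted. */
size_t paint_boot_stacks(void)
{
    size_t painted = 0;

    memset(demo_isr_stack, DEMO_PAINT_BYTE, sizeof(demo_isr_stack));
    painted += sizeof(demo_isr_stack);

#if CONFIG_RGF_NUM_BANKS != 1
    /* Same guard as the definition, so a single-bank build never
     * references a symbol that was not created. */
    memset(demo_firq_stack, DEMO_PAINT_BYTE, sizeof(demo_firq_stack));
    painted += sizeof(demo_firq_stack);
#endif

    return painted;
}
```

With `CONFIG_RGF_NUM_BANKS` set to 1, only the ISR stack is painted and the FIRQ stack is never referenced.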

zephyrbot commented 7 years ago

by Chuck Jordan:

If it is an ARC w/o any power management facility, yes, a sleep instruction will not alter caches or system memory, be it SRAM, DRAM, etc. But when power management circuitry is present, there can be different power domains, depending upon the implementation. Some power domains will power off caches, I think. So it depends on the power-down mode in that case.

zephyrbot commented 7 years ago

by Benjamin Walsh:

Chuck Jordan EM7D/fast_irq: oh right, I completely missed that...

zephyrbot commented 7 years ago

by Ramesh Thomas:

Sharron LIU If you are referring to deep sleep, we only assume RAM is retained. We invalidate i-cache before entering deep sleep. I would assume i-cache would be lost when the core is powered off in deep sleep.

To avoid any confusion, I am referring to quark_se specific implementation that has not been merged yet.

zephyrbot commented 7 years ago

by Benjamin Walsh:

Chuck Jordan OK, so should we just move the calls to disable_icache and invalidate_dcache to __reset, before initializing the interrupt stack?

zephyrbot commented 7 years ago

by Chuck Jordan:

I'm thinking we keep __reset real simple with its jump, and PUT a comment there saying not to put any additional code here. The stack pattern painting could be moved to _PrepC(), after the I-cache is turned off. Make sense?

zephyrbot commented 7 years ago

by Benjamin Walsh:

Chuck Jordan The problem with doing that is the stack that _PrepC runs on now is the same stack that we want to paint. :-)
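A small host-side C illustration of the problem described above. It is a simulation, not Zephyr code: the buffer, the fake frame bytes, the paint byte, and the function name are all hypothetical, and it assumes a downward-growing stack whose live data sits at the high end of the region.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_STACK_SIZE 64
#define DEMO_PAINT_BYTE 0xaa /* hypothetical fill pattern */

/* Simulate painting a stack that is already in use.
 * Returns true if the live frame is still intact afterwards. */
bool live_frame_survives_full_paint(void)
{
    uint8_t stack[DEMO_STACK_SIZE];
    /* Pretend these bytes are a saved return address of the running code. */
    const uint8_t frame[4] = {0xde, 0xad, 0xbe, 0xef};

    /* Downward-growing stack: live data occupies the highest addresses. */
    memcpy(&stack[DEMO_STACK_SIZE - sizeof(frame)], frame, sizeof(frame));

    /* Painting the whole region overwrites the live frame too. */
    memset(stack, DEMO_PAINT_BYTE, sizeof(stack));

    return memcmp(&stack[DEMO_STACK_SIZE - sizeof(frame)],
                  frame, sizeof(frame)) == 0;
}
```

The function returns false: a full-region memset destroys whatever the currently running code has already pushed, which is why the painting cannot simply move into code that runs on that same stack.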

Unless I got it wrong, I think this is the equivalent of disable_icache and invalidate_dcache in asm:

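{code}
    mov r1, 1

invalidate_and_disable_icache:

    /* a zero version field (low 8 bits) means there is no I-cache */
    lr r0, [_ARC_V2_I_CACHE_BUILD]
    and.f r0, r0, 0xff
    bz.nd invalidate_dcache

    /* invalidate the I-cache, then write 1 to IC_CTRL to disable it */
    mov_s r2, 0
    sr r2, [_ARC_V2_IC_IVIC]
    nop
    sr r1, [_ARC_V2_IC_CTRL]

invalidate_dcache:

    /* same presence check for the D-cache */
    lr r3, [_ARC_V2_D_CACHE_BUILD]
    and.f r3, r3, 0xff
    bz.nd done_cache_invalidate

    sr r1, [_ARC_V2_DC_IVDC]

done_cache_invalidate:
{code}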

Not too complicated.
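For readers who find the control flow easier to follow in C, here is a host-side sketch of the same sequence against a mocked register file. The `struct mock_aux_regs` type, its write counters, and the function name are hypothetical; real code would use `lr`/`sr` instructions or the project's auxiliary-register accessors, and the NOP settling time after the IC_IVIC write has no equivalent in a mock.

```c
#include <stdint.h>

/* Hypothetical mock of the auxiliary registers used by the snippet. */
struct mock_aux_regs {
    uint32_t i_cache_build;   /* _ARC_V2_I_CACHE_BUILD */
    uint32_t ic_ivic;         /* _ARC_V2_IC_IVIC */
    uint32_t ic_ctrl;         /* _ARC_V2_IC_CTRL */
    uint32_t d_cache_build;   /* _ARC_V2_D_CACHE_BUILD */
    uint32_t dc_ivdc;         /* _ARC_V2_DC_IVDC */
    unsigned ic_ivic_writes;  /* counters so a test can observe the writes */
    unsigned dc_ivdc_writes;
};

void invalidate_and_disable_caches(struct mock_aux_regs *aux)
{
    /* Skip the I-cache registers when the build register's low 8 bits
     * are zero, i.e. the core has no I-cache. */
    if ((aux->i_cache_build & 0xff) != 0) {
        aux->ic_ivic = 0;      /* the snippet writes 0 to IC_IVIC */
        aux->ic_ivic_writes++;
        /* hardware needs NOPs here before the next access */
        aux->ic_ctrl = 1;      /* the snippet writes 1 to IC_CTRL */
    }

    /* Same presence check for the D-cache. */
    if ((aux->d_cache_build & 0xff) != 0) {
        aux->dc_ivdc = 1;      /* the snippet writes 1 to DC_IVDC */
        aux->dc_ivdc_writes++;
    }
}
```

Feeding it the I_CACHE_BUILD and D_CACHE_BUILD values from the EM7D boot banner quoted later in this thread (0x225104 and 0x215104) takes both branches; an all-zero register file, as on a core without caches, takes neither.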

zephyrbot commented 7 years ago

by Chuck Jordan:

See disable_icache(). We just want that to happen BEFORE we start executing lots of code that might come from the I-cache. We just want it completely OFF. It gets turned on later. The check of _ARC_V2_I_CACHE_BUILD first is needed because the target may not have an I-cache.

zephyrbot commented 7 years ago

by Benjamin Walsh:

Chuck Jordan Yes, I understand. But what I mean is that we can stick the little snippet of asm in my previous comment right after locking interrupts in __reset(), and remove disable_icache() and invalidate_dcache() from prep_c.c. I don't think having this little bit of asm is worse to maintain than the two C functions.

zephyrbot commented 7 years ago

by Benjamin Walsh:

Chuck Jordan I think there is a bug in disable_icache(). From the manual I have: "All SR accesses to the IC_IVIC register must be followed by three NOP instructions". disable_icache() only has one.

And there was a bug in my snippet, where I wrote 1 to IC_IVIC instead of to IC_CTRL. Corrected.

zephyrbot commented 7 years ago

by Chuck Jordan:

Well, the caveat here is that if the I-cache is being disabled, probably the settle time may not matter. But to be conservative, sure, NOPS won't hurt.

zephyrbot commented 7 years ago

by Chuck Jordan:

WORKS for me when I build with CONFIG_SOC_EM9D=y and have only the BIT 1 dip switch down. However, if I run this again with all dip switches UP so that EM7D is selected, YES, I see a memerr exception vector 2. So, is it possible the dip switches are in the wrong position?

When you build, check the output to see which SOC is being built. You will see something like: ... CC arch/arc/soc/em9d/soc.o ... That is the EM9D configuration, and thus dip switch 1 should be down. This is the Harvard architecture, and it has its memories in a different location. The other two SOCs (em7d and em11d) utilize DRAM, which lives at a different address. So if the wrong executable were run on the wrong SOC, a memerr vector 2 exception would result, or some other exception.

zephyrbot commented 7 years ago

by Inaky Perez-Gonzalez:

Confirming that this is happening with DIP switch 1 down and CONFIG_SOC_EM9D=y (as it is the default config).

I reaffirm: these tests (and others I report off EMSK) happen randomly; we run the whole test suite on them and randomly one or another fails, sometimes not being able to dump the registers. The HW is not touched or altered in between runs (other than power cycling).

Everything is set to DIP switch 1 and the defconfig has EM9D selected. If this were not the case, all the TCs would be failing.

zephyrbot commented 7 years ago

by Chuck Jordan:

ok. EM9D doesn't have caches, so there might be ANOTHER problem here. I haven't seen it yet, but I'll keep looking.

zephyrbot commented 7 years ago

by Inaky Perez-Gonzalez:

Chuck Jordan, this might be the same cause (board failing to detect switches). We are moving all our HW to EM7D to work around that. So please hold until I report all clear after a few runs of the test cases.

zephyrbot commented 7 years ago

by Chuck Jordan:

Will hold. The master branch has been changed to have em7d as the default. But for 1.6, I think you will still need to set this during make, unless we bring that change back too.

zephyrbot commented 7 years ago

by Inaky Perez-Gonzalez:

Chuck, can you confirm the SRAM load address for 7d is 0x10000000?

I moved the HW to all DIP switches up for 7D, but now the flashing is not working, so I am trying to sort out what is being done wrong on my side. The board boots and says:

...
Firmware   Jan 12 2016, v2.2
Bootloader Dec 29 2015, v1.1
ARC EM7D,  core configuration #1
ARC IDENTITY = 0x42

RF_BUILD = 0x2
TIMER_BUILD = 0x10304
ICCM_BUILD = 0xa04
DCCM_BUILD = 0x10904
I_CACHE_BUILD = 0x225104
D_CACHE_BUILD = 0x215104
SelfTest PASSED
Info: No boot image found

so I know that part is working ok :)

zephyrbot commented 7 years ago

by Inaky Perez-Gonzalez:

Chances are also that the serial port configuration is different? Here are my steps:

$ tcf debug-openocd emsk-25 "reset halt"

JTAG tap: arc-em.cpu tap/device found: 0x200044b1 (mfg: 0x258, part: 0x0004, ver: 0x2)
target state: halted

$ tcf debug-openocd emsk-25 "targets"
    TargetName         Type       Endian TapName            State       
--  ------------------ ---------- ------ ------------------ ------------
 0* arc-em.cpu         arcv2      little arc-em.cpu         halted

$ tcf debug-openocd emsk-25 "load_image test_commonoutdir-mziz-em_starterkitzephyr.elf-894583d5c1"
20392 bytes written at address 0x10000000
downloaded 20392 bytes in 0.047550s (418.803 KiB/s)

$ tcf debug-openocd emsk-25 "resume"

$ tcf console-read emsk-25 

***********************************
**       Synopsys, Inc.          **
**     ARC EM Starter kit        **
**                               **
** Comprehensive software stacks **
**   available from embARC.org   **
**                               **
***********************************
Firmware   Jan 12 2016, v2.2
Bootloader Dec 29 2015, v1.1
ARC EM7D,  core configuration #1

ARC IDENTITY = 0x42
RF_BUILD = 0x2
TIMER_BUILD = 0x10304
ICCM_BUILD = 0xa04
DCCM_BUILD = 0x10904
I_CACHE_BUILD = 0x225104
D_CACHE_BUILD = 0x215104

SelfTest PASSED

Info: No boot image found

[ this is flashing kernel/test_common, btw ]

so I am puzzled -- I am using the default config, not touching anything once the make target runs and creates it (the very same steps I used with 9d):

CONFIG_ARC=y
CONFIG_ARCH="arc"
CONFIG_SOC="em7d"
CONFIG_BOARD="em_starterkit"
CONFIG_SOC_EM7D=y
CONFIG_ARCH_DEFCONFIG="arch/arc/defconfig"
CONFIG_CPU_ARCEM4=y
CONFIG_CPU_ARCV2=y
CONFIG_DATA_ENDIANNESS_LITTLE=y
CONFIG_NUM_IRQ_PRIO_LEVELS=2
CONFIG_NUM_IRQS=36
CONFIG_RGF_NUM_BANKS=1
CONFIG_FIRQ_STACK_SIZE=1024
CONFIG_FAULT_DUMP=2
CONFIG_ICCM_SIZE=256
CONFIG_ICCM_BASE_ADDRESS=0x00000000
CONFIG_DCCM_SIZE=128
CONFIG_DCCM_BASE_ADDRESS=0x80000000
CONFIG_SRAM_SIZE=131072
CONFIG_SRAM_BASE_ADDRESS=0x10000000
CONFIG_FLASH_SIZE=0
CONFIG_FLASH_BASE_ADDRESS=0x00000000
CONFIG_SW_ISR_TABLE=y
CONFIG_IRQ_VECTOR_TABLE_BSP=y
CONFIG_CACHE_LINE_SIZE=32
CONFIG_CACHE_FLUSHING=y
CONFIG_BOARD_EM_STARTERKIT=y
CONFIG_MULTITHREADING=y
CONFIG_NUM_COOP_PRIORITIES=16
CONFIG_NUM_PREEMPT_PRIORITIES=15
CONFIG_MAIN_THREAD_PRIORITY=0
CONFIG_COOP_ENABLED=y
CONFIG_PREEMPT_ENABLED=y
CONFIG_PRIORITY_CEILING=0
CONFIG_MAIN_STACK_SIZE=1024
CONFIG_IDLE_STACK_SIZE=320
CONFIG_ISR_STACK_SIZE=2048
CONFIG_NUM_DYNAMIC_TIMERS=0
CONFIG_TICKLESS_IDLE_SUPPORTED=y
CONFIG_ERRNO=y
CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=1024
CONFIG_SYSTEM_WORKQUEUE_PRIORITY=-1
CONFIG_OFFLOAD_WORKQUEUE_STACK_SIZE=1024
CONFIG_OFFLOAD_WORKQUEUE_PRIORITY=-1
CONFIG_ATOMIC_OPERATIONS_C=y
CONFIG_TIMESLICING=y
CONFIG_TIMESLICE_SIZE=0
CONFIG_TIMESLICE_PRIORITY=0
CONFIG_SEMAPHORE_GROUPS=y
CONFIG_NUM_MBOX_ASYNC_MSGS=10
CONFIG_NUM_PIPE_ASYNC_MSGS=10
CONFIG_MEM_POOL_SPLIT_BEFORE_DEFRAG=y
CONFIG_HEAP_MEM_POOL_SIZE=0
CONFIG_SYS_CLOCK_TICKS_PER_SEC=1000
CONFIG_SYS_CLOCK_HW_CYCLES_PER_SEC=30000000
CONFIG_SYS_CLOCK_EXISTS=y
CONFIG_RING_BUFFER=y
CONFIG_KERNEL_INIT_PRIORITY_OBJECTS=30
CONFIG_KERNEL_INIT_PRIORITY_DEFAULT=40
CONFIG_KERNEL_INIT_PRIORITY_DEVICE=50
CONFIG_APPLICATION_INIT_PRIORITY=90
CONFIG_MDEF=y
CONFIG_NANO_TIMEOUTS=y
CONFIG_NANO_TIMERS=y
CONFIG_CONSOLE=y
CONFIG_CONSOLE_HAS_DRIVER=y
CONFIG_UART_CONSOLE=y
CONFIG_UART_CONSOLE_ON_DEV_NAME="UART_1"
CONFIG_UART_CONSOLE_INIT_PRIORITY=60
CONFIG_ETH_INIT_PRIORITY=80
CONFIG_SERIAL=y
CONFIG_SERIAL_HAS_DRIVER=y
CONFIG_UART_INTERRUPT_DRIVEN=y
CONFIG_UART_NS16550=y
CONFIG_UART_NS16550_PORT_1=y
CONFIG_UART_NS16550_PORT_1_NAME="UART_1"
CONFIG_UART_NS16550_PORT_1_IRQ_PRI=1
CONFIG_UART_NS16550_PORT_1_BAUD_RATE=115200
CONFIG_UART_NS16550_PORT_1_OPTIONS=0
CONFIG_ARCV2_INTERRUPT_UNIT=y
CONFIG_ARCV2_TIMER=y
CONFIG_ARCV2_TIMER_IRQ_PRIORITY=0
CONFIG_SYSTEM_CLOCK_INIT_PRIORITY=0
CONFIG_RANDOM_GENERATOR=y
CONFIG_TEST_RANDOM_GENERATOR=y
CONFIG_TIMER_RANDOM_GENERATOR=y
CONFIG_GPIO=y
CONFIG_SYS_LOG_GPIO_LEVEL=0
CONFIG_GPIO_DW=y
CONFIG_GPIO_DW_INIT_PRIORITY=60
CONFIG_GPIO_DW_0=y
CONFIG_GPIO_DW_0_NAME="GPIO_PORTA"
CONFIG_GPIO_DW_0_IRQ_DIRECT=y
CONFIG_GPIO_DW_0_IRQ_PRI=1
CONFIG_GPIO_DW_1=y
CONFIG_GPIO_DW_1_NAME="GPIO_PORTB"
CONFIG_GPIO_DW_1_IRQ_DIRECT=y
CONFIG_GPIO_DW_1_IRQ_PRI=1
CONFIG_GPIO_DW_2=y
CONFIG_GPIO_DW_2_NAME="GPIO_PORTC"
CONFIG_GPIO_DW_2_IRQ_DIRECT=y
CONFIG_GPIO_DW_2_IRQ_PRI=1
CONFIG_GPIO_DW_3=y
CONFIG_GPIO_DW_3_NAME="GPIO_PORTD"
CONFIG_GPIO_DW_3_IRQ_DIRECT=y
CONFIG_GPIO_DW_3_IRQ_PRI=1
CONFIG_TEXT_SECTION_OFFSET=0
CONFIG_CROSS_COMPILE=""
CONFIG_COMPILER_OPT=""
CONFIG_TOOLCHAIN_VARIANT=""
CONFIG_KERNEL_BIN_NAME="zephyr"
CONFIG_PRINTK=y
CONFIG_OMIT_FRAME_POINTER=y
CONFIG_SYS_LOG=y
CONFIG_SYS_LOG_SHOW_TAGS=y
CONFIG_SYS_LOG_DEFAULT_LEVEL=0
CONFIG_SYS_LOG_OVERRIDE_LEVEL=0
CONFIG_ZTEST=y
CONFIG_ZTEST_STACKSIZE=1000
CONFIG_ZTEST_ASSERT_VERBOSE=1

zephyrbot commented 7 years ago

by Chuck Jordan:

Yes, DRAM starts at 0x10000000 and is 128MB. There is no SRAM on this target, but the symbols in Kconfig are SRAM only at this time.

Everything you've done looks correct. I'm not familiar with this tcf command you are invoking. When I run this test case on EM7D, it seems to work just fine. Interesting that your latest "tcf console-read..." shows the power-up banner, but not the output from the program. Could this be just what was already in the BUFFER, at the top only? Do you have a way of running this program w/o your tcf infrastructure?
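The address check implied here can be written down as a tiny C helper. This is only a sketch: `image_fits()` is a hypothetical name, not part of Zephyr or OpenOCD. The numbers used in the note below come from this thread: the region is the DRAM at 0x10000000 with 128MB, and the image is the 20392 bytes OpenOCD reported writing at 0x10000000.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when [load_addr, load_addr + len) lies entirely inside
 * [region_base, region_base + region_size). The sums are done in
 * 64 bits so that 32-bit addresses cannot wrap around. */
bool image_fits(uint32_t region_base, uint64_t region_size,
                uint32_t load_addr, uint64_t len)
{
    uint64_t region_end = (uint64_t)region_base + region_size;
    uint64_t image_end = (uint64_t)load_addr + len;

    return load_addr >= region_base && image_end <= region_end;
}
```

For the values above, `image_fits(0x10000000, 128ULL * 1024 * 1024, 0x10000000, 20392)` holds, which is consistent with the load address being correct for EM7D. The same image placed at an address outside that range, for example 0x80000000, would fail the check.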

zephyrbot commented 7 years ago

by Inaky Perez-Gonzalez:

TCF is just a transport at this point, just passing openocd commands around -- I've reduced the problem to load with GDB (that works) vs load with openocd (which doesn't work anymore) -- I don't know if you read my comments in the other bug entry for GH-2796, where I was talking about the details I had found.

Anyway, so using TCF or not is not the problem, but loading with GDB vs loading with OpenOCD.

zephyrbot commented 7 years ago

by Chuck Jordan:

Hi Inaky, I've been only using openocd as server and gdb as client (two separate programs). Do you have something else you are using?

zephyrbot commented 7 years ago

by Mark Linkmeyer:

Correcting the priority field

zephyrbot commented 7 years ago

by Mark Linkmeyer:

Inaky Perez-Gonzalez, with all of the discussion that's happened on this, is this a confirmed and actionable bug to be fixed? If so, please move it to To Do or, if it's actively being worked, put it into the In Progress state. Thx.

zephyrbot commented 7 years ago

by Inaky Perez-Gonzalez:

This is the same as GH-2819--I am not in the position of being able to test it now, so it has to be deferred unless someone can test manually.

zephyrbot commented 7 years ago

by Mark Linkmeyer:

Andrew Boie, is this something you can help with? See Inaky's comment above.

zephyrbot commented 7 years ago

by Andrew Boie:

Mark Linkmeyer I need some hardware in order to be able to help with this.

zephyrbot commented 7 years ago

by Inaky Perez-Gonzalez:

There is no need -- Chuck Jordan already verified on his end, as they own the Synopsys HW support. Unless Andrew Boie wants to figure out why OpenOCD cannot load binaries on the EMSK ARC, there is no need to use his time.