Reconfigurable hardware platform support

Sorry if this is off-topic; if so, I'll move it to the mailing list. These are just general suggestions about "full metal runtime.js/nodejs", related to issues #42 and #43.

Besides all its other features, we like JS because it is a dynamic language. We can modify the runtime environment during program execution, create functionality on the fly, modify it, pass functions as parameters to other functions, and so on. And now JS-driven technologies are coming close to the hardware layer of computing systems. Why not try to combine dynamic software with dynamic hardware? I mean technologies such as FPGA (field-programmable gate array) and PSoC (Programmable System on a Chip), i.e. reconfigurable computing.

First I would like to quote an article about Intel's experience with combined server systems that pair Intel Xeon CPUs with FPGA modules: Intel unveils new Xeon chip with integrated FPGA, touts 20x performance boost.

Late yesterday, Intel quietly announced one of the biggest ever changes to its chip lineup: It will soon offer a new type of Xeon CPU with an integrated FPGA. This new Xeon+FPGA chip will fit in the standard E5 LGA2011 socket, but the integrated FPGA will allow each chip to be customized to specific workloads.

What’s the purpose of this new Xeon+FPGA product? In the words of Intel: “The FPGA provides our customers a programmable, high performance coherent acceleration capability to turbo-charge their critical algorithms.” Intel estimates that the Xeon+FPGA will see massive performance boosts in the 20x range (for code executed on the FPGA instead of a conventional x86 CPU), but obviously there will be big overall speedups as bottlenecks are removed. The other advantage is that workloads change, so if your critical algorithms change, or your whole company pivots, the FPGA can be repurposed without having to buy lots of new hardware.
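As a rough sanity check on the 20x figure quoted above: it applies only to the code that actually runs on the FPGA, so the overall gain depends on how much of the run time is offloaded. The sketch below uses Amdahl's law for that estimate. The function name and the offloaded fractions are my own illustration and do not come from the article.

```c
#include <assert.h>

/* Overall speedup when a fraction `offloaded` (0..1) of the run time
 * is accelerated by `factor` and the rest stays on the CPU unchanged
 * (Amdahl's law). */
double overall_speedup(double offloaded, double factor)
{
    assert(offloaded >= 0.0 && offloaded <= 1.0);
    assert(factor > 0.0);
    return 1.0 / ((1.0 - offloaded) + offloaded / factor);
}
```

With a 20x accelerator, offloading half of the run time gives roughly 1.9x overall, and offloading 90% gives roughly 6.9x, so the choice of what to move into hardware matters as much as the raw acceleration factor.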

So, for server applications this approach shows very good experimental results. With the help of reconfigurable and programmable hardware we can implement device drivers, media codecs, network protocols, OS modules, critical application algorithms, additional devices and many other things.

It is probably even possible to build a "full-metal" V8 engine as a CPU core, but that probably does not make much sense, because a hardware processor and a dynamic interpreter are quite different things. I've heard about hardware implementations of systems like Forth, for example (it is also a programming language, interpreter and operating system in one), where the Forth interpreter is implemented as a CPU core, but it looks like this is not a very popular application. So it is possible to use all the power of programmable MPSoCs in combination with a compiled V8 core and additional core and application modules, implemented in software or as hard logic (as additional PCI devices, for example, or as modules with direct access to the caches of the built-in CPU cores).

Intel used FPGAs from Altera in their combined systems. As far as I can see, the All Programmable MPSoCs from Xilinx are more suitable for a runtime.js system.

There are two series of programmable SoCs from Xilinx: the currently available Zynq-7000 AP SoC and the newest series, the Zynq® UltraScale+™ MPSoC. A short description with architecture diagrams of UltraScale can be found in this article - Xilinx Introduces Zynq UltraScale+ MPSoC with Cortex A53 & R5 Cores, Ultrascale FPGA; technical details can be found here: Zynq UltraScale+ MPSoC Product Tables and Product Selection Guide

Here I'll describe some Zynq-7000 applications in detail.

General block diagram of the Xilinx UltraScale architecture:

Zynq-7000 All Programmable SoC Overview

A high-level block diagram of the Zynq-7000 AP SoC is shown in the figure:

[Figure: zynq-7000 block diagram]

Each ARM Cortex-A9 processor has two 32 KB built-in level-1 caches for instructions and data, respectively. Another 512 KB on-chip level-2 cache is shared by the two processors. The snoop control unit (SCU) maintains the coherency of the caches in the two processors. The interconnection between the processor system and programmable logic is achieved through nine distinct AXI ports. The S_AXI_GP slave ports are typically used by the programmable logic that needs to access the processor system peripherals, while the M_AXI_GP master ports are mainly used by the processor system to access the register maps built in the programmable logic. The S_AXI_HP slave ports provide an efficient way for the programmable logic to access an external DDR memory or the 256 KB on-chip memory, while, for latency sensitive applications, the accelerator coherent port S_AXI_ACP offers direct accesses to the caches via the SCU. For a more detailed description of the Zynq-7000 AP SoC.
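To keep the port names in the quoted description apart, here is a small C table of the port groups. The names and typical uses are taken from the quote. The per-group counts (2, 2, 4, 1) are my reading of the Zynq-7000 documentation rather than something stated in the quote, and the struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Role of the processor system (PS) on a given port group. */
enum axi_role { AXI_PS_MASTER, AXI_PS_SLAVE };

struct axi_port_group {
    const char   *name;
    enum axi_role role;
    unsigned      count;        /* assumption: not given in the quoted text */
    const char   *typical_use;
};

static const struct axi_port_group zynq7000_axi_ports[] = {
    { "M_AXI_GP",  AXI_PS_MASTER, 2, "PS accesses register maps built in the PL" },
    { "S_AXI_GP",  AXI_PS_SLAVE,  2, "PL accesses PS peripherals" },
    { "S_AXI_HP",  AXI_PS_SLAVE,  4, "PL accesses DDR or on-chip memory" },
    { "S_AXI_ACP", AXI_PS_SLAVE,  1, "PL accesses the caches via the SCU" },
};

/* Adds up the ports of all groups. */
unsigned zynq7000_axi_port_total(void)
{
    unsigned total = 0;
    size_t groups = sizeof zynq7000_axi_ports / sizeof zynq7000_axi_ports[0];
    for (size_t i = 0; i < groups; i++)
        total += zynq7000_axi_ports[i].count;
    return total;
}

/* Sanity check against the "nine distinct AXI ports" in the quoted text. */
void check_axi_port_total(void)
{
    assert(zynq7000_axi_port_total() == 9);
}
```

For a JS module turned into a hardware block, the M_AXI_GP group is the one the CPU side would normally use to reach the block's registers, while the HP and ACP ports matter once the block has to move data on its own.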

Wireless Base Station ZUC Block Cipher Implementation on Zynq-7000 AP SoC

These AP SoCs are now available as SoMs (System on Module) from Avnet - Zynq-7000 All Programmable SoC Systems:

From my point of view, the most suitable small cloud-enabled server module for runtime.js is the MicroZed:

MicroZed™ is a low-cost System-On-Module, or SOM that is based on the Xilinx Zynq®-7000 All Programmable (AP) SoC. In addition to the Zynq-7000 AP SoC, the module contains the common functions and interfaces required to support the core of most SoC designs, including memory, configuration, Ethernet, USB, and clocks. On the bottom side of the module, MicroZed contains two 100-pin I/O headers that provide connection to two I/O banks on the programmable logic (PL) side of the Zynq-7000 AP SoC device. When plugged onto a user designed baseboard or carrier card, these 100-pin connectors provide connectivity between the Zynq-7000 AP SoC PL I/Os and the user circuits on the carrier card. MicroZed also includes onboard power regulation that can support a single 5 V to 12 V input.

Key Features

SoC
- XC7Z010-1CLG400 or XC7Z020-1CLG400
Memory
- 1 GB of DDR3 SDRAM
- 128 Mb of QSPI Flash
- Micro SD card interface
Communications
- 10/100/1000 Ethernet
- USB 2.0
- USB-UART
User I/O (via dual board-to-board connectors)
- 7Z010 Version: 100 User I/O (50 per connector) Configurable as up to 48 LVDS pairs or 100 single-ended I/O
- 7Z020 Version: 115 User I/O (58/57 per connector) Configurable as up to 55 LVDS pairs or 115 single-ended I/O
Other
- 2x6 Digilent Pmod® compatible interface providing 8 PS MIO connections for user I/O
- Xilinx PC4 JTAG configuration port
- PS JTAG pins accessible via Pmod or I/O headers
- 33.33 MHz oscillator
- User LED and push switch
Software
- Linux BSP and reference design
Mechanical
- 4 inches x 2.25 inches (102 mm x 57 mm)

I think this could be a very interesting and promising direction for porting runtime.js and nodejs/nodeos, because thanks to the reconfigurable (dynamic) properties of this hardware it is possible to achieve very interesting results in server system performance, similar to Intel's.

The Zynq-7000 contains a dual-core 32-bit ARM Cortex-A9; the Zynq UltraScale+ carries a quad-core 64-bit ARM Cortex-A53.

Interesting, although I don't believe it relates to runtime.js or NodeOS... at least not directly. Someone could write an npm module to reprogram the FPGA and wrap it with some functions so it gets "hardware accelerated JavaScript", similar to what's currently done with JIT (only at a lower level).

Interesting, didn't know that this existed! Dynamic hardware sounds like something V8 could use internally to speed up JavaScript. Probably not much runtime.js can do here.

@piranna

although I don't believe it relates to runtime.js or NodeOS... at least not directly

Well, this does relate to the processor architectures supported by runtime.js (ARM). In other words, it is the same issue as porting to the Raspberry Pi :) because the currently available AP SoCs carry ARM CPU cores. If runtime.js or nodeos/nodejs supports this architecture, it will be the first step towards full support for reconfigurable hardware.

@piranna

Someone could write an npm module to reprogram the FPGA and wrap it with some functions so it gets "hardware accelerated JavaScript", similar to what's currently done with JIT (only at a lower level).

Yes, it's not very easy, and it would be great if the necessary toolset were written in runtime.js/nodejs and could be used for reconfiguring the hardware, but for now appropriate tools already exist and work well -

System Hardware Definition

The development flow for the Zynq-7000 AP SoC consists of two major steps. The system hardware definition step defines the various components of the system and how they are connected together. The Xilinx Platform Studio (XPS) [Ref 10] is used to configure the main hardware parameters of each component and IP:

- Clock source and frequency
- AXI interconnect ports and base addresses
- Connection to interrupt controllers
- Connection to external input/output pins

For Xilinx IP cores, including those generated by Vivado HLS, the interconnections and bus configurations are fully automated and customizable. After the processor system has been defined with XPS, Xilinx PlanAhead™ [Ref 11] can automatically generate a top-level RTL wrapper that can be synthesized by the ISE® or Vivado tools. The resulting BIT file is finally downloaded to the Zynq-7000 AP SoC, which then behaves as a fully customized application-specific device under software control.
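The last step of the flow quoted above, downloading the resulting BIT file, is the part a runtime could in principle drive itself. A minimal sketch follows, assuming a Linux kernel from the Xilinx tree of that period that exposes the PL configuration interface as /dev/xdevcfg. That device path is an assumption and is not mentioned in the quoted text, the file format the driver accepts depends on the kernel version, and load_bitstream is a hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>

/* Streams the file at `bit_path` into `dev_path` in 4 KB chunks.
 * On the assumed setup, `dev_path` would be "/dev/xdevcfg".
 * Returns the number of bytes written, or -1 on any error. */
long load_bitstream(const char *bit_path, const char *dev_path)
{
    assert(bit_path != NULL && dev_path != NULL);

    FILE *in = fopen(bit_path, "rb");
    if (in == NULL)
        return -1;

    FILE *out = fopen(dev_path, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    char buf[4096];
    long total = 0;
    int ok = 1;
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            ok = 0;
            break;
        }
        total += (long)n;
    }
    if (ferror(in))
        ok = 0;

    fclose(in);
    /* fclose flushes buffered data, so its result matters for the device. */
    if (fclose(out) != 0)
        ok = 0;

    return ok ? total : -1;
}
```

A hypothetical call would be load_bitstream("system.bit", "/dev/xdevcfg"). Nothing here is specific to C; the same copy could be done from any environment that can open and write a device file, which is why this step looks like a reasonable candidate for a JS module.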

@iefserge

Probably not much runtime.js can do here.

Ah, don't worry :) All that is needed from runtime.js is ARM support :)

@iefserge

Interesting, didn't know that this existed! Dynamic hardware sounds like something V8 could use internally to speed up JavaScript.

Yes, exactly, but V8 compiles this code and it remains software. With the help of the FPGA fabric embedded in reconfigurable hardware like the Zynq-7000 AP SoC, it is possible, for example (theoretically), to convert one of the runtime.js modules written in JS into a hardware module connected to the runtime.js core via a PCI bus or via memory-mapped registers (like an additional device or a part of the CPU, with corresponding performance). The same goes for additional simple hardware and drivers ("soft hardware") for devices which use only GPIO, for example. And all of this can be changed by reconfiguring/reloading the system into another configuration of its hard and soft parts.

I still think this could be done as an npm module and doesn't have a direct relationship with runtime.js, beyond the fact that runtime.js could provide some modules for direct hardware access (which could also be npm modules available for other JavaScript environments...).

@piranna

I still think this could be done as an npm module and doesn't have a direct relationship with runtime.js, beyond the fact that runtime.js could provide some modules for direct hardware access (which could also be npm modules available for other JavaScript environments...).

Well, if this conversion from software modules to hardware modules is done correctly (in the right way and in the right place, see - Intel unveils new Xeon chip with integrated FPGA, touts 20x performance boost), it is the same thing V8 does with interpreted JS code when it compiles it, only applied twice, with twice the effect :) Yes, I know, you may say that I'm too greedy, but why not? :D Anyway, the general direction of hardware evolution is towards programmable hardware and universal peripheral devices. So this is related to the more general question of which hardware is the better first choice for porting runtime.js :)

Another question is how difficult it would be to make the conversion tools a part of runtime.js/nodeos. Good question; for now, proprietary tools like Vivado are used for this purpose. I think it's not very easy, but I think it's possible :) Vivado High-Level Synthesis, Introduction to the Vivado Integrated Design Suite, Vivado Design Suite page. If you are patient, here is a more detailed overview of Vivado features in GUI and CLI modes - Vivado Design Flows Overview (v2013.1).

Ah, sorry, do you mean that this should not be part of a JS-driven OS? Why? It is a very useful option: a kind of compiling subsystem which uses the features of a new type of hardware. Of course it can be a node module as well.

it could provide some modules to access the direct hardware

Yes, and interaction between hardware and applications is commonly managed by the OS (drivers). So it is a logical part of the OS.

Anyway, high-level languages are now a very popular design tool for logic circuits:

http://esd.cs.ucr.edu/labs/tutorial/comparator.vhd

---------------------------------------------------
-- n-bit Comparator (ESD book figure 2.5)
-- by Weijun Zhang, 04/2001 
--
-- this simple comparator has two n-bit inputs & 
-- three 1-bit outputs
---------------------------------------------------

library ieee;
use ieee.std_logic_1164.all;

---------------------------------------------------

entity Comparator is

generic(n: natural := 2);
port(
    A:       in  std_logic_vector(n-1 downto 0);
    B:       in  std_logic_vector(n-1 downto 0);
    less:    out std_logic;
    equal:   out std_logic;
    greater: out std_logic
);
end Comparator;

---------------------------------------------------

architecture behv of Comparator is

begin

    process(A,B)
    begin
        if (A<B) then
            less <= '1';
            equal <= '0';
            greater <= '0';
        elsif (A=B) then
            less <= '0';
            equal <= '1';
            greater <= '0';
        else
            less <= '0';
            equal <= '0';
            greater <= '1';
        end if;
    end process;

end behv;

---------------------------------------------------
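For comparison, here is roughly what the same comparator looks like as ordinary software. This is only a sketch of the behaviour, not a translation produced by any tool. It models the inputs as unsigned n-bit values, which is how I read the VHDL above for equal-length vectors of '0' and '1', and the struct and function names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* The three 1-bit outputs of the comparator; exactly one of them is set. */
struct cmp_out {
    unsigned less    : 1;
    unsigned equal   : 1;
    unsigned greater : 1;
};

/* Software model: compares the low n bits of a and b (1 <= n <= 32). */
struct cmp_out comparator(unsigned n, uint32_t a, uint32_t b)
{
    assert(n >= 1 && n <= 32);

    /* Keep only the n bits that the hardware ports would carry. */
    uint32_t mask = (n == 32) ? UINT32_MAX : ((UINT32_C(1) << n) - 1);
    a &= mask;
    b &= mask;

    struct cmp_out out = { a < b, a == b, a > b };
    return out;
}
```

The software version runs as a sequence of instructions each time it is called, while the VHDL describes a circuit whose outputs follow the inputs continuously. That difference is the whole point of moving a hot function into programmable logic.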

Even if you don't understand Chinese (like me), it's not a big problem, because the video clearly explains a sandbox toolset for prototyping logic circuits: 視窗程式：程式領域漫談 (Node.js, Verilog, Spice)

Actually it's a good idea: just create the needed hardware platform, port the necessary software to it and then test all the new features. Anyway, most computing devices interact through network protocols, so it does not matter exactly what software and hardware they have. OK, I need to investigate what needs to be done to create a custom hardware PSoC card with a newer version of Zynq.

OK, actually this is also interesting because you can avoid "drivers hell" for a significant share of external devices by implementing these devices programmatically, relying only on their standards and specifications and not on particular models produced by different vendors. Or, for example, by creating simplified hardware interfaces to existing hardware (as is done for ESP8266 SoCs).

A few more quotes, just to illustrate what is going on in this area:

Z-Turn Board F.A.Q.

Question: Why does the Z-turn Board HDMI output need IP?

Answer: Because although Ubuntu on the Z-turn Board already supports HDMI output, it uses a trial version of the Xylon IP, which works for only 30 minutes after boot.

It is not only the high-definition output that needs IP; all graphics output on the Z-turn Board needs IP support, because the PS part of the Zynq series SoC has no integrated LCD controller.

Definition, overview and catalog of Xilinx IP blocks - here

Intellectual Property (IP) are key building blocks of Xilinx Targeted Design Platforms. Xilinx FPGA devices and tools are architected for easy creation of Plug-and-Play IP; allowing Xilinx and its Alliance Program Members to provide an extensive catalog of cores to address your general and market specific needs. This enables you to focus your design efforts on where you differentiate your product from your competition and accelerate your time to profit.

Open source open cores catalog - http://opencores.com/projects

Useful information about creating and controlling custom ("random logic") IPs from the OS:

http://forums.xilinx.com/t5/Embedded-Linux/Create-a-Linux-Driver-for-a-custom-IP-on-Zynq/td-p/477628

Re: Create a Linux Driver for a custom IP on Zynq

‎06-19-2014 04:49 AM

Problem Description

In many cases it is not necessary to write a custom kernel driver simply to access some memory-mapped registers in a custom IP core. The Userspace IO (UIO) framework even permits simple userspace interrupt handling.

UIO is not a solution in all cases, in particular it currently has the following limitations and restrictions:

- It is generally not possible to use UIO for device drivers which need to integrate in core parts of the Linux kernel, for example ethernet drivers, block device and mass storage etc. It is best suited to simple access to 'random logic' IPs.
- It is not possible to perform DMA to or from UIO-managed devices without a custom kernel driver. Userspace DMA is an advanced topic - please speak to your PetaLogix services engineer to discuss possible solutions.

Background

In Linux and other modern operating systems, all access to physical hardware from application programs must be mediated through the kernel. This is to protect the integrity of the hardware from erroneous or malicious application software.

However, this requirement to create a kernel device driver for every possible device access can be time consuming and in some cases unnecessary. The UIO framework was designed to allow a light weight, low overhead access to hardware devices directly from userspace.

Methodology

To use generic UIO, you will need to change the device node's compatible property in the DTS file so that the generic UIO driver can be probed:

Here is an example of a device node in the DTS file:

<INSTANCE_NAME>: <INSTANCE_TYPE>@<BASEADDR> {
    compatible = "generic-uio";
    reg = < BASEADDR ADDR_SIZE >;
    interrupt-parent = <INTR CONTROLLER>;
    interrupts = < INTR_NUM SENSITIVITY >;
} ;

- "compatible": used to match the "compatible" string of the uio_of_genirq driver in Linux so that the device can be probed.
- "reg": the physical base address and the address size.
- "interrupt-parent": the interrupt controller of the system.
- "interrupts": the interrupt number as presented in the interrupt controller, and the interrupt sensitivity type.

E.g.:

sys_gpio: xps-gpio@81400000 {
    #gpio-cells = <2>;
    compatible = "generic-uio";
    gpio-controller ;
    reg = < 0x81400000 0x10000 >;
    xlnx,all-inputs = <0x0>;
    ...
    xlnx,tri-default-2 = <0xffffffff>;
};

Enable the uio_of_genirq (UserIO, Generic IRQ handling) driver in the kernel menuconfig:

"Device Drivers" --->
|- "Userspace I/O drivers" --->
|- <*> Userspace I/O platform driver with generic IRQ handling
|- <*> Userspace I/O OF driver with generic IRQ handling

When the system boots, the UIO device is represented in the filesystem as "/dev/uioN" where N is an incrementing integer value for each separate UIO device. To uniquely identify which uioN corresponds to a particular device, at system boot time you can look in the sysfs path

/sys/class/uio

and the various virtual files and attributes present in that subdirectory.
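The lookup described above can also be done from a program. The sketch below assumes that each /sys/class/uio/uioN directory contains a "name" attribute, which is what the UIO core exposes as far as I know; which string a given device reports there depends on the driver and device tree, so the value to search for is something you would have to check on the target. find_uio_by_name is a hypothetical helper.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Searches `class_dir` (normally "/sys/class/uio") for a uioN entry whose
 * "name" attribute equals `wanted`. On success copies the entry name,
 * e.g. "uio0", into `out` and returns 0; returns -1 if nothing matches. */
int find_uio_by_name(const char *class_dir, const char *wanted,
                     char *out, size_t out_len)
{
    assert(class_dir != NULL && wanted != NULL && out != NULL && out_len > 0);

    DIR *dir = opendir(class_dir);
    if (dir == NULL)
        return -1;

    int rc = -1;
    struct dirent *ent;
    while (rc != 0 && (ent = readdir(dir)) != NULL) {
        if (strncmp(ent->d_name, "uio", 3) != 0)
            continue;

        char path[1024];
        char name[128];
        snprintf(path, sizeof path, "%s/%s/name", class_dir, ent->d_name);

        FILE *f = fopen(path, "r");
        if (f == NULL)
            continue;
        if (fgets(name, sizeof name, f) != NULL) {
            name[strcspn(name, "\n")] = '\0';   /* sysfs values end in a newline */
            if (strcmp(name, wanted) == 0 && strlen(ent->d_name) < out_len) {
                strcpy(out, ent->d_name);
                rc = 0;
            }
        }
        fclose(f);
    }
    closedir(dir);
    return rc;
}
```

The directory is a parameter rather than a hard-coded path so the function can be tried against a mock directory tree on a development machine before it is run on the board.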

For more details on how to use UIO, you can refer to this document in the Linux kernel documentation sources:

linux-2.6.x/Documentation/DocBook/uio-howto.tmpl

Once enabled at the kernel level, writing an application to access and control a 'generic-uio' class device is simple. Here is some sample application code for directly managing a Xilinx GPIO device from userspace, bypassing the usual kernel GPIO driver layer.

/*
 * This application reads/writes GPIO devices with UIO.
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <fcntl.h>

#define IN 0
#define OUT 1

/* Size of the mapped register window; matches the "reg" size in the DTS. */
#define GPIO_MAP_SIZE 0x10000

/* Register offsets of the Xilinx GPIO core: channel 1, then channel 2. */
#define GPIO_DATA_OFFSET 0x00
#define GPIO_TRI_OFFSET 0x04
#define GPIO2_DATA_OFFSET 0x08
#define GPIO2_TRI_OFFSET 0x0C

/* Print command-line help. */
void usage(void)
{
    printf("Usage: <program> -d <UIO_DEV_FILE> -i|-o <VALUE>\n");
    printf(" -d UIO device file, e.g. /dev/uio0\n");
    printf(" -i Input from GPIO\n");
    printf(" -o <VALUE> Output to GPIO\n");
}

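int main(int argc, char *argv[])
{
    int c;
    int fd;
    int direction = IN;
    char *uiod = NULL;
    int value = 0;
    void *ptr;
    volatile unsigned *regs;

    printf("GPIO UIO test.\n");
    while ((c = getopt(argc, argv, "d:io:h")) != -1)
    {
        switch (c)
        {
            case 'd':
                uiod = optarg;
                break;

            case 'i':
                direction = IN;
                break;

            case 'o':
                direction = OUT;
                value = atoi(optarg);
                break;

            case 'h':
                usage();
                return 0;

            default:
                /* getopt() returns '?' here; optopt holds the bad option */
                printf("invalid option: %c\n", (char)optopt);
                usage();
                return -1;
        }
    }

    /* Without -d there is no device to open */
    if (uiod == NULL)
    {
        printf("No UIO device file given.\n");
        usage();
        return -1;
    }

    /* Open the UIO device file; open() returns -1 on failure */
    fd = open(uiod, O_RDWR);
    if (fd < 0)
    {
        perror(argv[0]);
        printf("Invalid UIO device file: %s.\n", uiod);
        usage();
        return -1;
    }

    /* mmap the UIO device (offset 0 selects its first mapping) */
    ptr = mmap(NULL, GPIO_MAP_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    if (ptr == MAP_FAILED)
    {
        perror("mmap");
        close(fd);
        return -1;
    }

    /* volatile keeps the compiler from caching or dropping register accesses */
    regs = (volatile unsigned *)ptr;

    if (direction == IN)
    {
        /* Read from GPIO: set the low 8 pins as inputs, then sample them */
        regs[GPIO_TRI_OFFSET / sizeof(unsigned)] = 255;
        value = regs[GPIO_DATA_OFFSET / sizeof(unsigned)];
        printf("%s: input: %08x\n", argv[0], value);
    } else {
        /* Write to GPIO: set all pins as outputs, then drive the value */
        regs[GPIO_TRI_OFFSET / sizeof(unsigned)] = 0;
        regs[GPIO_DATA_OFFSET / sizeof(unsigned)] = value;
    }

    munmap(ptr, GPIO_MAP_SIZE);
    close(fd);
    return 0;
}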

Thanks and Regards Balkrishan
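The quoted post mentions that UIO also permits simple userspace interrupt handling, but its sample only covers register access. A minimal sketch of the interrupt side follows, based on my understanding of the UIO interface: a blocking 4-byte read() on /dev/uioN returns the interrupt count, and writing a 32-bit 1 re-enables the interrupt on drivers that support it, such as the generic IRQ platform driver. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Blocks until the UIO device behind `fd` signals an interrupt and stores
 * the running interrupt count in *count. Returns 0 on success, -1 on error. */
int uio_wait_irq(int fd, uint32_t *count)
{
    assert(count != NULL);
    ssize_t n = read(fd, count, sizeof *count);
    return n == (ssize_t)sizeof *count ? 0 : -1;
}

/* Re-arms the interrupt by writing a 32-bit 1 to the device file.
 * Assumption: the driver implements interrupt control via write(). */
int uio_enable_irq(int fd)
{
    uint32_t one = 1;
    ssize_t n = write(fd, &one, sizeof one);
    return n == (ssize_t)sizeof one ? 0 : -1;
}
```

A typical loop would call uio_enable_irq(), then uio_wait_irq(), then service the device through the mmap'ed registers, and repeat. Comparing the returned count with the previous one shows whether any interrupts were missed in between.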


@danxn Yeah, "drivers hell" is an issue for any new OS, that's why it's easier to target a hypervisor like KVM or Xen and just support the virtualized interfaces. You'd usually want to run your services in a cloud anyway instead of setting up dedicated physical hardware. Specific hardware drivers could be written later if really necessary. I'd love to see a standardized hardware interface supported by all vendors though.

@iefserge

I'd love to see a standardized hardware interface supported by all vendors though.

Yes, this is a big headache, and standardized interfaces make developers' lives much easier.

For me, the more interesting direction here is using FPGA modules as hardware accelerators for JS code. I'm learning about this now, and it looks like custom logic hardware cores converted from JS modules are definitely achievable. I'm just curious how much they can speed up server applications. Very interesting :) Now I'm looking for an easy entry point for these experiments. Maybe it won't be runtime.js at first, but Linux and C. But with a JS OS and JS apps it will be much cooler :)
