stdlib-js / google-summer-of-code

Google Summer of Code resources.
https://github.com/stdlib-js/stdlib
23 stars 5 forks source link

[RFC]: add BLAS bindings and implementations for linear algebra #76

Closed aman-095 closed 2 months ago

aman-095 commented 3 months ago

Full name

Aman Bhansali

University status

Yes

University name

Indian Institute of Technology, Jodhpur

University program

Mechanical Engineering

Expected graduation

May, 2024

Short biography

I am a final-year undergraduate student in the Department of Mechanical Engineering at the Indian Institute of Technology, Jodhpur. I have completed the following relevant courses in my academic curriculum: Introduction to Computer Science, Data Structures and Algorithms, Linear Algebra and Calculus, and Scientific Computing. I have a strong background in Python 3.x, C, C++11, and Javascript developed during projects and courses. The applications of machine learning pique my curiosity. I have a research paper published at the European Conference on Artificial Intelligence (ECAI-23) and several projects employing computer vision and natural language processing.

Timezone

IST (UTC + 5:30)

Contact details

email:amanbhansali65@gmail.com,bhansali.1@iitj.ac.in,github:@aman-095

Platform

Windows

Editor

I utilize Windows Subsystem for Linux as it allows me to run a Linux environment on my Windows machine. I use VSCode for my code editor. I’ve been using this since the first time I started programming. The reason at that time was that it supported multiple languages, user-friendly interface, and had a lot of extensions for each programming language. Debugging the code is easier on VSCode. Now that I have started contributing to open source, the reason is its EditorConfig extension and the support of Github Copilot, which make it easy and fast to use for getting jobs done.

Programming experience

Since my first year of undergraduate studies, programming has been a pivotal part of my academic journey. I started with Python through my course on Introduction to Computer Science where I gained experience on how a language works, its syntax, and its structure after which I shifted to C++ delving deeper into its object-oriented paradigm. Further, my Data Structures and Algorithms course required C language and hence I learned C during that time and practiced solving problems. One of the invaluable lessons I’ve gained from my programming journey is the universality of core concepts across languages. While syntax may vary, the underlying principles of logic and problem-solving remain constant. This realization has not only enhanced my adaptability but also reinforced the importance of understanding the fundamentals of any language. As I have my multiple projects using python but when I shifted to other languages I felt that syntax varies rest the understanding, and logic remains the same. So, yes I have around 4 years of experience with programming and I feel confident to work on this project as well.lesson

JavaScript experience

Although I do not have any major projects that are solely inclined toward the utilization of JavaScript, I’ve gained an understanding of the language through an online course. Also, my knowledge and experience of programming languages, such as C/C++ and Python, has provided me with a solid foundation that helped me seamlessly translate to JavaScript. My contributions to stdlib have been pivotal in further enhancing my knowledge and practical application of the language. I have gained an understanding of how code flows which I feel is the most important. Contributing to stdlib, I was able to grasp the structure fast, and hence my contributions also signify that. This experience portrays my ability to adapt faster, even when faced with unfamiliar concepts or challenges. I am confident that any gaps in my knowledge of JavaScript I will be able to bridge, and this will not be a blocker for the project.

Node.js experience

The main source of my Node.js experience is the stdlib contributions that I have made. I think that my prior knowledge offers a solid basis for fundamentals and hence picking up the language fast. My understanding of JavaScript development inside the Node.js ecosystem has been enhanced by my contributions to several packages and my implementation of three BLAS Level 1 routines.

C/Fortran experience

I do not have any prior experience working with Fortran but I have a good experience with C language as mentioned above. Recently while working on stdlib/blas/base/drot, stdlib/blas/base/srot, and stdlib/blas/base/zcopy BLAS routines, I had to write C implementation from reference Fortran77 code which seemed easy to me. Looking at the reference implementation of other BLAS routines, I do not think less experience with Fortran may serve as any blocker to this project.

Interest in stdlib

Numerical computations and fundamental linear algebra have always excited me and working with stdlib has provided me an opportunity to pursue them and work for the open source community. This project offers me a chance to explore more of these operations, and their applications in various scientific and engineering domains. Working here will allow me to contribute to a widely used and respected library within the JavaScript ecosystem. After contributing to stdlib, I attest to the punctiliousness that goes into reviewing each pull request (PR), ensuring that the entire codebase appears standardized and that the use of multiple files never gives the impression that different contributors implemented them. Likewise, I strive for perfection in my endeavors, therefore, this also influences and motivates me. Being part of the stdlib community, I believe, means being part of a vibrant ecosystem of developers and contributors. The collaborative nature of open-source development promotes the true spirit of learning, innovating, and sharing, which I find to be highly rewarding and motivating.

Version control

Yes

Contributions to stdlib

Following is the list of Pull Requests that I have made in the past to the GitHub repository of stdlib, PS: I have listed contributions from most recent to least recent, list order does not determine the difficulty of the PR.

Merged:

Open:

A comprehensive list of all my merge requests can be found at:

Goals

In this section I would be discussing the project in detail. Specifically, I will be listing down the abstract, outcomes of the project, project description, current status of work, and approach.

1. Abstract:

This proposal is directed toward adding BLAS (Basic Linear Algebra Subprograms) routines and their C / Fortran / Js implementation for linear algebra. In linear algebra, BLAS routines are predominantly employed to perform operations on matrices and vectors. BLAS are commonly utilized in the development of advanced linear algebra software owing to their efficiency, portability, and extensive accessibility. The major goals associated with the project are:

2. Project Outcome

Once we have the BLAS packages they can be used for vector and matrix operations and can be called using APIs i.e. a comprehensive suite of BLAS routine APIs will be incorporated into stdlib. Users will be able to call BLAS routines from JavaScript. In web browsers, BLAS routines will be in JavaScript. Additionally, the integration of C and Fortran implementations will aim at optimizing performance and enabling potential hardware-based implementations.

3. Project Description

There are three levels of BLAS routines:

Within the hierarchy of BLAS routines, there exist distinct iterations of the same routine distinguished by their data types, denoted through specific conventions. For instance, variations such as SROT, DROT, CSROT, and ZDROT adhere to this convention. Here, the prefixes ’S’, ’D’, ’C’, and ’Z’ respectively signify real single precision, real double precision, complex single precision, and complex double precision. In stdlib we also have 'G' type which is the generic type implementation for each routine. Current Status: I am tracking the status of the implementations via an issue:

4. Approach:

First, I will start with the implementation of Level 1 routines and then move towards Levels 2 and 3. In each level, the priority will be first real, followed by complex datatypes, and further inside real for real double precision, followed by real single precision, and similarly for complex double and single precision. Once these are done I will also implement the 'G' type for each routine. I propose to split the implementation of the complete package into two Pull Requests (PRs) for efficiency and clarity. The first PR would contain the Js implementation of the Fortran code from the reference mentioned above and other required files. Then the second PR will contain the C / Fortran implementation. After a few implementations of routines, I will then use one Pull Request for the entire routine.

The real challenge while implementing a package from scratch lies in creating a comprehensive test suite and standard benchmarks. Javascript being a weakly typed programming language makes it easy to write up the core implementation, and then the implementation can be tested against registered tests and benchmarks.

On the other hand, C and Fortran code demands strict variable types and is prone to error, so once we have a thorough test suite merged via the first PR, we can test the new implementation and be assured that we are on the right track.

Below, I will outline the detailed implementation plan for the ISAMAX routine within the Level 1 BLAS library:

Routine Description:

Based on the notations above discussed, ISAMAX is a single-precision routine. It finds the index of the element having the first maximum absolute value inside the real single precision array. Parameters associated:

function isamax( N, x, strideX )

The above code snippet is the function definition of the native Js implementation for ISAMAX.

var Float32Array = require( '@stdlib/array/float32' );

// Initial array...
var x0 = new Float32Array( [ 1.0, 2.0, -3.0, 5.0, 4.0, -6.0 ] );

// Create offset views...
var x1 = new Float32Array( x0.buffer, x0.BYTES_PER_ELEMENT*1 ); // start at 2nd element

isamax( 3, x1, 2);
// output: 5

In JavaScript, arrays are implemented as objects, and there is no direct access to memory pointers. An ndarray in JavaScript often utilizes a typed array as its underlying buffer. Say we have a typed array buffer with 100 elements and we want to apply any filter over this data instead of creating a new buffer for applying offset we have offset passed as a variable rather than using additional memory.

function isamax( N, x, strideX, offsetX )

The above code snippet is the function definition for the ndarray case. It has four parameters, the new parameter here is the offsetX. By using an offset, we can pass a pointer to the memory location indicated by the offset, effectively passing only the relevant portion of the buffer to the C function without needing to re-instantiate a new object. This function will help us give users the freedom to set offset, which in the normal function is zero by default, i.e., it provides alternative indexing semantics.

int c_isamax( const int N, const float *X, const int strideX ) 

The above code snipet is corresponding to the C signature of the Fortran77 implementation. The definition has a int return type for the index, the parameter X indicating that isamax is a single precision.

int c_isamax( const int N, const float *X, const int strideX )

The above code snippet is the signature for cblas which refers to a C translation of the FORTRAN77 BLAS accessible through the LAPACK linear algebra library. It is only used when we are targeting hardware-optimized libraries such as OpenBlas & Apple accelerators.

Routine Implementation

Now we have two parts of the implementation as discussed for a few routines, first the js implementation of the Fortran77 code as first PR and then the C and Fortran implementation in the second PR. Now from this code, it is crucial to determine whether or not all the intrinsic functions utilized in the Fortran implementation are present in our library prior to working on the routine implementation. If not I will implement that first and then proceed with the work. Here we see that the Fortran code uses abs for which we have javascript and C implementations available thus this won’t be an issue.

Js Implementation

The primary files needed for the JavaScript implementation of the two APIs as previously discussed include:

var Float32Array = require( '@stdlib/array/float32' );

var x = new Float32Array( [ 1.0, 2.0, -3.0, 4.0, 5.0 ] );
isamax( x.length, x, 1 );
// output: 4

x = new Float32Array( [ 1.0, 6.0, -3.0, 4.0, 5.0 ] );
isamax.ndarray( x.length, x, 1, 2 );
// output: 4
function isamax( N, x, strideX, offsetX ) {
    var smax;
    var idx;
    var ix;
    var sx;
    var i;

    idx = -1;
    if ( N < 1 || strideX <= 0 ) {
        return NaN;
    }
    idx = 0;
    if ( N === 1 ) {
        return idx;
    }
    sx = offsetX;
    idx = sx;
    if (strideX === 1 ) {
        // code for increment equal to `1`...
        smax = stdlib_base_abs( x[sx] );
        sx = sx + strideX;
        for ( i = 1; i < N; i++ ) {
            if ( stdlib_base_abs( x[sx] ) > smax ) {
                idx = sx;
                smax = stdlib_base_abs( x[sx] );
            }
            sx = sx + strideX;
        }
        return idx;
    }

    smax = stdlib_base_abs( x[sx] );
    sx = sx + strideX;
    for ( i = 1; i < N; i++ ) {
        if ( stdlib_base_abs( x[sx] ) > smax ) {
            idx = sx;
            smax = stdlib_base_abs( x[sx] );
        }
        sx = sx + strideX;
    }
    return idx;
}

The above code is the ndarray.js implementation.

This can be made for isamax; we can also do the same and get the tests. The other tests could be made based on different cases, i.e., negative strides, complex access patterns, etc., based on requirements. Similarly, tests are made for the Ndarray case. The third file is the test.js file, which checks if all the functions being called and used exist in the package. Similar to other files, the structure will be the same as that used in other stdlib files.

C / Fortran Implementation:

Firstly, I will outline the steps involved in crafting the C implementation, followed by the Fortran counterpart. Given our prior discussions, the focal point lies in the implementation code, and introducing tests at this juncture won’t pose any complications since we already have them prepared for the JS implementation. Furthermore, it is imperative to update our README file, documenting the C APIs and providing examples akin to what we’ve done for JS. This entails adhering to the standard structure and format prevalent in stdlib packages, a familiarity I’ve acquired through my past contributions. Here is the list of required files & folders:

We will proceed by creating the file named isamax.c, which will contain the C implementation for the BLAS routine isamax. This file will include the header file isamax.h, as previously discussed, along with any intrinsic functions necessary, such as here it is abs.

Another file to be created is isaamax cblas.c, where cblas refers to a C translation of the FORTRAN77 BLAS accessible through the LAPACK linear algebra library. This interface is utilized by devices with support for cblas, such as Apple accelerators, to transmit parameters to the cblas function.

Next, we have the Fortran implementation, comprising isamax.f and isamaxsub.f. In Fortran, parameters are typically passed as pointers, and the presence of a sub-function indicates that it is a subroutine, which doesn’t return any value.

isamax.f contains the free-form implementation of the Fortran code. On the other hand, the sub-file, isamaxsub.f, serves as a wrapper aiding in accessing results from C. This implies that isamaxsub.f will include an additional parameter for the maximum index. In Fortran, where everything is passed by reference, this sub-file calls isamax.f and stores the address of the index result.

Furthermore, there’s another file named isamax f.c. This function, in its implementation, introduces a variable for the answer and passes its reference or address to a sub-file. The sub-file employs Fortran code to obtain the result and then assigns it to the address provided by C. This approach is adopted to establish an interface enabling C to pass parameters while leveraging Fortran implementation for calculations, thereby enhancing performance.

int c_isamax( const int N, const float *X, const int strideX ) {
    int ix;
    isamaxsub( &N, X, &strideX, Y, &strideY, &ix );
    return ix;
}

This way, the entire BLAS for ISAMAX will be implemented, as will the other BLAS routines. An important point to note is that all the files should follow the standard convention of files used in stdlib, which I have gained a decent amount of understanding for while implementing the drot, sort, and zcopy BLAS routines.

Why this project?

The proposed project, presents an exciting opportunity to explore and understand the concepts of numerical computation, particularly focusing on fundamental linear algebra operations. The major factors behind my motivation for the project are:

Discussions with Athan and Pranav have further bolstered my confidence and motivation to embark on this project.

Qualifications

The requirements for this project are proficient knowledge of C, JS, and Fortran, knowledge of scientific computing, and understanding of the structure, format, and flow of stdlib files and code. Based on the aforementioned sections pertaining to my experience with C/Fortran and Js, I am confident that I possess the requisite skill set and knowledge of the technology framework necessary for this undertaking. I also took a course on Scientific Computations as I mentioned above which built my interest and knowledge in the domain. As a result of gaining a sufficient comprehension of how computations for these functions operate and the need for high-performance implementations through the completion of this course, I am qualified to begin working on this. Also, I am a quick learner and motivated to learn new things, furthermore, my contributions to stdlib have provided me with insight into the package’s architecture, and through a conversation with Athan, I realized why it is critical to achieve perfection in implementation and why the package’s structure as a whole is significant, not just the implementation. Thus, based on my merged PRs I feel that I am learning to do things the stdlib way.

Prior art

BLAS in a library is essential for efficient numerical computing. A library that makes use of BLAS can make use of highly optimized versions of these operations, which are frequently created and tuned for particular hardware architectures. Also, as I have worked in the domain of ML, the models behind them have a huge usage of linear algebra, and having this can even open doors further.

The traditional fortran77 implementation for BLAS routines is available at lapack-netlib further libraries such as OpenMathLib contain the modern implementation based on the lapack-netlib that helps in efficient and optimized computations. Numpy and Scipy, for example, are libraries that use BLAS to execute operations on matrices and vectors efficiently. For example, dot(), linalg.svd(), etc. are based on BLAS. In stdlib, we have implementations for BLAS Level 1 routines now, with a few of them still remaining. I am tracking the status of all the already-implemented BLAS routines and they will serve as a reference for the structure of the routines that I’ll implement further using the Fortran77 code.

Commitment

In the project goals, as I mentioned, there are a few BLAS routine functions for which the C and Fortran implementations are remaining. In the period before GSOC, I wish to complete these packages entirely. Also, I will try to implement one BLAS Level 2 routine during the GSOC period to gain an overall understanding of how vector and matrix computations work, which will help me during the GSOC period, and I will be able to reduce significant time in the implementation of BLAS Level 2 routines as well. I estimate that I will complete around 6–7 routines of Level 1 before GSOC. I am estimating that I will be easily able to devote around 15 hours per week before the GSOC period working in the direction as written above.

During the GSOC period, I wish to contribute approximately 30 hours per week to work on the project for around 350 hours. As I am a final-year student and I am placed, summer will entirely be free for me as of now. I don’t have any major plans, and hence I will aim to complete as many packages as I can during this period. In the Schedule section of my proposal, I have provided a detailed structure. After the GSOC period, I will not be able to devote the same amount of time as during the period, and since I find this exciting, I will work slowly and try to complete the remaining Level 3 or any remaining routine from another level if remaining after the GSOC period.remain

Schedule

Assuming a 12 week schedule,

Notes:

Related issues

Tracking Issue:

Project Idea Issue:

Checklist

kgryte commented 3 months ago

Thank you for sharing a draft of your proposal, @aman-095. This looks good to me. Feel free to go ahead and submit on the GSoC website.

kgryte commented 3 months ago

The only small comment I'll make is that you'll want to consider the order in which BLAS packages are added, prioritizing real-valued over complex and double over single precision.

And further there are the "g*" variants but these should be straightforward to implement once the double-precision package is implemented.

aman-095 commented 3 months ago

Thanks for the review, @kgryte. I will prioritize this and have also added this to my proposal.