sampsyo / cs6120

advanced compilers
https://www.cs.cornell.edu/courses/cs6120/2023fa/
MIT License

Project 4 Proposal: Atom Autoscheduler #146

Closed g-c-c closed 4 years ago

g-c-c commented 4 years ago

What will you do?

We will develop a Halide autoscheduling pass for accelerating 4DSTEM analysis. Four-dimensional Scanning Transmission Electron Microscopy (4DSTEM) lets you see atoms better than ever before. Unfortunately, it produces far more data than existing image-analysis tools are built to handle: a single image can be several gigabytes, and at this rate a single experiment will amount to petabytes of data.

Optimizing data flow with Halide is further complicated by dimensionality. Each STEM "image" is a 2D real-space array (one entry per probe position) of 2D Fourier-space arrays (diffraction patterns). Some experiments add further dimensions, such as time or rotation. To make experiments and image processing feasible, we will implement a pipeline that runs existing algorithms in Halide and schedules their operations automatically.
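To make the "array of arrays" structure concrete, here is a minimal sketch of how one 4D dataset might be stored as a single flat buffer. The struct name `Dataset4D`, the field names, and the choice of row-major order with the detector (Fourier-space) coordinates innermost are all illustrative assumptions, not part of the proposal.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical container for one 4DSTEM dataset: a scan_x x scan_y grid of
// probe positions (real space), each holding a det_x x det_y diffraction
// pattern (Fourier space), stored as one flat, contiguous array.
struct Dataset4D {
    std::size_t scan_x, scan_y, det_x, det_y;
    std::vector<float> data;

    Dataset4D(std::size_t sx, std::size_t sy, std::size_t dx, std::size_t dy)
        : scan_x(sx), scan_y(sy), det_x(dx), det_y(dy),
          data(sx * sy * dx * dy, 0.0f) {}

    // Row-major index with the detector coordinates (kx, ky) innermost,
    // so each diffraction pattern occupies one contiguous block of memory.
    std::size_t index(std::size_t rx, std::size_t ry,
                      std::size_t kx, std::size_t ky) const {
        return ((rx * scan_y + ry) * det_x + kx) * det_y + ky;
    }

    float& at(std::size_t rx, std::size_t ry, std::size_t kx, std::size_t ky) {
        return data[index(rx, ry, kx, ky)];
    }
};
```

With this layout, a fifth dimension such as time would simply become one more outer factor in `index`. Which dimension sits innermost is exactly the kind of choice a schedule can affect.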

How will you do it?

Our first task is to have Halide successfully schedule 4DSTEM algorithms. Halide works well for ordinary 2D images, but some kinds of 4DSTEM image analysis must operate on the full 4D or 5D dataset. We will start with a toy dataset of the right dimensionality, then scale to real data.

Given the many analysis modules, we will initially choose a simple algorithm for Halide to work with. If scheduling succeeds, we will move on to more complex algorithms.
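The proposal does not name the simple starting algorithm, so as a hypothetical stand-in, consider a virtual detector image: for each probe position, sum the diffraction-pattern pixels that fall inside a circular mask. The sketch below also builds a purely synthetic toy dataset of the right dimensionality. The function names, the circular mask, and the layout (flat row-major `[rx][ry][kx][ky]`) are assumptions for illustration only.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical "simple" 4D analysis: a virtual detector image.
// For each probe position (rx, ry), sum the intensities of detector pixels
// (kx, ky) lying within `radius` of the pattern center.
// Input layout (assumed): flat row-major array indexed [rx][ry][kx][ky].
std::vector<float> virtual_detector(const std::vector<float>& data,
                                    std::size_t scan_x, std::size_t scan_y,
                                    std::size_t det_x, std::size_t det_y,
                                    double radius) {
    std::vector<float> image(scan_x * scan_y, 0.0f);
    const double cx = (det_x - 1) / 2.0;  // pattern center, detector x
    const double cy = (det_y - 1) / 2.0;  // pattern center, detector y
    for (std::size_t rx = 0; rx < scan_x; ++rx) {
        for (std::size_t ry = 0; ry < scan_y; ++ry) {
            float sum = 0.0f;
            // Start of this probe position's diffraction pattern.
            const std::size_t base = (rx * scan_y + ry) * det_x * det_y;
            for (std::size_t kx = 0; kx < det_x; ++kx) {
                for (std::size_t ky = 0; ky < det_y; ++ky) {
                    const double dx = kx - cx, dy = ky - cy;
                    if (dx * dx + dy * dy <= radius * radius)
                        sum += data[base + kx * det_y + ky];
                }
            }
            image[rx * scan_y + ry] = sum;
        }
    }
    return image;
}

// Synthetic toy dataset: every diffraction pattern is all ones, except at
// probe position (rx0, ry0), whose pattern is all twos.
std::vector<float> toy_dataset(std::size_t scan_x, std::size_t scan_y,
                               std::size_t det_x, std::size_t det_y,
                               std::size_t rx0, std::size_t ry0) {
    std::vector<float> data(scan_x * scan_y * det_x * det_y, 1.0f);
    const std::size_t base = (rx0 * scan_y + ry0) * det_x * det_y;
    for (std::size_t i = 0; i < det_x * det_y; ++i) data[base + i] = 2.0f;
    return data;
}
```

On a 3x3 scan with 5x5 detector patterns and `radius = 1.0`, the mask covers the center pixel and its four neighbors, so each output pixel is 5 except the marked position, which is 10. Writing such a reference version first also addresses the risk raised below: it shows the algorithm is expressible before any scheduling work begins.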

How will you empirically measure success?

Since our main goal is reasonable data-processing speed, our primary metric is speedup. We will run a variety of 4D image-analysis algorithms through our custom Halide autoscheduler and measure the speedup on a variety of architectures.

Team members: @rolph-recto @hackedy

sampsyo commented 4 years ago

Sounds cool! I think the biggest risk is that Halide might not actually be able to express the algorithms you want. So please write those programs first to make sure you can do something.

About "a variety of architectures": I think one hardware target is fine, as long as it's not a single-core CPU. Just compare the speed of a hand-scheduled implementation to the one your automated scheduler generates.

I would also be interested to hear any thoughts y'all have about how you will search for schedules. Randomly? Exhaustively? Heuristically? Solver-aided-ly?
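As one concrete reading of the "exhaustively" option, here is a minimal sketch that searches a deliberately tiny schedule space: the nesting order of the four loops (rx, ry, kx, ky) in a reduction that sums each diffraction pattern. Each of the 24 orders is run and timed on the host, and the fastest is kept. This is plain C++, not Halide. It only mimics one kind of choice Halide exposes through `Func::reorder`; a real autoscheduler would also search over tiling, vectorization, parallelism, and where intermediates are computed. `run_with_order`, `exhaustive_search`, and `SearchResult` are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <chrono>
#include <cstddef>
#include <limits>
#include <vector>

// Loop dimensions, outermost first: 0 = rx, 1 = ry, 2 = kx, 3 = ky.
using Order = std::array<int, 4>;

// Sum each diffraction pattern into a per-probe-position image, with the
// four loops nested in `order`. Data layout (assumed): row-major [rx][ry][kx][ky].
std::vector<float> run_with_order(const std::vector<float>& data,
                                  const std::array<std::size_t, 4>& extent,
                                  const Order& order) {
    std::vector<float> image(extent[0] * extent[1], 0.0f);
    std::array<std::size_t, 4> i{};  // current value of each loop variable
    for (i[order[0]] = 0; i[order[0]] < extent[order[0]]; ++i[order[0]])
        for (i[order[1]] = 0; i[order[1]] < extent[order[1]]; ++i[order[1]])
            for (i[order[2]] = 0; i[order[2]] < extent[order[2]]; ++i[order[2]])
                for (i[order[3]] = 0; i[order[3]] < extent[order[3]]; ++i[order[3]]) {
                    const std::size_t idx =
                        ((i[0] * extent[1] + i[1]) * extent[2] + i[2]) * extent[3] + i[3];
                    image[i[0] * extent[1] + i[1]] += data[idx];
                }
    return image;
}

struct SearchResult {
    Order best_order;
    double best_seconds;
};

// Exhaustive search: time every loop order and keep the fastest.
SearchResult exhaustive_search(const std::vector<float>& data,
                               const std::array<std::size_t, 4>& extent) {
    Order order{0, 1, 2, 3};  // sorted, so next_permutation visits all 24 orders
    SearchResult best{order, std::numeric_limits<double>::infinity()};
    do {
        const auto start = std::chrono::steady_clock::now();
        volatile float sink = run_with_order(data, extent, order)[0];  // keep result live
        (void)sink;
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start;
        if (elapsed.count() < best.best_seconds) best = {order, elapsed.count()};
    } while (std::next_permutation(order.begin(), order.end()));
    return best;
}
```

Swapping the `do`/`while` loop for a fixed number of randomly sampled orders would give random search, and replacing the timing with a predicted cost would give a heuristic, cost-model-driven search. Note that different loop orders can change floating-point summation order, so results are only guaranteed to match exactly for data like small integers.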

sampsyo commented 4 years ago

Closed in #182.