sje30 / catam-julia

CATAM material in Julia
http://sje30.github.io/catam-julia
MIT License

list of ideas for improvements from Gokul #4

Open sje30 opened 3 years ago

sje30 commented 3 years ago

hi @jmb280cam

here is a list of ideas for improvements from Gokul @srgk26

1 how automatic differentiation works (not just how it's implemented, as in the video by Alan Edelman)
2 types and type stability; also highlight the difference between type annotation and type stability (the former does not improve performance)
3 refer to this blog post on type stability in Julia: https://www.juliabloggers.com/writing-type-stable-julia-code/
4 using @code_warntype on function calls to check for type stability
5 custom types with struct
6 multiple dispatch
7 highlight the performance tips page in the docs: https://docs.julialang.org/en/v1/manual/performance-tips/
8 using the performance macros @simd, @fastmath and @inbounds to further improve performance (not always: they can reduce performance, so decide case by case)
9 good idea to introduce Docker as well
10 tutorial for common tools and packages in the Julia ecosystem (DataFrames, plotting, Flux, DifferentialEquations, etc.)
11 note that Julia is column-major, so in nested loops over a matrix the inner loop should run over rows (the first index) and the outer loop over columns, for example:

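m = rand(2,3)
# outer loop over columns, inner loop over rows, matching the column-major layout
# (@views dropped: there is no slicing here, so it had no effect)
@inbounds for j in axes(m, 2)
    for i in axes(m, 1)
        m[i,j] = m[i,j]*(i+j)
    end
end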

some of these could be in the intro, some (like types) would be better off in a case study.
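
To make items 2 to 4 concrete, here is a minimal sketch of the difference between type annotation and type stability. The function names unstable_relu and stable_relu are made up for illustration and are not from the CATAM material. Both functions carry the same annotation, but only the second always returns the same type.

```julia
using InteractiveUtils  # provides @code_warntype outside the REPL

# Annotated but type-unstable: returns the Int literal 0 when x <= 0
# and a Float64 otherwise.
unstable_relu(x::Float64) = x > 0 ? x : 0

# Type-stable: zero(x) has the same type as x, so the result is always Float64.
stable_relu(x::Float64) = x > 0 ? x : zero(x)

# Base.return_types reports what the compiler infers for these argument types.
@show Base.return_types(unstable_relu, (Float64,))
@show Base.return_types(stable_relu, (Float64,))

# @code_warntype prints the inferred types; a Union return type signals instability.
@code_warntype unstable_relu(1.0)
```

The annotation restricts which arguments the method accepts; it is the body that decides whether the return type can be inferred as a single concrete type.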

CC: @Nick-Gale for info
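
Items 5 and 6 (custom types with struct, multiple dispatch) could be introduced together. The following is only a sketch with hypothetical types (Shape, Circle, Rectangle) and functions (area, same_kind) chosen for illustration.

```julia
# Hypothetical type hierarchy, used only for illustration.
abstract type Shape end

struct Circle <: Shape
    r::Float64
end

struct Rectangle <: Shape
    w::Float64
    h::Float64
end

# One generic function with one method per concrete type.
area(c::Circle) = pi * c.r^2
area(s::Rectangle) = s.w * s.h

# Dispatch considers the types of all arguments, not just the first:
# the first method applies only when both arguments have the same concrete type.
same_kind(a::T, b::T) where {T<:Shape} = true
same_kind(a::Shape, b::Shape) = false

println(area(Rectangle(2.0, 3.0)))
println(same_kind(Circle(1.0), Rectangle(1.0, 1.0)))
```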

sje30 commented 3 years ago

p.s. for iterating over an array (item 11), is eachindex(m) a better way to iterate over the matrix m? Not tried it yet myself

jmbyrne commented 3 years ago

My initial thoughts on these are:

1, 10 may fit better into case studies
2, 3, 4, 5, 6 could constitute a new section, but more likely a case study (as you say)
7 should probably be mentioned
8, 11 can go into the efficiency case study, whatever that may end up being
9 I don't know what you mean by "docker", could you elaborate please?

A note on eachindex(m): it's described in the Julia manual as "an efficient iterator for visiting each position in A", so it's good for that, but if you need the row/column indices I think it's just as efficient to use two for loops
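
As a sketch of the two options being compared here: eachindex when only the values matter, and CartesianIndices when the row and column indices are needed without writing two loops by hand. The function names sum_linear and scale_by_index! are hypothetical. The loops are wrapped in functions so that they also behave the same when run from a script.

```julia
# Visit every position; no row/column information is needed.
function sum_linear(m)
    total = zero(eltype(m))
    for k in eachindex(m)
        total += m[k]
    end
    return total
end

# CartesianIndices iterates with the first index varying fastest,
# and each index can be unpacked into (row, column).
function scale_by_index!(m)
    for I in CartesianIndices(m)
        i, j = Tuple(I)
        m[I] *= (i + j)
    end
    return m
end

m = rand(2, 3)
println(sum_linear(m))
println(scale_by_index!(m))
```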

srgk26 commented 3 years ago

Hi @jmb280cam! Docker is a container platform, sort of like virtual machines but virtualising at the OS level (containers share the host's kernel) rather than at the hardware level. You can find more info here:

https://en.wikipedia.org/wiki/Docker_(software) https://www.docker.com/

Docker isn't necessarily linked to Julia itself; I was making a more general suggestion to maybe include Docker as a scientific computing tool. I certainly use Docker all the time. The main purpose of Docker is to package software with its dependencies so that it can be moved across machines: if it works on machine A, it should also work on machine B. But I tend to use Docker just because it's cleaner and compartmentalized, and I don't have to watch out for updates. I just need to do docker pull, and the latest software version is automatically downloaded. But whatever the use case, I think it's useful to know.

Anyways this may very well be complicating things more than necessary, and may very well be out of scope. But since I'm already using it for Julia and have the instructions, I'll provide them here. Feel free to make use of any of it if you'd like or leave it out entirely.

Setting up Docker itself differs depending on whether it's a Windows, Mac or Linux machine, and on whether the user has root permissions (they would if it's their personal computer). After it's set up, I would create a new container like this:

JULIA_VERSION="1.6.0"
docker pull julia:latest
docker build -f Julia-CUDA-Dockerfile --no-cache -t julia-cuda:latest .
docker run -it --name julia-"${JULIA_VERSION}" -v /home/srgk26:/home/srgk26 --gpus all julia-cuda:latest

Docker build step: The -f option specifies the input file. The Dockerfile I'm using is called Julia-CUDA-Dockerfile (this is with CUDA support). The -t option means tag; it gives the name of the image created.

Docker run step: The -it option combines -i (interactive, keeps stdin open) and -t (allocates a terminal). The --name option gives a name to the container created; any name works, but having one is useful (explained below). In my setup it's named julia-1.6.0, so I can see which version of Julia each container holds. The -v option is the filesystem mount option, of the form -v host:container. The directory to the left of the colon is the directory on my host machine, and the directory to the right of the colon is the directory within the Docker container (if the directory doesn't exist in the container, it'll be created). Normally Docker containers are isolated from the host filesystem, so this mount is necessary if you want to work with files on the host system. The --gpus all option exposes my local GPUs to the Docker container.

This is what my Dockerfile 'Julia-CUDA-Dockerfile' looks like:

FROM julia:latest

## Specify NVIDIA driver features to mount inside container
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility

## Install Linux system packages
RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive \
    apt-get install --yes --no-install-recommends \
                    build-essential curl git libgomp1 sudo wget && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

## Install MKL.jl
## Until Julia v1.7, building MKL.jl rebuilds Julia's system image against Intel MKL
## Until Julia v1.7, need to set environment variable ENV["USE_BLAS64"]=true to install 64-bit MKL version
#RUN julia -e 'ENV["USE_BLAS64"] = true; using Pkg; Pkg.add("MKL")'

## Set user name for container to run as user
ARG USER=julia-docker

## Provide root privileges to $USER
## Add $USER to sudo group and disable password requirement
RUN adduser --disabled-password --gecos '' $USER
RUN adduser $USER sudo
RUN echo '%sudo ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers

## Switch user from root to $USER
USER $USER

## Install and precompile essential Julia packages
RUN julia -e 'using Pkg; Pkg.add(["Combinatorics", "CSV", "CUDA", "DataFrames", \
                                  "DifferentialEquations", "Distributions", "Flux", "IterTools", \
                                  "LoopVectorization", "MKL", "Parameters", "StatsFuns", "StatsBase", "StatsPlots"]); \
                         Pkg.precompile()'

## Default executable for container
ENTRYPOINT ["julia"]

The main purpose of creating a non-root user in the Dockerfile is to circumvent permission issues when working with Docker containers. By default a container runs as root, which means any files created or edited within it will be owned by root. This is troublesome when later working with those files outside of Docker as a regular user. I'm using a plain Linux desktop, so this is a problem. I'm not sure if it's a problem with WSL2 on Windows or on Mac though.

In any case, this is a sample Dockerfile that works, can adapt it as per use case.

After creating a container, I can call on that container for future use. I'm using vscode, and in vscode it's just a click on a button. Within terminal though, these are the steps:

docker start julia-1.6.0
docker exec -it julia-1.6.0 /bin/bash

Anyways, these steps are probably overkill. I already have these instructions, so I sent them to you. Feel free to disregard this.

jmbyrne commented 3 years ago

Thanks for that explanation. I would have thought that for CATAM we wouldn't need to consider this sort of thing (since the code should be simple enough to work on any system), but it's good to have in the back pocket if it comes up

Nick-Gale commented 3 years ago

Just a small technicality on this point. The eachindex iterator will be as efficient as two for loops only if the for loops access the memory in column order i.e. run through the columns in the outer loop.

I’ve attached a small benchmark to show the performance difference when accessing in row major (in this case of summing a large matrix it’s about a 5x overhead).

It’s a pretty easy mistake to make as a beginner especially because in some sense it’s unintuitive (indexes read left to right so the first variable in mind when writing the for loops is the left-most) and is the opposite to C. However, there are good reasons to do it!
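
The attached benchmark did not survive in this thread, so the following is a stand-in sketch rather than the original. It assumes a 2000×2000 matrix and uses made-up function names (sum_column_order, sum_row_order). The timings, and therefore the ratio between them, will vary by machine.

```julia
# Outer loop over columns, inner loop over rows: walks memory contiguously.
function sum_column_order(m)
    total = zero(eltype(m))
    for j in axes(m, 2)
        for i in axes(m, 1)
            total += m[i, j]
        end
    end
    return total
end

# Outer loop over rows, inner loop over columns: strided memory access.
function sum_row_order(m)
    total = zero(eltype(m))
    for i in axes(m, 1)
        for j in axes(m, 2)
            total += m[i, j]
        end
    end
    return total
end

big = rand(2000, 2000)

# Warm-up calls so that compilation time is not included in the timings.
sum_column_order(big)
sum_row_order(big)

t_col = @elapsed sum_column_order(big)
t_row = @elapsed sum_row_order(big)
println("column order: ", t_col, " s, row order: ", t_row, " s")
```

For more careful measurements the BenchmarkTools package is the usual choice; @elapsed is used here only to keep the sketch dependency-free.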

Cheers,

Nick

Nick-Gale commented 3 years ago

I just realised that Gokul provided an example too, d’oh!

I also read through the document as it stands - it’s very well done!

Cheers,

Nick

srgk26 commented 3 years ago

Just putting it out here, saw this on my youtube recommendations. Think it's a good link to share: https://youtu.be/gRj7E5kYG1I

Also, @sje30 you may be interested in this, but perhaps out of scope for catam-julia: https://youtu.be/Sh_jBtP7RVY

sje30 commented 3 years ago

thanks. there are also some nice Pluto videos there from JuliaCon

srgk26 commented 3 years ago

Hey! I also came across these last week, wanted to post this then but got distracted by the presentations. Anyway, this is the post: https://www.numerical-tours.com/julia/

Numerical tours in Julia. Not sure if it's interesting/relevant, but sending it across anyway.

sje30 commented 3 years ago

Thanks @srgk26

if you had a spare 30-60 minutes, would you be able to read over https://sje30.github.io/catam-julia/intro/julia-manual.html and comment?

srgk26 commented 3 years ago

Hi @sje30, I had a brief look. Firstly, I should say that it was a bit weird to see cell output above the code. But it seems that's the way Pluto was designed. A quick couple of points (caveats really):

  1. Under the efficiency section, you mentioned Julia being compiled. Perhaps it's also worth pointing out that this means the first run of a function is slower, which is more noticeable for small operations or when benchmarking.
  2. Just a note that Threads.@threads doesn't always speed up computations. From my benchmarking, it only gives a speed-up for large enough datasets. Otherwise the threading overhead dominates and the code runs slower than single-threaded.
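
Both caveats can be checked with a short script. This is only a sketch: poly, square_serial!, square_threaded! and compare are hypothetical names, the array sizes are arbitrary, and the threaded version can only help if Julia was started with more than one thread (for example with julia -t 4).

```julia
# Caveat 1: the first call to a function includes compilation time.
poly(x) = 3x^2 + 2x + 1

t_first = @elapsed poly(1.5)    # compiles poly for Float64, then runs it
t_second = @elapsed poly(1.5)   # reuses the compiled code
println("first call: ", t_first, " s, second call: ", t_second, " s")

# Caveat 2: threading has a fixed overhead that small inputs cannot amortise.
function square_serial!(out, x)
    for i in eachindex(x)
        out[i] = x[i]^2
    end
    return out
end

function square_threaded!(out, x)
    Threads.@threads for i in eachindex(x)
        out[i] = x[i]^2
    end
    return out
end

function compare(n)
    x = rand(n)
    out = similar(x)
    square_serial!(out, x)      # warm-up calls so compilation is not timed
    square_threaded!(out, x)
    t_serial = @elapsed square_serial!(out, x)
    t_threaded = @elapsed square_threaded!(out, x)
    println("n = ", n, ": serial ", t_serial, " s, threaded ", t_threaded, " s")
    return nothing
end

println("threads available: ", Threads.nthreads())
compare(100)
compare(1_000_000)
```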

I'd be able to give a more detailed comment maybe over the weekend. I'm packing up now actually, for a flight to India tomorrow. Will be back in the UK in about a month.

I'd be happy to look into this closer then.

jmbyrne commented 3 years ago

Thanks again for the input Gokul, I'll make those changes