kuterd / nv_isa_solver

Nvidia Instruction Set Specification Generator
MIT License

Usage instructions for 4090? #1

Open geohot opened 3 months ago

geohot commented 3 months ago

Trying nv-isa-solver --arch sm_89

Is it okay to share a disasm_cache.txt? I tried deleting it, but then the solver didn't find anything.

kuterd commented 3 months ago

Hello, bootstrapping a disasm_cache.txt is a bit tricky, and the process might be a little broken. We use the file both to cache disassembly and to discover instructions.

I bootstrapped the one for Hopper by running the populate_cache script on a 128-core machine; that took 2 hours.
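To make the two roles concrete, here is a minimal sketch of a disassembly cache that doubles as an instruction-discovery source. Everything in it is hypothetical: the class name DisasmCache, the tab-separated line format, and fake_backend (a stand-in for a real disassembler) are illustrative and are not nv_isa_solver's actual code or file format. It does show one way discovery can depend on the cache: mnemonics() only reports opcodes that already have entries, so an empty cache gives the discovery step nothing to start from.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, List, Set


class DisasmCache:
    """Hypothetical disassembly cache persisted as a text file.

    Each line is '<32 hex chars>\t<disassembly>'. This format is made up for
    illustration and is not necessarily the layout of disasm_cache.txt.
    """

    def __init__(self, path: str) -> None:
        self.path = Path(path)
        self.entries: Dict[str, str] = {}
        if self.path.exists():
            for line in self.path.read_text().splitlines():
                key, _, text = line.partition("\t")
                self.entries[key] = text

    def disassemble(self, encoding: bytes, backend: Callable[[bytes], str]) -> str:
        """Return cached text, calling the (slow) backend only on a miss."""
        key = encoding.hex()
        if key not in self.entries:
            text = backend(encoding)
            self.entries[key] = text
            with self.path.open("a") as f:  # append so partial runs survive
                f.write(f"{key}\t{text}\n")
        return self.entries[key]

    def mnemonics(self) -> Set[str]:
        """Discovery: the distinct base opcode names seen in cached entries."""
        names: Set[str] = set()
        for text in self.entries.values():
            tokens = text.split()
            if tokens and tokens[0].startswith("@"):  # drop a predicate like @P0
                tokens = tokens[1:]
            if tokens:
                names.add(tokens[0].split(".")[0])  # IMAD.MOV.U32 -> IMAD
        return names


# Tiny demo with a fake backend standing in for a real disassembler.
backend_calls: List[bytes] = []


def fake_backend(encoding: bytes) -> str:
    backend_calls.append(encoding)
    return "@P0 IMAD.MOV.U32 R1, RZ, RZ, R2 ;" if encoding[0] == 1 else "NOP ;"


with tempfile.TemporaryDirectory() as tmp:
    cache_path = Path(tmp, "disasm_cache.txt")
    cache = DisasmCache(str(cache_path))
    first = bytes([1]) + bytes(15)
    cache.disassemble(first, fake_backend)
    cache.disassemble(first, fake_backend)  # served from the cache
    cache.disassemble(bytes(16), fake_backend)
    reloaded = DisasmCache(str(cache_path))  # what a fresh run would load
    found = reloaded.mnemonics()

print(len(backend_calls), sorted(found))
```

Appending on every miss (rather than rewriting the file at the end) keeps whatever was computed if a long populate run is interrupted.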

You can use nv-isa-solver-scan to ingest SASS disassembly files.
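A minimal sketch of what ingesting SASS text involves, assuming the cuobjdump --dump-sass layout used for 128-bit encodings (sm_70 and later), where each instruction line ends with the low 64-bit word and the following line carries the high word. parse_sass and the sample lines are illustrative, not nv-isa-solver-scan's real implementation, and packing the two words into a little-endian 16-byte value is an assumption.

```python
import re
from typing import List, Tuple

# Assumed layout: the instruction line carries the low 64-bit word,
# the next line carries only the high 64-bit word.
INST_RE = re.compile(
    r"/\*([0-9a-f]+)\*/\s+(.*?)\s*;\s*/\*\s*0x([0-9a-f]{16})\s*\*/"
)
HIGH_RE = re.compile(r"^\s*/\*\s*0x([0-9a-f]{16})\s*\*/\s*$")


def parse_sass(text: str) -> List[Tuple[str, bytes]]:
    """Return (assembly, 16-byte little-endian encoding) pairs."""
    results: List[Tuple[str, bytes]] = []
    pending = None  # (assembly, low word) waiting for its high word
    for line in text.splitlines():
        m = INST_RE.search(line)
        if m:
            pending = (m.group(2), int(m.group(3), 16))
            continue
        h = HIGH_RE.match(line)
        if h and pending is not None:
            asm, low = pending
            high = int(h.group(1), 16)
            encoding = ((high << 64) | low).to_bytes(16, "little")
            results.append((asm, encoding))
            pending = None
    return results


# Illustrative sample in the assumed format.
SAMPLE = """
        /*0000*/                   MOV R1, c[0x0][0x28] ;                     /* 0x00000a0000017a02 */
                                                                              /* 0x000fe40000000f00 */
        /*0010*/                   EXIT ;                                     /* 0x000000000000794d */
                                                                              /* 0x000fea0003800000 */
"""

for asm, enc in parse_sass(SAMPLE):
    print(f"{enc.hex()}  {asm}")
```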

geohot commented 3 months ago

Ahh, so for nv-isa-solver-scan I need example sass files?

I don't see an easy way to run populate_cache (there's no alias):

tiny@tiny19:~/build/nv_isa_solver$ python3 nv_isa_solver/populate_cache.py
Traceback (most recent call last):
  File "/home/tiny/build/nv_isa_solver/nv_isa_solver/populate_cache.py", line 1, in <module>
    from .disasm_utils import Disassembler, set_bit_range
ImportError: attempted relative import with no known parent package

Fixed up the imports, now running: python3 populate_cache.py --arch sm_89 --cache_file 4090_cache.txt
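The ImportError above is standard Python behavior: populate_cache.py uses a relative import (from .disasm_utils import ...), which only resolves when the file is loaded as part of its package. Besides editing the imports, a common alternative is running it as a module from the repo root, e.g. python3 -m nv_isa_solver.populate_cache, assuming nv_isa_solver is an importable package (the traceback path suggests this but does not confirm it). The sketch below reproduces both behaviors with a throwaway package; demo_pkg, helpers.py and tool.py are made-up names.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_demo():
    """Run the same file as a plain script and as a package module."""
    with tempfile.TemporaryDirectory() as root:
        pkg = Path(root, "demo_pkg")  # hypothetical stand-in for nv_isa_solver
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "helpers.py").write_text("VALUE = 42\n")
        # Mirrors populate_cache.py: a relative import at the top of the file.
        (pkg / "tool.py").write_text("from .helpers import VALUE\nprint(VALUE)\n")

        # Like `python3 nv_isa_solver/populate_cache.py`: no parent package.
        as_script = subprocess.run(
            [sys.executable, str(pkg / "tool.py")],
            capture_output=True, text=True,
        )
        # Like `python3 -m nv_isa_solver.populate_cache` from the repo root.
        as_module = subprocess.run(
            [sys.executable, "-m", "demo_pkg.tool"],
            cwd=root, capture_output=True, text=True,
        )
    return as_script, as_module


script_result, module_result = run_demo()
print("script:", script_result.stderr.strip().splitlines()[-1])
print("module:", module_result.stdout.strip())
```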

kuterd commented 3 months ago

Wait. Can you replace the mainloop with this?

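    # Enumerate every value of the low 12 bits, and for each one also emit
    # single-bit-flip variants to probe the remaining bits.
    # set_bit_range and flip_bit are assumed to be the repo's bit helpers
    # (populate_cache.py imports set_bit_range from .disasm_utils).
    inst = []
    for i in range(2 ** 12):
        array = bytearray(b"\0" * 16)  # one 128-bit instruction, all zeros
        set_bit_range(array, 0, 12, i)  # write i into the low 12 bits
        inst.append(array)
        for j in range(13, 8 * 13):  # flip each of bits 13..103 in turn
            array_ = bytearray(array)  # copy so the base encoding stays intact
            flip_bit(array_, j)
            inst.append(array_)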

Starting from array = bytearray(b"\0" * 16) works better IIRC; some instructions don't like having read/write barriers set.
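A self-contained version of that loop, useful for sanity-checking how many candidates it produces. The bit helpers here are hypothetical re-implementations (bit k is taken to be bit k % 8 of byte k // 8), not the repo's set_bit_range/flip_bit, whose bit ordering may differ; candidate_encodings and its parameter names are likewise made up. With the defaults, each of the 4096 low-bit values contributes its base encoding plus 91 single-bit flips (bits 13 through 103).

```python
from typing import List


# Hypothetical stand-ins for the repo's bit helpers (little-endian bit order
# within the 16-byte buffer is an assumption).
def set_bit(buf: bytearray, bit: int, value: int) -> None:
    byte, offset = divmod(bit, 8)
    if value:
        buf[byte] |= 1 << offset
    else:
        buf[byte] &= ~(1 << offset) & 0xFF


def set_bit_range(buf: bytearray, start: int, end: int, value: int) -> None:
    """Write the low (end - start) bits of value into bits [start, end)."""
    for k in range(end - start):
        set_bit(buf, start + k, (value >> k) & 1)


def flip_bit(buf: bytearray, bit: int) -> None:
    byte, offset = divmod(bit, 8)
    buf[byte] ^= 1 << offset


def candidate_encodings(opcode_bits: int = 12, flip_from: int = 13,
                        flip_to: int = 8 * 13) -> List[bytearray]:
    """Every low-bit value on a zeroed 128-bit word, plus one-bit-flip variants."""
    inst: List[bytearray] = []
    for i in range(2 ** opcode_bits):
        base = bytearray(b"\0" * 16)  # all-zero 128-bit word
        set_bit_range(base, 0, opcode_bits, i)
        inst.append(base)
        for j in range(flip_from, flip_to):
            variant = bytearray(base)  # copy, then flip a single bit
            flip_bit(variant, j)
            inst.append(variant)
    return inst


candidates = candidate_encodings()
print(len(candidates))  # 4096 * (1 + 91) = 376832
```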

geohot commented 3 months ago

It finished as is:

tiny@tiny19:~/build/nv_isa_solver/nv_isa_solver$ python3 populate_cache.py --arch sm_89 --cache_file 4090_cache.txt
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5888/5888 [08:01<00:00, 12.23it/s]
tiny@tiny19:~/build/nv_isa_solver/nv_isa_solver$ ls -l 4090_cache.txt 
-rw-rw-r-- 1 tiny tiny 14319616 Jul  8 20:40 4090_cache.txt

kuterd commented 3 months ago

Pushed a fix

geohot commented 3 months ago

Kk, retrying

tiny@tiny19:~/build/nv_isa_solver$ nv-isa-solver-populate-cache --arch sm_89 --cache_file 4090_cache_2.txt
  1%|█         | 40/5888 [00:03<08:45, 11.13it/s]

kuterd commented 3 months ago

You might run into issues with the operand interaction analysis, which needs to create cubin files. We currently hard-code 90 for SM90a.

kuterd commented 3 months ago

OK, I added an --arch_code option to instruction_solver.py. You probably need to use --arch_code 89, but I'm not 100% sure.

geohot commented 3 months ago

tiny@tiny19:~/build/nv_isa_solver$ ls -l 4090_cache_2.txt
-rw-rw-r-- 1 tiny tiny 14319616 Jul  8 21:11 4090_cache_2.txt

tiny@tiny19:~/build/nv_isa_solver$ nv-isa-solver --arch sm_89 --cache_file 4090_cache_2.txt 
No new instruction found, exiting

tiny@tiny19:~/build/nv_isa_solver$ nv-isa-solver --arch sm_89 --cache_file 4090_cache_2.txt --arch_code 89
No new instruction found, exiting

kuterd commented 3 months ago

You need to use --arch SM89, not --arch sm_89.
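Since several spellings are in play (sm_89 for cuobjdump, SM89 for --arch, and the bare number for --arch_code), a small normalizing helper can keep them consistent when scripting runs. normalize_arch and solver_argv are hypothetical and not part of nv_isa_solver; the mapping only follows what the thread states: --arch SM89 with --arch_code 89 for the 4090, and 90 for SM90a.

```python
import re
from typing import List, Tuple

_ARCH_RE = re.compile(r"sm_?(\d+)([a-z]?)", re.IGNORECASE)


def normalize_arch(name: str) -> Tuple[str, int]:
    """Hypothetical helper: 'sm_89' or 'SM89' -> ('SM89', 89).

    The string is the spelling the thread says --arch expects; the integer is
    the numeric code passed to --arch_code (89 for the 4090, 90 for SM90a).
    """
    m = _ARCH_RE.fullmatch(name.strip())
    if m is None:
        raise ValueError(f"unrecognized architecture: {name!r}")
    digits, suffix = m.groups()
    return f"SM{digits}{suffix.lower()}", int(digits)


def solver_argv(name: str, cache_file: str) -> List[str]:
    """Build an nv-isa-solver command line from any accepted spelling."""
    arch, code = normalize_arch(name)
    return ["nv-isa-solver", "--arch", arch, "--arch_code", str(code),
            "--cache_file", cache_file]


print(solver_argv("sm_89", "4090_cache.txt"))
```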

kuterd commented 3 months ago

FYI, I'm analysing SM89 on my 128-core machine. I'll get back to you in a few hours.

kuterd commented 3 months ago

Thanks for your patience. This repo is still very experimental.

Human-readable ISA spec for the 4090

Machine-readable isa.json

To reproduce:

cuobjdump --dump-sass --gpu-architecture sm_89 libcublasLt.so.12.5.3.2 > libcublasLt.sass
nv-isa-solver-scan --arch SM89 --cache_file 4090_cache.txt libcublasLt.sass
nv-isa-solver-populate-cache --arch SM89 --cache_file 4090_cache.txt
nv-isa-solver-mutate --arch SM89 --cache_file 4090_cache.txt
nv-isa-solver --arch SM89 --arch_code 89 --cache_file 4090_cache.txt --num_parallel 5
nv-isa-solver-mutate --arch SM89 --cache_file 4090_cache.txt
nv-isa-solver --arch SM89 --arch_code 89 --cache_file 4090_cache.txt --num_parallel 5
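The same sequence can be scripted, for example as a small Python driver. This is only a sketch that replays the commands listed above: reproduce_steps, run_all and the rounds parameter are hypothetical names. With dry_run=True it just prints the commands; with dry_run=False it assumes cuobjdump and the nv-isa-solver entry points are on PATH and that libcublasLt.so.12.5.3.2 is in the working directory.

```python
import shlex
import subprocess
from typing import List


def reproduce_steps(arch: str = "SM89", arch_code: int = 89,
                    cache: str = "4090_cache.txt", num_parallel: int = 5,
                    rounds: int = 2) -> List[List[str]]:
    """The nv-isa-solver commands from the thread, in order."""
    common = ["--arch", arch, "--cache_file", cache]
    steps = [
        ["nv-isa-solver-scan", *common, "libcublasLt.sass"],
        ["nv-isa-solver-populate-cache", *common],
    ]
    for _ in range(rounds):  # mutate, then solve; repeated `rounds` times
        steps.append(["nv-isa-solver-mutate", *common])
        steps.append(["nv-isa-solver", "--arch", arch, "--arch_code", str(arch_code),
                      "--cache_file", cache, "--num_parallel", str(num_parallel)])
    return steps


def run_all(library: str = "libcublasLt.so.12.5.3.2", dry_run: bool = True) -> None:
    dump = ["cuobjdump", "--dump-sass", "--gpu-architecture", "sm_89", library]
    if dry_run:
        print(shlex.join(dump), "> libcublasLt.sass")
    else:
        with open("libcublasLt.sass", "w") as out:  # same as the shell redirect
            subprocess.run(dump, stdout=out, check=True)
    for step in reproduce_steps():
        if dry_run:
            print(shlex.join(step))
        else:
            subprocess.run(step, check=True)  # stop on the first failure


run_all(dry_run=True)
```

rounds=2 matches the two mutate/solve passes above; per the next comment, that repetition may become unnecessary once mutation is integrated into the main solver.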

I will integrate nv-isa-solver-mutate into the main solver itself tomorrow so that you don't have to run it like this multiple times.