yuq / mesa-lima

Deprecated, new place: https://gitlab.freedesktop.org/lima
https://github.com/yuq/mesa-lima/wiki
165 stars 17 forks source link

Implement texture support #12

Closed anarsoul closed 6 years ago

anarsoul commented 6 years ago

As per wiki, both PP compiler and command stream need be changed

anarsoul commented 6 years ago

Hi @yuq,

I'm looking into implementing texture support. So far I have an issue with PP compiler.

The problem is that texture fetch instruction takes coords from pipeline and outputs to pipeline, so in order to read from texture we need at least 3 instructions in row:

1) fetch varying to fetch coords 2) texture fetch instruction 3) any ALU instruction that allows to mov pipeline reg to any other reg

See PP pipeline here: https://raw.githubusercontent.com/robclark/ducking-octo-happiness/master/lima/limapipelinehorizontal2.png

So I'm not sure how to generate 3 instructions in row for a single nir_tex_instr.

Could you give me a hint please?

yuq commented 6 years ago

Hi, I think you can

  1. generate a single load_texture ppir_node for nir_text_instr in nir.c
  2. lower the load_texture node to have a load_varying (maybe a new node type load_tex_coord) if it does not get its coord from a load_varying node but some ALU node output in lower.c
  3. insert load_texture node to its usage instr directly if only one instr will use it; create a new instr to insert it if the previous insertion fail (due to another load_texture node has already been inserted) or there're multi instr (not node) usage and create a move in the instr; for load_tex_coord, just insert it to the load_texture instr; for load_varying may need to duplicate one to insert to load_texture instr if it has multi usage instr. This is in node_to_instr.c
yuq commented 6 years ago

dup load_varying for load_texture in step 3 may also be done in step 2 and all load_texture input can be load_tex_coord.

anarsoul commented 6 years ago

Hm, looks like there's some issue with register allocation, see:

lower:
========prog========
-------block------
st_col 5 new
  mul 4 ssa4
    ld_var 1 ssa1
    ld_tex 3 ssa3
      ld_coords 2 ssa2
==================== 

after node_to_instr it becomes:

========prog========
-------block------
st_col 5 new
  mov 6 new
    mul 4 ssa4
      ld_var 1 ssa1
      mov 7 new
        ld_tex 3 ssa3
          ld_coords 2 ssa2
====================

======ppir instr list======
      vary texl unif vmul smul vadd sadd comb stor const0|1
*000: null null null 4    null 6    null null null |
 001: 1    null null null null null null null null |
 002: 2    3    null null null 7    null null null |
-----------------------

But register allocation result is:

======ppir regalloc result======
001: (1|0|)
002: (2|72|) (3|56|) (7|0|56)
000: (4|64|0 0) (6|0|64)
--------------------------

Note that mov for ld_tex stores to reg0 in instr2 and so does ld_var in instr1.

Any ideas what could be wrong?

anarsoul commented 6 years ago

Nevermind, I found a bug in my code.

yuq commented 6 years ago

Seems you'll always create a mov for ld_tex even it has only one usage. Please consider insert the ld_tex directly into its only one usage alu node instr (in this case the node 4 and instr 0) to save a reg alloc.

anarsoul commented 6 years ago

@yuq, one step at a time. I'm trying to get it working first. Now I'm stuck with mul inputs. Now reg allocator assignes output reg for mov for ld_var correctly, but mul input is not correct.

========prog========
-------block------
st_col 5 new
  mov 6 new
    mul 4 ssa4
      ld_var 1 ssa1
      mov 7 new
        ld_tex 3 ssa3
          ld_coords 2 ssa2
====================

======ppir regalloc result======
001: (1|0|)
002: (2|72|) (3|56|) (7|4|56)
000: (4|64|0 0) (6|0|64)
--------------------------
anarsoul commented 6 years ago

Looks like mul srcs are always ppir_target_type_ssa with index = 0. Could you point me to the place where compiler assignes indexes for src ssas?

yuq commented 6 years ago

src ssa is just a pointer reference the dest ssa, so in your case seems the src ssa not point to the mov dest ssa (as the dest ssa is already 4). Please check your mov node create code to see if you have replaced the mul src to the mov dest.

yuq commented 6 years ago

BTW. I usually did this by ppir_node_replace_child or ppir_node_replace_succ

anarsoul commented 6 years ago

Thanks!

anarsoul commented 6 years ago

Thanks, I got codegen working. WIP branch is here: https://github.com/anarsoul/mesa-lima/commits/lima-17.3-textures

Now I'm working on command stream setup. I hope that I'll get initial version working by the end of year :)

yuq commented 6 years ago

Glad to hear that.

anarsoul commented 6 years ago

Just FYI: I found a bug in pp/codegen.c: bitcopy(). It doesn't increment pointer if dst_offset is > 32. As result, it overwrites previous instruction.

Looks like it could be fixed by adding 'dst += (dst_offset >> 3);' at the beginning of this function.

yuq commented 6 years ago

Right, I'll fix it soon.

yuq commented 6 years ago

Fixed in https://github.com/yuq/mesa-lima/commit/b31af110338b059c6bb53349bc188eecf85fda37

Seems only the memcpy path has the problem.

anarsoul commented 6 years ago

Thanks, codegen seems to work fine now according to disassembly (btw, I believe we really need disassembler in lima driver):

varying[0].xy,
^texture = sampler2D(0),
$1 = ^texture,
sync; # sched = 4 writeback

$0 = varying[1]; # sched = 6 writeback

^vmul = $1 * $0,
$0 = ^vmul,
stop;
anarsoul commented 6 years ago

I'm stuck with command stream setup. Looks like it doesn't event attempt to read textures - even if I set it to a pretty random address I see no mmu fault. Looks like sampler always reads zero -- cube is completely black.

Could you please dump 'kmscube -M rgba' using proprietary driver for me? I have no proprietary driver installed and it'll take a while to setup it up since I've never done it before on this board.

yuq commented 6 years ago

Sorry, I can't dump kmscube directly either. Because proprietary driver doesn't support GBM/KMS which is needed by kmscube -M rgba. You need to modify it for being able to run within the X11 and not use GBM. And I recommend you setup such a proprietary driver environment (may be easily done by just using the official OS image of the board) to be able to do some experiment to reverse engineering.

Icenowy commented 6 years ago

2017年12月31日 21:43于 Qiang Yu notifications@github.com写道:

Sorry, I can't dump kmscube directly either. Because proprietary driver doesn't support GBM/KMS which is needed by kmscube -M rgba. You need to modify it for being able to run within the X11 and not use GBM. And I recommend you setup such a proprietary driver environment (may be easily done by just using the official OS image of the board) to be able to do some experiment to reverse engineering.

The wayland variant of blob should support GBM.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.

PabloPL commented 6 years ago

And I recommend you setup such a proprietary driver environment (may be easily done by just using the official OS image of the board) to be able to do some experiment to reverse engineering.

Maybe someone will find it usefull. Here is my modified boot.cmd (from official armbian image, orange pi pc). If it finds /boot/.lima file, it'll boot mainline kernel from /boot/lima/, othervise it'll boot official 3.4 kernel with mali binary driver. So switching between lima and mali is just a matter of touch/rm of that file and reboot.

anarsoul commented 6 years ago

Thanks @Icenowy, I'll try to get wayland driver working on mainline kernel.

anarsoul commented 6 years ago

@Icenowy unfortunately Mali driver for Wayland doesn't implement gbm_bo_map, so kmscube -M rgba won't work with it :(

yuq commented 6 years ago

I think you can modify the kmscube -M rgba to use normal glTexImage2D like this: http://www.opengl-tutorial.org/beginners-tutorials/tutorial-5-a-textured-cube/

So that you can use the X11 mali driver for the dump.

anarsoul commented 6 years ago

@yuq already done, see https://github.com/anarsoul/kmscube/commit/4067f3d8a58146bdeb4e1c40d3c38d463f95a942

I'm using wayland driver btw, do I don't need X11

Icenowy commented 6 years ago

2018年1月2日 09:47于 Qiang Yu notifications@github.com写道:

I think you can modify the kmscube -M rgba to use normal glTexImage2D like this: http://www.opengl-tutorial.org/beginners-tutorials/tutorial-5-a-textured-cube/

So that you can use the X11 mali driver for the dump.

P.S. do your dump program support any version of blob?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

yuq commented 6 years ago

mali-syscall-tracker should only support newer version of mali driver (r4p0) which uses different ioctl than the old one. The original lima-ng syscall tracker (in wrap dir) can support older versions.

anarsoul commented 6 years ago

@yuq I'm running r6p0 and looks like mali-syscall-tracker doesn't support it. Dumps don't look sane.

yuq commented 6 years ago

Oh, maybe r6p0 changes the ioctl too. My OrangePI uses r4p0, so mali-syscall-tracker is written for it.

anarsoul commented 6 years ago

Yeah, it changes ioctls, I compared mali_utgard_ioctl.h for r4p0 and r6p0 and they differ a lot.

anarsoul commented 6 years ago

Guess I have to rewrite mali-syscall-tracker for r6p0 :)

yuq commented 6 years ago

another option is using r4p0

Icenowy commented 6 years ago

2018年1月2日 14:56于 Qiang Yu notifications@github.com写道:

another option is using r4p0

For 64-bit this not an option -- no blob exists.

I think the development device of anarsoul should be a Pinebook.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

anarsoul commented 6 years ago

@Icenowy actually it's Pine64 LTS, but it doesn't matter. I run pretty same software on Pinebook and Pine64 LTS, only bootloader and dtb differ. It's a bit easier with Pine64 LTS, since it has gigabit ethernet port and I'm booting over NFS. I have USB to Ethernet adapter for Pinebook, but it's only 100MBit.

You're right - it's 64 bit, so running r4p0 is not an option.

yuq commented 6 years ago

OK, I didn't know that. If you want a quick dump, I can run your ported kmsbube -M rgba on my X11 OrangePI.

anarsoul commented 6 years ago

It still depends on libgbm, so I'm not sure whether it'll work on X11. The trick is that 64-bit mali driver for Wayland provides its own libgbm implementation.

anarsoul commented 6 years ago

Got initial version working (RGBA textures only for now, and a lot of values are hardcoded) I dumped texture descriptor and turns out I made some typos (and used logical or instead of bit or). There's some distortion I can't explain yet, see https://photos.app.goo.gl/U9pJlm3gOWbNDXL92

Some lines break and become thicker or thinner for some reason. All the lines should be straight.

yuq commented 6 years ago

Nice to hear that. I can't see the photo as I have some network problem now, maybe you can send it to gist which I can access always.

anarsoul commented 6 years ago

It's actually video. But you can test the code by yourself - I pushed my hackish code to https://github.com/anarsoul/mesa-lima/commits/lima-17.3-textures

I think something could be wrong with vertex shader compiler, since I compared pp code (almost the same, one extra instruction due to extra mov) and texture descriptor (totally the same) with binary driver, so I believe the only part that differs is vertex shader.

I'll work on cleaning up the code over the week.

yuq commented 6 years ago

I've run your code and see strange texture result too: https://drive.google.com/open?id=1jSdL579VjO0W24UwGiM8WRNGGbdEFai6 It seems two triangles in the same face whose texture mismatch in the share edge.

But use LIMA_DUMP_COMMAND_STREAM to show the vertex shader result, it's correct for the texture coord (kmscube has no calculation for tex coord, so it's hardly to be wrong).

anarsoul commented 6 years ago

@yuq but vertex shader passes texture coordinates to fragment shader, maybe there's an issue in storing varyings? Or in fetching attributes?

yuq commented 6 years ago

It can't be the attributes fetching, because the tex coord result is correct, attribute fetching happens before that. As the varying store, the result seems as expected in the code (from lima-ng), but you can compare it with the mali binary dump. If the storing is wrong, I think the cube display would be obviously wrong, but currently we can only observe a small texture bias.

anarsoul commented 6 years ago

Any idea what could be wrong?

yuq commented 6 years ago

I‘ve no idea either expect compare the command stream, GP/PP frame, render state with the binary mali driver dump. Oh, it reminds me that if the glPosition calculation is slightly wrong, the texture result will look like this too. We can compare the glPosition result with the binary mali driver to see if there's any difference (or with hand on calculation).

anarsoul commented 6 years ago

What's wrong about glPosition?

yuq commented 6 years ago

I mean the vertex shader output for glPosition which is the vertex position (x, y, z, w) which will be used by fragment shader to interpolate the varying and fragment pixel. If the position is calculated wrong (by vertex shader) with small bias, the texture will look like with some bias too.

yuq commented 6 years ago

I'll check the glPosition result first and see if there's such bias. If so, there must be some bug in the vertex shader. You may focus on your code clean up for now.

BTW. could you send me a mali binary dump with draw_cube_tex(0)?

anarsoul commented 6 years ago

@yuq, see 'Output transformation' section in https://limadriver.org/Lima+ISA/

The compiler implements some transforms internally to convert the value calculated for gl_Position in the shader to the actual value sent to the hardware. In particular (in pseudocode):

uniform vec4 gl_mali_ViewportTransform[2];
gl_position_actual.w = clamp(1.0 / gl_Position.w, -1e10, 1e10);
gl_position_actual.xyz  = gl_Position.xyz * gl_position_actual.w * gl_mali_ViewportTransform[0].xyz + gl_mali_ViewportTransform[1].xyz;

And I can't find where it's done anywhere in GP compiler.

yuq commented 6 years ago

It's done in NIR compiler, see: https://github.com/yuq/mesa-lima/commit/f54ae33e81297b32d4ea596bc22a4be80a4a91bf

But I removed the gl_position_actual.w calculation because I always see 1 for it.

anarsoul commented 6 years ago

Here's dump from proprietary driver. Please note that I haven't added proper r6p0 support to mali-syscall-tracker, so only useful contents is memory...

https://drive.google.com/file/d/1Q76QlfbSkZtmOTs-xp33NlpSjY5axEmK/view?usp=sharing