Closed anarsoul closed 6 years ago
Hi @yuq,
I'm looking into implementing texture support. So far I have an issue with PP compiler.
The problem is that texture fetch instruction takes coords from pipeline and outputs to pipeline, so in order to read from texture we need at least 3 instructions in row:
1) fetch varying to fetch coords 2) texture fetch instruction 3) any ALU instruction that allows to mov pipeline reg to any other reg
See PP pipeline here: https://raw.githubusercontent.com/robclark/ducking-octo-happiness/master/lima/limapipelinehorizontal2.png
So I'm not sure how to generate 3 instructions in row for a single nir_tex_instr.
Could you give me a hint please?
Hi, I think you can
dup load_varying for load_texture in step 3 may also be done in step 2 and all load_texture input can be load_tex_coord.
Hm, looks like there's some issue with register allocation, see:
lower:
========prog========
-------block------
st_col 5 new
mul 4 ssa4
ld_var 1 ssa1
ld_tex 3 ssa3
ld_coords 2 ssa2
====================
after node_to_instr it becomes:
========prog========
-------block------
st_col 5 new
mov 6 new
mul 4 ssa4
ld_var 1 ssa1
mov 7 new
ld_tex 3 ssa3
ld_coords 2 ssa2
====================
======ppir instr list======
vary texl unif vmul smul vadd sadd comb stor const0|1
*000: null null null 4 null 6 null null null |
001: 1 null null null null null null null null |
002: 2 3 null null null 7 null null null |
-----------------------
But register allocation result is:
======ppir regalloc result======
001: (1|0|)
002: (2|72|) (3|56|) (7|0|56)
000: (4|64|0 0) (6|0|64)
--------------------------
Note that mov for ld_tex stores to reg0 in instr2 and so does ld_var in instr1.
Any ideas what could be wrong?
Nevermind, I found a bug in my code.
Seems you'll always create a mov for ld_tex even it has only one usage. Please consider insert the ld_tex directly into its only one usage alu node instr (in this case the node 4 and instr 0) to save a reg alloc.
@yuq, one step at a time. I'm trying to get it working first. Now I'm stuck with mul inputs. Now reg allocator assignes output reg for mov for ld_var correctly, but mul input is not correct.
========prog========
-------block------
st_col 5 new
mov 6 new
mul 4 ssa4
ld_var 1 ssa1
mov 7 new
ld_tex 3 ssa3
ld_coords 2 ssa2
====================
======ppir regalloc result======
001: (1|0|)
002: (2|72|) (3|56|) (7|4|56)
000: (4|64|0 0) (6|0|64)
--------------------------
Looks like mul srcs are always ppir_target_type_ssa with index = 0. Could you point me to the place where compiler assignes indexes for src ssas?
src ssa is just a pointer reference the dest ssa, so in your case seems the src ssa not point to the mov dest ssa (as the dest ssa is already 4). Please check your mov node create code to see if you have replaced the mul src to the mov dest.
BTW. I usually did this by ppir_node_replace_child or ppir_node_replace_succ
Thanks!
Thanks, I got codegen working. WIP branch is here: https://github.com/anarsoul/mesa-lima/commits/lima-17.3-textures
Now I'm working on command stream setup. I hope that I'll get initial version working by the end of year :)
Glad to hear that.
Just FYI: I found a bug in pp/codegen.c: bitcopy(). It doesn't increment pointer if dst_offset is > 32. As result, it overwrites previous instruction.
Looks like it could be fixed by adding 'dst += (dst_offset >> 3);' at the beginning of this function.
Right, I'll fix it soon.
Fixed in https://github.com/yuq/mesa-lima/commit/b31af110338b059c6bb53349bc188eecf85fda37
Seems only the memcpy path has the problem.
Thanks, codegen seems to work fine now according to disassembly (btw, I believe we really need disassembler in lima driver):
varying[0].xy,
^texture = sampler2D(0),
$1 = ^texture,
sync; # sched = 4 writeback
$0 = varying[1]; # sched = 6 writeback
^vmul = $1 * $0,
$0 = ^vmul,
stop;
I'm stuck with command stream setup. Looks like it doesn't event attempt to read textures - even if I set it to a pretty random address I see no mmu fault. Looks like sampler always reads zero -- cube is completely black.
Could you please dump 'kmscube -M rgba' using proprietary driver for me? I have no proprietary driver installed and it'll take a while to setup it up since I've never done it before on this board.
Sorry, I can't dump kmscube directly either. Because proprietary driver doesn't support GBM/KMS which is needed by kmscube -M rgba. You need to modify it for being able to run within the X11 and not use GBM. And I recommend you setup such a proprietary driver environment (may be easily done by just using the official OS image of the board) to be able to do some experiment to reverse engineering.
2017年12月31日 21:43于 Qiang Yu notifications@github.com写道:
Sorry, I can't dump kmscube directly either. Because proprietary driver doesn't support GBM/KMS which is needed by kmscube -M rgba. You need to modify it for being able to run within the X11 and not use GBM. And I recommend you setup such a proprietary driver environment (may be easily done by just using the official OS image of the board) to be able to do some experiment to reverse engineering.
The wayland variant of blob should support GBM.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.
And I recommend you setup such a proprietary driver environment (may be easily done by just using the official OS image of the board) to be able to do some experiment to reverse engineering.
Maybe someone will find it usefull. Here is my modified boot.cmd (from official armbian image, orange pi pc). If it finds /boot/.lima file, it'll boot mainline kernel from /boot/lima/, othervise it'll boot official 3.4 kernel with mali binary driver. So switching between lima and mali is just a matter of touch/rm of that file and reboot.
Thanks @Icenowy, I'll try to get wayland driver working on mainline kernel.
@Icenowy unfortunately Mali driver for Wayland doesn't implement gbm_bo_map, so kmscube -M rgba won't work with it :(
I think you can modify the kmscube -M rgba to use normal glTexImage2D like this: http://www.opengl-tutorial.org/beginners-tutorials/tutorial-5-a-textured-cube/
So that you can use the X11 mali driver for the dump.
@yuq already done, see https://github.com/anarsoul/kmscube/commit/4067f3d8a58146bdeb4e1c40d3c38d463f95a942
I'm using wayland driver btw, do I don't need X11
2018年1月2日 09:47于 Qiang Yu notifications@github.com写道:
I think you can modify the kmscube -M rgba to use normal glTexImage2D like this: http://www.opengl-tutorial.org/beginners-tutorials/tutorial-5-a-textured-cube/
So that you can use the X11 mali driver for the dump.
P.S. do your dump program support any version of blob?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.
mali-syscall-tracker should only support newer version of mali driver (r4p0) which uses different ioctl than the old one. The original lima-ng syscall tracker (in wrap dir) can support older versions.
@yuq I'm running r6p0 and looks like mali-syscall-tracker doesn't support it. Dumps don't look sane.
Oh, maybe r6p0 changes the ioctl too. My OrangePI uses r4p0, so mali-syscall-tracker is written for it.
Yeah, it changes ioctls, I compared mali_utgard_ioctl.h for r4p0 and r6p0 and they differ a lot.
Guess I have to rewrite mali-syscall-tracker for r6p0 :)
another option is using r4p0
2018年1月2日 14:56于 Qiang Yu notifications@github.com写道:
another option is using r4p0
For 64-bit this not an option -- no blob exists.
I think the development device of anarsoul should be a Pinebook.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.
@Icenowy actually it's Pine64 LTS, but it doesn't matter. I run pretty same software on Pinebook and Pine64 LTS, only bootloader and dtb differ. It's a bit easier with Pine64 LTS, since it has gigabit ethernet port and I'm booting over NFS. I have USB to Ethernet adapter for Pinebook, but it's only 100MBit.
You're right - it's 64 bit, so running r4p0 is not an option.
OK, I didn't know that. If you want a quick dump, I can run your ported kmsbube -M rgba on my X11 OrangePI.
It still depends on libgbm, so I'm not sure whether it'll work on X11. The trick is that 64-bit mali driver for Wayland provides its own libgbm implementation.
Got initial version working (RGBA textures only for now, and a lot of values are hardcoded) I dumped texture descriptor and turns out I made some typos (and used logical or instead of bit or). There's some distortion I can't explain yet, see https://photos.app.goo.gl/U9pJlm3gOWbNDXL92
Some lines break and become thicker or thinner for some reason. All the lines should be straight.
Nice to hear that. I can't see the photo as I have some network problem now, maybe you can send it to gist which I can access always.
It's actually video. But you can test the code by yourself - I pushed my hackish code to https://github.com/anarsoul/mesa-lima/commits/lima-17.3-textures
I think something could be wrong with vertex shader compiler, since I compared pp code (almost the same, one extra instruction due to extra mov) and texture descriptor (totally the same) with binary driver, so I believe the only part that differs is vertex shader.
I'll work on cleaning up the code over the week.
I've run your code and see strange texture result too: https://drive.google.com/open?id=1jSdL579VjO0W24UwGiM8WRNGGbdEFai6 It seems two triangles in the same face whose texture mismatch in the share edge.
But use LIMA_DUMP_COMMAND_STREAM to show the vertex shader result, it's correct for the texture coord (kmscube has no calculation for tex coord, so it's hardly to be wrong).
@yuq but vertex shader passes texture coordinates to fragment shader, maybe there's an issue in storing varyings? Or in fetching attributes?
It can't be the attributes fetching, because the tex coord result is correct, attribute fetching happens before that. As the varying store, the result seems as expected in the code (from lima-ng), but you can compare it with the mali binary dump. If the storing is wrong, I think the cube display would be obviously wrong, but currently we can only observe a small texture bias.
Any idea what could be wrong?
I‘ve no idea either expect compare the command stream, GP/PP frame, render state with the binary mali driver dump. Oh, it reminds me that if the glPosition calculation is slightly wrong, the texture result will look like this too. We can compare the glPosition result with the binary mali driver to see if there's any difference (or with hand on calculation).
What's wrong about glPosition?
I mean the vertex shader output for glPosition which is the vertex position (x, y, z, w) which will be used by fragment shader to interpolate the varying and fragment pixel. If the position is calculated wrong (by vertex shader) with small bias, the texture will look like with some bias too.
I'll check the glPosition result first and see if there's such bias. If so, there must be some bug in the vertex shader. You may focus on your code clean up for now.
BTW. could you send me a mali binary dump with draw_cube_tex(0)?
@yuq, see 'Output transformation' section in https://limadriver.org/Lima+ISA/
The compiler implements some transforms internally to convert the value calculated for gl_Position in the shader to the actual value sent to the hardware. In particular (in pseudocode):
uniform vec4 gl_mali_ViewportTransform[2];
gl_position_actual.w = clamp(1.0 / gl_Position.w, -1e10, 1e10);
gl_position_actual.xyz = gl_Position.xyz * gl_position_actual.w * gl_mali_ViewportTransform[0].xyz + gl_mali_ViewportTransform[1].xyz;
And I can't find where it's done anywhere in GP compiler.
It's done in NIR compiler, see: https://github.com/yuq/mesa-lima/commit/f54ae33e81297b32d4ea596bc22a4be80a4a91bf
But I removed the gl_position_actual.w calculation because I always see 1 for it.
Here's dump from proprietary driver. Please note that I haven't added proper r6p0 support to mali-syscall-tracker, so only useful contents is memory...
https://drive.google.com/file/d/1Q76QlfbSkZtmOTs-xp33NlpSjY5axEmK/view?usp=sharing
As per wiki, both PP compiler and command stream need be changed