wargio / libvle

PowerPC VLE disassembler library
GNU Lesser General Public License v3.0
5 stars 5 forks source link

Possible bug with e_stb #13

Closed prj closed 3 years ago

prj commented 3 years ago
        { "e_srwi."     , 0x7C000471, 0x7C000471 | E_MASK_X   , E_XRA   , {TYPE_REG, TYPE_REG, TYPE_IMM, TYPE_NONE, TYPE_NONE}},
        { "e_stb"       , 0x34000000, 0x34000000 | E_MASK_D   , E_D     , {TYPE_REG, TYPE_MEM, TYPE_IMM, TYPE_NONE, TYPE_NONE}},
        { "e_stbu"      , 0x18000400, 0x18000400 | E_MASK_D8  , E_D8    , {TYPE_REG, TYPE_MEM, TYPE_REG, TYPE_NONE, TYPE_NONE}},

Shouldn't the third type in the list for e_stb be TYPE_REG?

prj commented 3 years ago

And another one, which luckily does not affect functionality! { "e_lis" , 0x7000E000, 0x7000E000 | E_MASK_I16L, E_I16LS , {TYPE_REG, TYPE_IMM, TYPE_IMM, TYPE_NONE, TYPE_NONE}}, E_I16LS only has TYPE_REG and TYPE_IMM.

prj commented 3 years ago

It seems to me that all the se_ ram loads and stores have the type for the last field wrong as well.

Example: { "se_lbz" , 0x8000, 0x8FFF, 3, {{0x0F00, 8, 0, 0, 2, TYPE_MEM}, {0x00F0, 4, 0, 0, 0, TYPE_REG}, {0x000F, 0, 0, 0, 1, TYPE_MEM}, {0}, {0}}}, The format of the instruction is: 1000 SD4, RZ, RX or se_lbz rZ,SD4(rX) mnemonic. So the first value should be TYPE_REG in the list.

The index based re-ordering of registers seems also very confusing. I can see why this was done - so that e.g. fields[0] always contains the register that was clobbered by the instruction. But this is not transparent, so you have to be constantly looking at the source code, if there is not some funky reordering going on...

Perhaps my use case is different, and I am not trying to print out text, but rather access things programmatically.

prj commented 3 years ago

Correct definition for "se_lbz" would be: { "se_lbz" , 0x8000, 0x8FFF, 3, {{0x0F00, 8, 0, 0, 1, TYPE_MEM}, {0x00F0, 4, 0, 0, 0, TYPE_REG}, {0x000F, 0, 0, 0, 2, TYPE_REG}, {0}, {0}}}, This includes some reordering to not break the snprint magic. I'm going to go through a bunch more stuff and submit a pull request with these fixed ...

This one is actually pretty bad because all the se_xxx load instructions with RX higher than 7 get disassembled wrong!

prj commented 3 years ago

Created https://github.com/wargio/libvle/pull/14 to fix issues.

wargio commented 3 years ago

Oh nice catch. i'll double check the VLE ISA.

prj commented 3 years ago

I am looking at the library a bit more, and maybe I don't understand your logic behind vle_type, but... if (instr->fields[i].type == TYPE_MEM) { add += snprintf(str + add, bufsize - add, " 0x%x(r%d)", instr->fields[i + 1].value, instr->fields[i].value); i++; } Why if it's TYPE_MEM it's accessing the next entry, and not the current one? What is the point in types when what is stored in the type is not what is actually stored in the value?

For example, processing e_sth: { "e_sth" , 0x5C000000, 0x5C000000 | E_MASK_D , E_D , {TYPE_REG, TYPE_MEM, TYPE_REG, TYPE_NONE, TYPE_NONE}}, field[0] is target register as expected field[1] is base register, but it has TYPE_MEM??? field[2] is memory offset, but it has TYPE_REG???

Is there some certain design decision behind this? I mean, if you only intended this to print ascii, none of this matters. But using it programmatically... I am very confused.

I think my PR might actually break the mem printing.

Should the mem printing not be like this: if (instr->fields[i].type == TYPE_MEM) { add += snprintf(str + add, bufsize - add, " 0x%x(r%d)", instr->fields[i].value, instr->fields[i+1].value); i++; } ?

prj commented 3 years ago

Whatever the design decision was behind this, it completely broke the se_xx parsing because the library does checks on field types in find_se. But the field types do not correspond what is stored in the fields.

I probably can't go through everything, but for my own use I absolutely have to fix it so that the type is actually the type it says, or I will go crazy :)

wargio commented 3 years ago

Oh the weird thing there is linked to how the various parts of the opcode are parsed and located in the bytes. if you check the ISA, you will notice how and were the bits are located.

wargio commented 3 years ago

Whatever the design decision was behind this, it completely broke the se_xx parsing because the library does checks on field types in find_se. But the field types do not correspond what is stored in the fields.

I probably can't go through everything, but for my own use I absolutely have to fix it so that the type is actually the type it says, or I will go crazy :)

i'm sorry about that, but parsing bits in a common structure can be a mess :(

prj commented 3 years ago

This is not the problem here at all. I think TYPE_MEM is a pretty giant hack, since it actually should probably be TYPE_IMM ? But it can stay TYPE_MEM for the printing. Just put it in the right slot, and re-do the printing subroutine to check for every TYPE_REG - if there is at least one more field and it's TYPE_MEM, then do the special TYPE_MEM printing.

That's really all there is to it, after that you have everything in place.

All your parsing and the bit stuff is good. It seems to me that you did the TYPE's as a hack for the printing and then forgot that it was a hack when you implemented the se_ stuff. So while you did the register extension correctly, you depended on a check, which was based on a printing hack...

I am using your library in something that generates metadata for one binary from another binary, and most of the work is done by matching references. I am spotting the bugs because I get almost no reference matches... I will solve this and submit a new pull request. I am not sure if I will have time to fix the tests though.

wargio commented 3 years ago

yes the mem thing is hacky i agree. i can fix the tests if you want.

prj commented 3 years ago

I fixed all the TYPE_MEM stuff apart from this: // VLE Instructions for Improving Interrupt Handler Efficiency (e200z760RM.pdf)

Is TYPE_MEM and TYPE_REG flipped in those too? My processor does not use those.

wargio commented 3 years ago

Shouldn't the third type in the list for e_stb be TYPE_REG?

Ok, no it is not a register. RS is at position 0 and is a REG, RA is at position 1 and is a REG containing a pointer, so is type MEM and D is at position 2 and is type IMM.

image

wargio commented 3 years ago

E_I16LS only has TYPE_REG and TYPE_IMM.

You are right, but the definition of those bits are splitted in 2 bit areas.

image

wargio commented 3 years ago

or se_lbz rZ,SD4(rX) mnemonic.

Ok, in this one you are right that is wrong, but also wrong that is REG, it should have been IMM REG MEM

wargio commented 3 years ago

ok so from what i've seen in this bug report, you are splitting each instruction as is written in assembly and not as is written in the machine lang, which differs from what you are seeing. i guess it could be a good idea to refactor the code to have that view that you were expecting.

prj commented 3 years ago

Shouldn't the third type in the list for e_stb be TYPE_REG?

Ok, no it is not a register. RS is at position 0 and is a REG, RA is at position 1 and is a REG containing a pointer, so is type MEM and D is at position 2 and is type IMM.

image

In case of e_stb that means then that according this logic the e_stb is right, but everything else is wrong... e_stbu, e_sth, e_sthu e_stmw, e_stw, estwu This is indeed the case. All the stores and loads are not defined according to the VLE PEM. Stuff is shuffled around. The MEM is always in last place. Unless you're calling a register "MEM". I don't, I call the immediate value "MEM", as that makes much more sense and doesn't break your se processing.

or se_lbz rZ,SD4(rX) mnemonic.

Ok, in this one you are right that is wrong, but also wrong that is REG, it should have been IMM REG MEM

I would say it should be IMM REG REG by your logic. But because you are using MEM for the printing hack, then you would do MEM REG REG and re-order using the re-ordering to REG REG MEM to be in-line with the 32 bit stores/loads. So it prints properly.

ok so from what i've seen in this bug report, you are splitting each instruction as is written in assembly and not as is written in the machine lang, which differs from what you are seeing. i guess it could be a good idea to refactor the code to have that view that you were expecting.

I do not think this is the case. I'll fix some stuff and submit a PR, you will see what I mean. I am trying to process every instruction (and have it in the structure) the same as is defined in the VLE PEM.

E_I16LS only has TYPE_REG and TYPE_IMM.

You are right, but the definition of those bits are splitted in 2 bit areas.

image

Yes, but your processing does not touch the 3rd area.

    case E_I16LS:
    {
        v->n = 2;
        v->fields[0].value = (data & 0x3E00000) >> 21;
        v->fields[0].type = p->types[0];
        v->fields[1].value = (data & 0x1F0000) >> 5;
        v->fields[1].value |= (data & 0x7FF);
        v->fields[1].type = p->types[1];
    }
    break;

So it makes absolutely no sense to have an extra field there, it's only confusing. The fact that it's not packed all in one line is quite irrelevant in this case IMO.

prj commented 3 years ago

Here is an example how I am using your library, maybe it is easier to understand what my problem is? image

wargio commented 3 years ago

Ok, sounds perfect. probably it is better to have one unique format and follow that. Then i guess, if you agree to use the assembly format.

prj commented 3 years ago

I will fix stuff up the way I mean, and submit a PR... Then you can look if it is how you like.

wargio commented 3 years ago

yes ofc

prj commented 3 years ago

I fixed everything, including the tests which pass. PR created: https://github.com/wargio/libvle/pull/15

Thank you for your work.

wargio commented 3 years ago

no, thank you for your PR.

wargio commented 3 years ago

Merged.