omarandlorraine / strop

Stochastically generates machine code
MIT License
97 stars 1 forks source link

The quality of the codebase is quite poor #31

Closed omarandlorraine closed 2 years ago

omarandlorraine commented 2 years ago

Failings:

Needs:

omarandlorraine commented 2 years ago

Instead of using that massive enum for the operation, and then having to specialize some of its behavior from machine to machine, we now have an approach I feel is saner. A struct has the data needed to execute the instruction, plus a pointer to a handler. One handler "executes the instruction" and updates a State appropriately, another handler disassembles it, another randomly mutates it. It's a Copy type, so the basic blocks can be mutated at will.

There are a lot of massive changes here because of this. Hopefully this works out.

omarandlorraine commented 2 years ago

I've removed some machines from this branch of strop. They weren't supported anyway, they didn't even work properly and were getting in the way of the work on this branch. These removed machines are PICs, early 8-bit Motorolas and the KR580VM1. I'll want to add most of these back later though.

LoganDark commented 2 years ago

Took a 10-second look at the current state of the branch; machine.rs still includes what looks like STM8-specific assumptions (R enum). Is this intentional? Maybe I'd want to add out-of-tree support for a fictional instruction set with 256 general-purpose registers (of which I know one).

omarandlorraine commented 2 years ago

what looks like STM8-specific assumptions (R enum). Is this intentional?

The R enum (and corresponding fields in the State struct not only contain stm8 specific stuff, but also stuff specific to other architectures. It's a part of the design I'm not keen on, and you're right that it feels uneasy.

Can you think of a better way? I am sure there's a better way. But I haven't hit on it. (after all, I am still quite a Rust newbie).

Maybe it would be better to have an Stm8State struct, a Mos6502State struct, and similarly an Stm8Instruction struct, a Mos6502Instruction, and associate these somehow. Then architecture specifics will be better contained. But I don't know.

LoganDark commented 2 years ago

Ok, so I went ahead and cloned this locally. Here are some observations:

I haven't really dug into the code yet. It doesn't even compile. ;(

LoganDark commented 2 years ago

Also regname always returns "a" regardless of the input because of a pattern that always matches

omarandlorraine commented 2 years ago

Each file is extremely large. There is too much to take in. So many structs and traits and functions per module.

Some of this (especially in machine.rs) is about to go. Others, for example in machine/stm8.rs, I think it's excusable if it's, okay, STM8 support goes in this file; there's a lot to do there so that's why the file is enormous. Do you think that's reasonable or would it be better to chop it up even further?

The library makes assumptions based on the machines it thinks it'll support. It only supports a limited number of registers and flags.

As to your previous comment, I know this is not exactly ... artful, so I want to ask what you would suggest.

There are large commented-out sections of code, as if you forgot you're using Git.

Yeah. I'll be deleting at least some of those next.

You don't seem to have any particular code style.

Do you think there's a better way to format the code? I have never messed around with a rustfmt config before, nor do I have a strong opinion about this. Could you maybe branch off master and put something in?

There is also a github action to make sure the code is properly formatted. Locally, I have precommit hook to do this as well.

unless stm8 allows divisions by zero?

STM8 does blithely compute divisions by zero; this does not trap on the STM8. (doesn't do anything useful either)

There are a bunch of functions littered around that do things but they're disorganized and I don't know why they're there. If I Find Usages I see they're used in the Instruction statics, but there are definitely better ways to do this!

(at least) some of these are handles. The Instruction has these function pointers. One executes the instruction on some State, one makes random mutation to the instruction, another writes the disassembly.

How would you suggest we organize it?

It doesn't even compile. ;(

True. This is an experimental branch that really smashes things up. I've removed some architectures as well. But if I get it to compile then my expectation is that the compiler warnings is going to

LoganDark commented 2 years ago

But if I get it to compile then my expectation is that the compiler warnings is going to

...guide you through the rest? :P

LoganDark commented 2 years ago

As to your previous comment, I know this is not exactly ... artful, so I want to ask what you would suggest.

You suggested having each architecture have its own machine state, instructions and etc. - that would be the best solution imho. The common framework would then be the mutation / evolving behavior, some way to put tests through the machine.

Do you think there's a better way to format the code? I have never messed around with a rustfmt config before, nor do I have a strong opinion about this. Could you maybe branch off master and put something in?

There is also a github action to make sure the code is properly formatted. Locally, I have precommit hook to do this as well.

I typically just use IntelliJ's built-in formatter, but I guess that doesn't exist if you don't have access to it (and are using VSCode instead). VSCode doesn't have a formatter so you're out of luck there. Rustfmt can be configured to get 95% of the way there, but it intentionally bans things like single-line where clauses. Not only are they not supported, the idea is not even entertained (i.e. opening an issue about it will get it denied). With that said, it could be an option.

Reformatting would have to happen after this PR; stuff like changing to tabs instead of spaces. I'm fairly sure that you use tabs but just didn't bother to change the Rust defaults; your JSON files are tabs. Please let me know if this assumption is incorrect.

How would you suggest we organize it?

gbemulator does a similar thing using closures so that an instruction's behavior can be defined at construction. So you would have like Instruction<Stm8State> for an STM8 instruction, which would have closures that take &mut Stm8State.

This is an extremely nice solution imho.

LoganDark commented 2 years ago

Also wrt tests: There's a way to have don't care bits, floating point comparisons and etc.

The strop API to get a certain function should be that you provide a closure that performs tests to score how "close" a given function is to what you want.

For the current JSON system, strop would just feed its own closure that does the JSON's bidding (i.e. runs those tests), but then strop can also be used as a library by other Rust code with more complex requirements (like floating-point).

This is important if that other Rust code also wants to define its own architecture, with something like floating-point registers.

omarandlorraine commented 2 years ago

Okay, we've lost 6502 support and there's a bajillion warnings, but at least it compiles again now.

But the tests do not compile.

omarandlorraine commented 2 years ago

Also regname always returns "a" regardless of the input because of a pattern that always matches

Not so; see the const A: Datum = Datum::Register(R::A); at the top of the file.

I typically just use IntelliJ's built-in formatter, but I guess that doesn't exist if you don't have access to it (and are using VSCode instead). VSCode doesn't have a formatter so you're out of luck there. Rustfmt can be configured to get 95% of the way there, but it intentionally bans things like single-line where clauses. Not only are they not supported, the idea is not even entertained (i.e. opening an issue about it will get it denied). With that said, it could be an option.

I'm actually on Vim; I am not using an IDE at all. I prefer all command line. Another contributor could come along in the future and use another IDE, so we should not use an IDE specific solution imho.

Reformatting would have to happen after this PR; stuff like changing to tabs instead of spaces. I'm fairly sure that you use tabs but just didn't bother to change the Rust defaults; your JSON files are tabs. Please let me know if this assumption is incorrect.

That's correct, I use tabs. It's habit; I don't have an actually strong opinion. And rustfmt has reasonable defaults I've found.

gbemulator does a similar thing using closures so that an instruction's behavior can be defined at construction.

That's neat. A couple of concerns about using precisely this approach:

LoganDark commented 2 years ago

Not so; see the const A: Datum = Datum::Register(R::A); at the top of the file.

This doesn't matter because A is an identifier and therefore always matches everything when not qualified.

Also that's not how match works. It matches structure, not identity. You want a if a == A then. I'd just use an if-else chain.

I'm actually on Vim; I am not using an IDE at all. I prefer all command line. Another contributor could come along in the future and use another IDE, so we should not use an IDE specific solution imho.

That's rustfmt like I said. It's a command-line tool, cargo fmt is all you need and it automatically picks up the config and everything.

That's correct, I use tabs. It's habit; I don't have an actually strong opinion. And rustfmt has reasonable defaults I've found.

The stance on tabs vs spaces is that they picked one at random because people will complain either way. Therefore I typically choose to manually enable tabs for correctness' sake.

the Instruction needs to be a Copy type, since unlike gbemulator, we're copying the entire struct around when we mutate the program, because we don't have a binary representation of the instruction stream.

Pure closures are typically Copy. For that matter they're also Send + Sync.

An stm8 "add with carry" instruction has precisely the same semantics as a 6502 "add with carry" instruction. Same with Z80 etcetera etcetera. So I see the current strategy of having a function to do this is more amenable to code reuse.

Yes but those are completely different architectures with completely different states. Or at least they should be if the library is going to be extensible any time soon. :)

I am not aware of any way to examine a closure at runtime to modify its arguments, or "disassemble" them

Alright so the closure thing I said was incomplete; the closure would also have to take the instruction operands, which would probably have to be allocated in a Vec or something. As for the disassembly that can stay another closure. And for mutations obviously that can be a closure too. Just closures all around; it's possible to deduplicate common functionality of course, but you really do want to aim for extensibility.

LoganDark commented 2 years ago

By the way; that fictional machine I mentioned earlier requires self-modifying code in order to do a lot of trivial operations (like jumping to an address stored in a register). Will strop ever support that? :P

omarandlorraine commented 2 years ago

Also that's not how match works

Seems okay in the master branch. Eh, maybe it's being done differently there. In any case, this matter should be unit tested. BUT I'm thinking, wouldn't it be better if it were something that implements std::fmt::Display.

I typically choose to manually enable tabs for correctness' sake.

Please can you do a pull request to get this in?

Alright so the closure thing I said was incomplete; the closure would also have to take the instruction operands, which would probably have to be allocated in a Vec or something.

I don't think Vec is a Copy type. But what do you think about using an enum here? The variant could then have, what-do-you-call-ems for arguments. Like Stm8Instruction::Load(source, destination) or Stm8Instruction::SetCarry or Stm8Instruction::Negate(location_in_memory)

Will strop ever support [self-modifying code]?

I think a lot of people are afraid of self-modifying code. But my opinion is, it's not so bad if it's structured and well-defined precisely how the code can be modified and under what circumstances, and preferably, why. Real machines have been designed that depend on this technique for accessing arrays and what have you (for example, the LGP-30, I believe also the TX-0). I have done this before in 6502 assembly for speed of execution.

Originally I said to myself, strop will not support things like self-modifying code. But I'm hopeful that the coming revisions can allow it. But it will be crucial to think about specific kinds of modification (for example, can we add a constant to an instruction's operand? Does it make sense to flip bit 6 of the opcode? and so on.)

LoganDark commented 2 years ago

I don't think Vec is a Copy type.

Could use arrayvec with something like a length of 4, which is more than enough for most (all?) instructions.

I did a little API design experiment on my own and ended up with the actual behavior of an operation, like disassembly, mutation, execution, being defined in an Operation struct, and the actual operants of an instruction being in the Instruction struct. Something like:

#[derive(Clone)]
pub struct Operation<State, Operand, OUD, IUD> {
    pub disassemble: fn(&Instruction<State, Operand, OUD, IUD>) -> Cow<'static, str>,
    pub mutate: fn(&mut Instruction<State, Operand, OUD, IUD>) -> (),
    pub execute: fn(&Instruction<State, Operand, OUD, IUD>, &mut State) -> u64,
    pub userdata: OUD
}

#[derive(Clone)]
pub struct Instruction<'op, State, Operand, OUD, IUD> {
    pub operation: &'op Operation<State, Operand, OUD, IUD>,
    pub operands: SmallVec<[Operand; 4]>,
    pub userdata: IUD
}

I'm not sure if strop can be retrofitted to do this, but it's probably a good way forward.

You can construct Operations in statics like so:

// machine specific state
pub struct State { ... }

// machine specific instruction operand
pub enum Operand { ... }

pub type Operation = crate::instruction::Operation<State, Operand, (), ()>;
pub type Instruction = crate::instruction::Instruction<'static, State, Operand, (), ()>;

static NOP: Operation = Operation {
    disassemble: |_| Cow::Borrowed("nop"),
    mutate: |_| {},
    execute: |_, _| 1,
    userdata: ()
};

static LOAD_MEM: Operation = Operation {
    disassemble: |insn| {
        if let [source @ Operand::Address(_), target @ Operand::Register(_)] = insn.operands.as_slice() {
            Cow::Owned(format!("load {} {}", source.to_string(), target.to_string()))
        } else {
            unreachable!()
        }
    },
    mutate: |_| {},
    execute: |insn, state| {
        if let [Operand::Address(source), Operand::Register(target)] = *insn.operands.as_slice() {
            state.registers[target as usize] = state.memory[source as usize];
            1
        } else {
            unreachable!()
        }
    },
    userdata: ()
};

Overall it feels like it'd scale extremely well.

omarandlorraine commented 2 years ago

I'm not sure if strop can be retrofitted to do this

This is good stuff; exactly what I need to do. Therefore, I will make it fit.

There are at least a few other places that now need type parameters. For example, the pub struct BasicBlock in search.rs. Probably this will bubble all the way up to the stochastic_search and optimize and quick_dce functions.

omarandlorraine commented 2 years ago

It looks as though I'm going to run into this issue. Because I'm using iterators to generate swathes of mutated basic blocks. But Generic Associated Types are not stable rust yet.

omarandlorraine commented 2 years ago

I'm not sure if strop can be retrofitted to do this, but it's probably a good way forward.

You were right to wonder about this. I found it really hard to do. But there is another pull request (#33) now. It has separate types for each architecture's state and instruction. And it compiles, which this branch doesn't.