sparkiedk / Toyota-PCM-hacking


Hacking a USDM 4AGZE #4

Closed rhit-mahnee closed 1 year ago

rhit-mahnee commented 3 years ago

Good morning Sparkiedk! I'm looking into hacking a USDM 4AGZE ECU from an '88 MR2. I'm not sure how familiar you are with those, but from what I understand they tend to run lean at low RPM under increased boost due to a cap applied to the AFM reading in the code. My hope is to pull the code off the ECU, modify that part of the code, and flash a new chip. If I understand correctly, you've pulled code off of other Toyota ECUs before, as have a couple other guys in the MR2 community over a decade ago. Reading through the forums you all have posted in I think I understand generally how the code was extracted but I don't actually know any of the details necessary to do it myself. Would you happen to have any resources on hand to explain what I would need to do to extract the code and decompile it?

sparkiedk commented 3 years ago

I haven't gotten up to this stuff for a while, but I've got some stuff around that can help describe the process.

Firstly, we will need to know what chip it is: a Toshiba 8X-whatever or a Hitachi HD6301. Can you pop the lid off the PCM and post a picture?

Moving on from that, you'll need some tools: a USB-to-UART adapter (NOT RS232, UART! 5V!), a breadboard, wire, a flash IC, and maybe some glue logic (an inverter? some gates? I don't recall anymore).

And then (I'll look it up later) there's some code in the repo that just dumps the data inside the chip out the UART. You capture it with your favourite terminal program (I prefer RealTerm) and can start hacking.

You will NOT be able to "reflash" these chips; they use mask ROM (the hardest of hardcoding). What I've done is recreate the ports that are lost when running the chips in external mode, using logic ICs or CPLDs. Then they can run from your external flash memory. All of that goes on a daughter card that plugs into where the CPU used to go. I have designs for both PCMs that I will publish. I used to keep them private, but I'm never going to start selling them, so I might as well open source them.

rhit-mahnee commented 3 years ago

The chip is a 64 pin D151801-5890. At this point I'm pretty confused about which ones are Toshiba and which are Hitachi. Pictures are below.

That does help a lot, thank you. I don't entirely understand it all, but my CPE friend helping seems to have a good idea of what you did. I saw that under Toshiba 8x info you have the designs for a reader PCB. That's for pulling the code off of the chip, right? If you could point me towards what code is for what, I'd greatly appreciate it.

I am aware that you can't reflash these chips. I was thinking of getting a new chip and flashing that, until I learned and started to understand the external flash memory route. If you already have a daughter board designed, that would be absolutely wonderful.

I very much appreciate you sharing your work. There aren't a whole lot of open source resources out there, and it's just a shame when someone puts in the work to pull off something like this but then disappears along with all of their knowledge. I'll gladly share whatever I can to contribute to what you've made available.

[Photos of the chip]

sparkiedk commented 3 years ago

Yep, that there is a Toshiba 8X. The reader PCB/schematic isn't mine; it's from H. Kashima and it's almost 20 years old ;). I personally didn't build a PCB, I just breadboarded it. You can replace the whole serial port and ADM202 with a simple USB-to-UART adapter.

The good news is that I have the exact same CPU in my 1UZ-powered Supra, and it's running in external mode with a board of my own design. I will get the design files and code up, as well as an uncrippled reader (HK's was deliberately crippled, and I fixed it, but for the life of me I can't remember how). Also, the reader schematic cites a 27C256 EPROM, and nuts to that: this is the future and we don't need UV lights to erase things. I used a 39SF040 flash chip, which is pretty much drop-in, I think; you're gonna have to read a datasheet.

To program the flash you'll need an EPROM/flash writer, and all this gear needs to be 5V compatible. Something like the TL866II is gonna get the job done, or you can go discount+hardcore and bitbang the data in with an Arduino or your other favourite embedded system.

Anyways, I'll start sanitizing files and dropping them in logical places. Most of the useful stuff is in a directory called "1uz private work", so I'll either sort and post it or get bored with that and dump it all there. Keep an eye out.

sparkiedk commented 3 years ago

note: toyota 1uz pcm/reader.asm/reader.bin will run on your chip and offload the mask rom through the serial port.

As I recall, the baud rate is goofy, and instead of understanding the serial port and messing with it, I simply measured it with an oscilloscope and then set my FTDI chip to the closest matching rate.
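The scope method boils down to one piece of arithmetic: the narrowest high or low time on the TX line is one bit period, so the baud rate is its reciprocal. A minimal sketch, assuming a hypothetical list of candidate PC-side rates (FTDI-style adapters can often be set to odd rates directly, so the list is only a convenience) and a made-up 32 µs measurement:

```python
from typing import Sequence, Tuple

# Candidate rates to try on the PC side. This list is an assumption, not
# something taken from reader.asm.
CANDIDATE_RATES = (1200, 2400, 4800, 9600, 19200, 31250, 38400, 57600, 62500, 115200, 125000)


def baud_from_pulse(min_pulse_s: float) -> float:
    """The narrowest high or low time on the TX line is one bit period."""
    if min_pulse_s <= 0:
        raise ValueError("pulse width must be positive")
    return 1.0 / min_pulse_s


def closest_rate(baud: float, rates: Sequence[int] = CANDIDATE_RATES) -> Tuple[int, float]:
    """Return the closest candidate rate and its relative error (as a fraction)."""
    best = min(rates, key=lambda r: abs(r - baud))
    return best, abs(best - baud) / baud


# Hypothetical example: the narrowest pulse seen on the scope is 32 microseconds.
rate, err = closest_rate(baud_from_pulse(32e-6))
print(rate, f"{err:.2%}")
```

UARTs are commonly said to tolerate only a few percent of rate mismatch, so a large relative error means either the measurement or the candidate list needs another look.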

rhit-mahnee commented 3 years ago

Thank you very much! Looks like I need to find software to read some of these files, then start getting components. I'm sure I'll have more questions once I actually get to work on things. And yes, I am hoping to set up a laptop readout through USB somewhere down the line.

sparkiedk commented 3 years ago

So .asm files and .idb files for the T8X can be processed with Hex-Rays IDA version 4.x. There's a plugin under the toshiba 8x info/ida plugin directory that's only compatible with version 4 (I confess I haven't tried fixing it for the current freeware release 7.x; I just know it fails). A copy of IDA Free can be found here: https://downloads.tuxfamily.org/hokuto/%2B%20dev/idafree49.exe which is the same as my copy. The assembler to turn .asm files back into binaries is located under toshiba 8x info/tasm and is called with the various batch files, either in that directory with parameters, or, as I did while working intensively, through custom batch files that I can just double-click to get a binary. Makes life easier. I wonder if VS Code could be easily set up to work with these files...

PCB files are ExpressPCB; it was the tool I was fluent in at the time and I regret using it. Get their software for free (the "classic" PCB and SCH software was easier for me to manage than their "plus" stuff, but try both, YMMV): https://www.expresspcb.com/pcb-cad-software/ Nowadays I use DipTrace, which has a free version. Converting the schematics and PCBs over to DipTrace has the advantage of making them freely exportable to other common formats like KiCad, Altium, etc. I could get behind a project like that.

rhit-mahnee commented 3 years ago

Got it. I will start looking into those. Thanks!

sparkiedk commented 3 years ago

I have posted photos of my modified pcm and daughtercard in "Toshiba 8x daughtercard". The hardware bug is also identified: IS/OS were unimplemented and IS needed a bodge wire as the 1UZ pcm uses that pin to sense the /IDL signal. A remake of this board should plumb those signals out to the CPLD.

And while I'm on the topic of the CPLD: I chose the Atmel 1504 as it had a spectacularly cheap development kit (I think for the smaller one) and it's 5V compatible. WinCUPL doesn't even seem to work under Windows 7; I was using XP back in the day to compile and flash. Heck, the JTAG adapter I was using is a parallel port device, and I doubt you'll get respectable performance out of a USB-to-parallel chip, so the whole thing might need a rethink for 2021. It's certainly possible to implement the CPLD functions with discrete logic as well; it will just be bigger.

rhit-mahnee commented 3 years ago

Nice! Looks good! I was looking at the 1UZ daughtercard designs, and noticed you have three different versions. What are the changes between the original and V2? And what are the extras on the third board? The upper added chip looked like it was related to the ignition.

I'll look into doing some updating. To be honest, most of this computing stuff is pretty unfamiliar to me, but I'll run it by my CPE/EE friends and I'm sure there are some professors here at school that would love to give pointers.

sparkiedk commented 3 years ago

I checked the file dates and tried comparing the boards. It seems like V2 was cloned with the express purpose of making edits and upgrades, and never got worked on. I've deleted a number of the confusing files; they added nothing.

The nature of ExpressPCB's ordering system is that the cheapest boards you can get are a specific size and you must order three ("miniboard plus classic"). OSH Park is way better and lets you use real PCB design tools (ExpressPCB locked you into their software ecosystem). Why does this story matter? Well, it means that it was cheaper to make the boards much larger than they needed to be, and why waste that space? The extra stuff added in the "+ extras" version is 1) a flash chip adapter to get a tiny SMT flash memory into my large EEPROM burner, and 2) an ignition multiplexer based on a dsPIC30F3010 that was destined for an AE86 4AGE. It would intercept the PCM timing outputs and the motor's NE and G sensor signals and multiplex the factory ignition signal to aftermarket coil-on-plug units. It worked well in testing, and then my brother bought a blacktop and an aftermarket fuel management system for it.

That being said, if you go and implement a daughtercard, there's nothing stopping you from adding MORE GPIO than the original processor had and using those outputs and a bit of logic to steer ignition pulses out to your own coil-on-plugs, without having to bodge another CPU into the harness like I was doing. It's one of the reasons I left a number of BGA pads connected to the unused outputs of the CPLD, just in case I wanted to make things better.

rhit-mahnee commented 3 years ago

Ah, that makes sense.

Interesting. I will keep that in mind if I ever decide to switch to CoP. My current goal is just to fix the AFR, but I very much like the extra flexibility while keeping the original Toyota ECU and tunes.

rhit-mahnee commented 3 years ago

So I think I've just about found all of the components and am about ready to order them. I just have a couple of clarification questions first:

  1. It looked like all of the smaller components on the daughter board are on the back of the PCB, is that right? Would you be able to get a picture of the back of the board for me?
  2. The reader board schematic uses a 12MHz crystal with two 10pF capacitors. The daughter board schematic shows an unspecified crystal and two capacitors, though the .pcb file refers to a 4MHz crystal. Is that correct? And if so, are those two capacitors the same 10pF?
  3. The daughter board schematic specifies an NC7SZ04 inverter. Any idea if it's the M5X or P5X package?
sparkiedk commented 3 years ago

1) Decoupling caps, pull-up resistors and the USB-to-bus adapter are the only things on the back, plus the pins that drop down to the board. Can't grab a pic right now, but I'll include a snapshot from the layout software: [Image: layout snapshot of the board's back side]

2) I just stole the caps right off the PCM. I figure my PCB won't be much different from theirs, and the caps, crystal and microcontroller must all be pretty well matched already. If you're breadboarding it, I'd recommend going with a slower crystal like 4MHz or less, and 10pF is a great place to start. Really, for the reader, as long as there's some kind of clock you can squirt the data out.

3) The inverter was added after the only prototypes were made and isn't laid out on the board yet, so you're free to choose the package that suits you best. I didn't end up needing it, as I switched to an ATF1504ASL CPLD, which according to Atmel: "Atmel Low- or Zero-power PLDs include an input transition detection (ITD) feature, indicated by the “L” or “Z” in the part number suffix. These PLDs save power by automatically powering down to a “standby” or “sleep” mode when no signal transitions occur on the inputs or internal feedback nodes of the device" http://ww1.microchip.com/downloads/en/Appnotes/DOC0457.PDF Otherwise the inverter is needed to give the CPLD a power-down input so it doesn't waste tens of mA while the car is off.

Anyways, I'd advise against ordering everything in one shot. I'd just grab the pin-through flash, a flash programmer, whatever glue logic, and a good USB-to-UART adapter, and get the code off. It is going to take some time with the original PCM and the code to determine which inputs correlate to which memory registers in the code, and I feel like the daughtercard needs a solid review before anyone makes more of them. Also, the vendor sucks and I'd rather use OSH Park.

rhit-mahnee commented 3 years ago

My hope is to get a PCB made before the end of the month, so that it can be done here at school. Your comment reminds me of another question, though. What exactly is the microcontroller doing? If I understand correctly, it's simulating the ports lost due to running the CPU in external mode. How is it doing that?

sparkiedk commented 3 years ago

I hope you're referring to the ATF1504ASL when you say "microcontroller", because technically it's not one and the D1518XX we're pulling from the PCM is one, but I digress.

The ATF1504ASL is a complex programmable logic device (CPLD) which takes a description of a logic function to implement; in this case that description is written in Atmel's proprietary language CUPL. If you have a look at the .PLD file, you'll see it's some declarations of signals (most are routed to pins, some are simply internal) and then equations relating those signals together. This isn't to say it's purely combinatorial, however: there are latches (flip-flops) available to the designer as well.

I'm using the CPLD to do the following:
- emulate porta, one of the lost ports
- emulate portb, another lost port (partially; there are latching features unused by the 1UZ PCM that could be used elsewhere)
- emulate the port B input strobe (IS) pin, which is read from register PBCS (bit 4) and is used to sense the 1UZ IDL signal
- perform address latching for the RAM and flash memory (the address bus is multiplexed, so half the bits must be latched to decode the full address)
- perform address decoding for the flash, RAM, and FTDI bus-to-UART chip
- emulate a 2-bit GPIO port I call "port_hi", which allows the microcontroller to read the status bits of the FTDI chip
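To picture the address-latching item in the list above, here is a minimal Python model of a transparent latch on a multiplexed bus. Which half of the address shares pins with the data bus, and what the latch strobe is, are assumptions for illustration only (check the bus timing diagram and the datasheet); the class and method names are hypothetical.

```python
class AddressLatch:
    """Hypothetical model of the address latch the CPLD provides.

    Assumption: the low address byte shares pins with the data bus and is
    valid while a latch strobe is active; the high byte has its own pins.
    """

    def __init__(self) -> None:
        self.low = 0

    def clock(self, strobe: bool, shared_bus: int) -> None:
        # Transparent while the strobe is active; holds its value otherwise,
        # so the later data phase on the same pins does not disturb it.
        if strobe:
            self.low = shared_bus & 0xFF

    def address(self, high_byte: int) -> int:
        # Combine the dedicated high byte with the latched low byte.
        return ((high_byte & 0xFF) << 8) | self.low


latch = AddressLatch()
latch.clock(True, 0x34)   # address phase: low byte on the shared pins
latch.clock(False, 0xA5)  # data phase: same pins now carry data, latch holds
full = latch.address(0xC0)
```

The decoder for the flash, RAM and ports then works on that full 16-bit value, which is why the latch has to sit in front of it.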

There are some shenanigans going on in the code as well, where I group common bits of the address decoding into a single signal like "portaccess": if this signal is true, then all the conditions are true to access a GPIO port, but I don't yet know which one I'll be accessing. Hence the big equation on line 143: databus = portaccess&(port_a.io&add5&!add1&!add0 (..OR blah blah OR blah blah, I cut it down because it's long and ugly). Above, portaccess is used as a master gatekeeper, and then only address bits 5, not-1 and not-0 are used to AND the bits read from the porta pins onto the data bus. All the rest of the parts of the equations (lines 144 to 146) do the same, but with different address targets.
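The port A term of that equation can be mimicked in a few lines of Python to show how the gatekeeper and the three address bits combine. This is only a sketch of the one term quoted above; the other OR-ed terms from lines 144 to 146 are not reproduced, and returning None is a stand-in for the CPLD leaving the bus undriven. The function names and the port_a_pins parameter (standing in for port_a.io) are hypothetical.

```python
from typing import Optional


def bit(value: int, n: int) -> bool:
    """Return bit n of value as a boolean."""
    return bool((value >> n) & 1)


def drive_databus(portaccess: bool, address: int, port_a_pins: int) -> Optional[int]:
    """Model only the port A term: portaccess & port_a.io & add5 & !add1 & !add0.

    Returns the byte driven onto the data bus, or None when this term does
    not drive it (the remaining OR-ed terms are left out of this sketch).
    """
    if portaccess and bit(address, 5) and not bit(address, 1) and not bit(address, 0):
        return port_a_pins & 0xFF
    return None
```

Note that this term only looks at bits 5, 1 and 0; everything else about the address is assumed to be folded into portaccess already.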

rhit-mahnee commented 3 years ago

Oops, thought I responded to this but apparently not. Yes, I did mean the CPLD, just getting a little vague in my language since this isn't exactly my usual field. Sounds like I need to take a closer look at the CPLD code as well as probably the ECU code.

Things have been slow because of other projects popping up, but I'm finally about to pull the code off the ECU. Once I get the code dumped, do you have a program to turn it into something readable?

sparkiedk commented 3 years ago

The binary you pull from the chip can be processed with Hex-Rays IDA version 4.x. There's a plugin under the toshiba 8x info/ida plugin directory that's only compatible with version 4 (I confess I haven't tried fixing it for the current freeware release 7.x; I just know it fails). A copy of IDA Free can be found here: https://downloads.tuxfamily.org/hokuto/%2B%20dev/idafree49.exe

When you import it, you'll have to do some tweaks, like setting the binary offset in the memory map, and then (probably? it's been a while) go to the interrupt vectors at the end, find the entry points, and manually tell IDA to disassemble them.
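Before loading into IDA, it can help to list the vector entries from the dump so you know which entry points to mark. A minimal sketch, assuming the dump ends at the top of a 64 KB address space (so a 16 KB dump starts at 0xC000) and that each vector is a 16-bit word; the byte order and the number of vectors are assumptions you should check against the 8X documentation, and the function names are hypothetical.

```python
from typing import List, Tuple

ADDRESS_SPACE_TOP = 0x10000  # assumption: 64 KB CPU address space


def rom_base(image: bytes) -> int:
    """Base address if the dump sits at the very top of the address space."""
    return ADDRESS_SPACE_TOP - len(image)


def vectors(image: bytes, count: int, byteorder: str = "big") -> List[Tuple[int, int]]:
    """Read the last `count` 16-bit words as (vector_address, target) pairs.

    The byte order is an assumption; try "little" if the targets land
    outside the ROM.
    """
    if 2 * count > len(image):
        raise ValueError("more vectors requested than the image holds")
    base = rom_base(image)
    pairs = []
    for addr in range(ADDRESS_SPACE_TOP - 2 * count, ADDRESS_SPACE_TOP, 2):
        offset = addr - base
        pairs.append((addr, int.from_bytes(image[offset:offset + 2], byteorder)))
    return pairs
```

A target that falls between rom_base(image) and 0xFFFF is a plausible entry point; one that falls outside it hints that the byte order or the base address is wrong.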

After that there will be a mix of code and data, and you will have to find un-disassembled code and manually flag it as code for IDA. This is usually the case with "jump tables", as I call them, basically the assembler version of a select/case statement: first data is loaded, then a condition is checked, then it is added to an offset, then the result is used to pull 16 bits from a table close by, and whatever value was pulled gets jumped to. Your job will be to find the table and then flag each of the pointed-to code blocks for disassembly. It's not hard when you get used to it, and once you've got a binary up I can probably get it 80% disassembled for you in an hour or so.
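Once you have found a jump table, reading out its 16-bit pointers is mechanical. A minimal sketch, where image is the raw dump, base is the CPU address of its first byte (0xC000 for a 16 KB dump at the top of memory, under the same assumption as the memory map), and the byte order is again an assumption; the function name is hypothetical.

```python
from typing import List


def jump_table_targets(image: bytes, base: int, table_addr: int, entries: int,
                       byteorder: str = "big") -> List[int]:
    """Read `entries` consecutive 16-bit pointers starting at CPU address table_addr."""
    targets = []
    for i in range(entries):
        offset = table_addr - base + 2 * i
        if not 0 <= offset <= len(image) - 2:
            raise ValueError(f"entry {i} at {table_addr + 2 * i:#06x} is outside the image")
        targets.append(int.from_bytes(image[offset:offset + 2], byteorder))
    return targets
```

The number of entries usually has to be inferred from the limit check just before the indexed jump. Each returned address is then a spot to flag as code in IDA (for example with the C key).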

rhit-mahnee commented 3 years ago

Alright, that sounds manageable enough. Once I get the binary pulled (and the disassembled code when that's done), you want a copy of that to throw on the repository?

sparkiedk commented 3 years ago

For sure. Post it to your repo and PR it into mine? I don't do GitHub a lot for collaborative development, but I'm sure we can figure this out.

rhit-mahnee commented 3 years ago

Currently trying to pull code. Anything we need to know? We're getting some sort of signal out, but nothing we can read.

rhit-mahnee commented 3 years ago

We're finally getting stuff out of it. Way too much though.

sparkiedk commented 3 years ago

So if you're using reader.bin, it will simply dump the binary contents of the internal memory to the serial port. My workflow back in the day was:
- run with an oscilloscope attached to get the data rate from the smallest high or low time
- start up a terminal (I used RealTerm), set it to that data rate, and open the port
- hold the board in reset
- start a binary data capture on the terminal
- release reset
- wait until the indicated characters per second goes back to zero, or alternatively, if the scope is still attached, wait till the data stops (you don't see the data stream when RealTerm is capturing)
- stop the capture
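The capture steps above map onto a small loop for anyone using Python instead of RealTerm: keep reading until the stream goes quiet, with a size cap so a runaway stream cannot fill the disk. A minimal sketch; read_chunk is a hypothetical callable that returns b"" when nothing arrives within its timeout, and the 32 KB cap and 10 idle reads are arbitrary choices.

```python
from typing import Callable


def capture(read_chunk: Callable[[], bytes], max_bytes: int = 32 * 1024,
            idle_reads: int = 10) -> bytes:
    """Collect bytes until `idle_reads` consecutive empty reads or max_bytes."""
    buf = bytearray()
    idle = 0
    while len(buf) < max_bytes and idle < idle_reads:
        chunk = read_chunk()
        if chunk:
            buf += chunk
            idle = 0  # data is still flowing, reset the quiet counter
        else:
            idle += 1
    return bytes(buf[:max_bytes])
```

With pyserial, read_chunk could be lambda: port.read(256) on a serial.Serial opened with a timeout (say 0.1 s), since its read returns fewer bytes, possibly none, when the timeout expires; the port name and rate are yours to fill in. Start the capture while holding reset, then release it. A result of 16 KB that then goes quiet is what reader.bin should produce; hitting the cap instead suggests the stream is not the dump.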

Because there's no encoding or error detection/handling, I would usually do this process three times and use fc -b file1.bin file2.bin to check if the files were identical; if not, there was an error. Repeat this process until you can match files without an error (I found my error rate to be very low, like 1 in every 10 captures).
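The same check can be done in Python if the captures are already in memory. This sketch does the equivalent of fc -b (list the first differing offsets) plus an all-captures-identical test; the function names are hypothetical, and loading the files (for example with pathlib.Path(...).read_bytes()) is left to you.

```python
from typing import List, Sequence, Tuple


def mismatches(a: bytes, b: bytes, limit: int = 16) -> List[Tuple[int, int, int]]:
    """List (offset, byte_in_a, byte_in_b) for up to `limit` differing bytes."""
    found = []
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            found.append((offset, x, y))
            if len(found) == limit:
                break
    return found


def captures_agree(captures: Sequence[bytes]) -> bool:
    """True only if every capture is byte-for-byte identical, length included."""
    return all(c == captures[0] for c in captures[1:])
```

mismatches only compares the overlapping part, so check the lengths too: a capture that is shorter or longer than the others is an error on its own.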

As for the data: it should look like gibberish, being binary code for the processor with no text strings, and there should be 16KB of data returned. If things don't really look right, the base address in reader.asm might need to be changed (i.e. for 12KB, 8KB or smaller internal ROM sizes, but I think we can get by with a 16K dump).

rhit-mahnee commented 3 years ago

That's about what we ended up doing, just using a Python script to collect the data. The most prominent issue we had at the end was that we were getting upwards of 30 KB before cancelling it. We probably haven't figured out the baud rate correctly.

sparkiedk commented 3 years ago

That's a good result still! Does the data stream eventually stop? The program terminates in an infinite loop, and since you're working with Python you have the option of sending some data first to validate your connection, and perhaps after too; the world is your oyster. When I wrote the reader code I tailored it to my remaining desktop programming skills (i.e. none) and the tool chain I had available (Kashi's reader outputs an S-record, but I don't think IDA will read one and I couldn't be arsed to convert it).

Maybe change out the code for something simple that dumps a known string ("Hello world!" if you're into that sort of thing) and keeps sending it with a long pause in between (100ms-ish), and then just keep changing the baud rate until you get the text. This is assuming you don't have a scope handy. Also, I think pin 13 should be a serial clock, which is always running once the serial port is up; a frequency counter on that pin should give you the baud rate.
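The "keep changing the baud rate until you get the text" step can be automated on the PC side. A minimal sketch; capture_at is a hypothetical callable that opens the port at the given rate, reads for long enough to span a few of the ~100 ms repeats, and returns what it got, and the marker must match whatever string your test program actually sends.

```python
from typing import Callable, Iterable, Optional


def find_baud(capture_at: Callable[[int], bytes], rates: Iterable[int],
              marker: bytes = b"Hello world!") -> Optional[int]:
    """Return the first rate whose capture contains the known test string."""
    for rate in rates:
        if marker in capture_at(rate):
            return rate
    return None
```

Matching the whole string rather than "something printable" avoids false positives, since a wrong rate can still occasionally produce valid-looking characters.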

rhit-mahnee commented 3 years ago

The data stream was not stopping. At least not within a reasonable amount of time. Even after we shut off the chip the computer would keep writing for a little while. That's an idea. I'll have to see what we can work out with that. I currently do have a scope I can use, but I won't for much longer, at least for a while.

I rewired the breadboard today to try to get rid of some of the noise. Still trying to figure out if it's actually behaving properly or not. I do like the idea of a test message though to confirm that everything is indeed functioning as it should.

sparkiedk commented 3 years ago

See, if it's not stopping, that sounds a lot more like the chip is running its own code and trying to chat to other parts it expects to find in the PCM. One of my first steps (before reading out the code) was to write out an infinite loop and have the chip execute that. Then I probed the bus to make sure that the chip was executing from external memory and trapped in the infinite loop (there are only like 3 bus cycles in this case, so decoding everything by hand on a scope wasn't too laborious). If you can't guarantee it's running your code correctly, then you're gonna be in trouble; I ran up against this numerous times. Another aspect you should look out for is that if a single bit is wired incorrectly, parts of your program may still function fine, while other parts are going to crash hard.

rhit-mahnee commented 3 years ago

Now that you suggest that, it does seem entirely possible that it's running its own code. It does look like it's doing something, I just can't tell what and it doesn't look like a data dump.

rhit-mahnee commented 3 years ago

So far I'm struggling to get the computer to pick up anything meaningful (i.e. anything but straight 1s). There's also a weird voltage shift that I think is causing me problems. Pretty sure it's not normal since it's showing up in the clock signal as well when the chip is running (with the reset held, the clock looks like a normal noisy square wave). Any suggestions? I also have a short clip of what the output signal looks like from the dumping code. It seems awfully repetitive and does not appear to end.

https://user-images.githubusercontent.com/45320969/126423573-47e21ff8-8f54-4a50-80ad-9018bcf8fc60.mp4

sparkiedk commented 3 years ago

That signal looks like the Toyota code trying to poll the ADC, which it does very repetitively. Did you probe the logic lines to the external memory? If it's running, you'll see lots of action on D0~7 and A0~7, running at the crystal frequency/6 as I recall. FWIW I'm uploading a bus timing diagram I made a while ago: https://github.com/sparkiedk/Toyota-PCM-hacking/blob/master/Toshiba%208x%20info/bus_timing.jpg
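If the divide-by-6 recollection is right (an assumption, not checked against a datasheet here), the 4 MHz crystal mentioned earlier gives the following bus cycle rate, which is a rough guide for the scope timebase:

$$
f_{bus} \approx \frac{f_{xtal}}{6} = \frac{4\,\text{MHz}}{6} \approx 667\,\text{kHz},
\qquad
T_{bus} = \frac{1}{f_{bus}} = 1.5\,\mu\text{s}
$$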

Make sure your I/E pin (pin 12) is grounded, especially during boot. Try to power the board first and then release reset, rather than letting it all come up together.

Also, the plot on the scope looks terrible; why is there so much noise? Check your grounds and your probes. Are you using X10 mode? X1 would be fine for this stuff.

rhit-mahnee commented 3 years ago

Alright, makes sense. I'll do my best to check those lines, though I don't have a proper oscilloscope right now and it might be a while before I do again.

Pin 12 is indeed grounded. I've been over the board probably five times now, checking each and every connection. During my testing I've done it both ways, with different behaviors depending on the order of operations. I've generally held the reset during startup though.

That's one of the things I've been trying to figure out. I'm going to try condensing the board more to see if I can reduce any of that. Again, I also noticed a lot of extra noise and an odd voltage shift in the serial clock signal whenever the reset was released. As for X1 vs X10, not a clue. I haven't used scopes much until now.

sparkiedk commented 3 years ago

X1/X10 will be a little switch on the probe. X10 adds an extra 9 megaohm resistor in series to reduce the voltage at the scope and the parasitic loading on the circuit: good if you want to see your crystal oscillator in action, but noisier and not useful when you're looking at low-frequency logic. https://www.electronics-notes.com/images/oscilloscope-probe-attenuation-switch-1358.jpg
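The factor of ten comes from a voltage divider between that series resistor and the scope input. Assuming the usual 1 MΩ scope input resistance:

$$
V_{scope} = V_{tip}\,\frac{R_{in}}{R_{probe} + R_{in}}
= V_{tip}\,\frac{1\,\text{M}\Omega}{9\,\text{M}\Omega + 1\,\text{M}\Omega}
= \frac{V_{tip}}{10}
$$

The circuit then sees roughly 10 MΩ of DC load instead of 1 MΩ, at the cost of a signal ten times smaller relative to the scope's own noise.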

If you're willing to post some pictures of your board I can offer some pointers.

rhit-mahnee commented 3 years ago

Ah, I see. Yeah, I have no clue which I was using, though I'll keep that in mind next time I find a scope to borrow. [Photos of the breadboard setup]

sparkiedk commented 3 years ago

Nice adapter PCB! I have one that looks similar, but rougher, since I used a clothes iron to make it. However, I see some deficient solder joints on the 0.1" pin headers; there should be full fillets around each pin. It's possible that the annular rings around the pins could be broken and discontinuous, so a good solder joint will get around that issue and also provide more mechanical strength. https://blog.seeedstudio.com/wp-content/uploads/2019/08/Anatomy-of-a-good-solder-joint.png What I would recommend is giving that board another pass with the soldering iron, and then, when you reinstall it in the breadboard, using a multimeter to check all the important voltages (VCC, GND and I/E for starters) at the chip against the power supply ground. I say at the chip because there are several points of failure, and even that chip socket is one of them. I had the darndest time getting 64-pin shrink DIP sockets, actually, so these sockets could be older than god and somewhat less reliable.

--edit-- Also be wary that when you plug in the USB-serial adapter and the scope, they now share a common ground, i.e. the 0V node of your circuit is attached to the ground pin in your house, and that is continuous from the computer to the scope.

Also are the grounds on the right connected to the grounds on the left through the elegoo board? Does that actually work that way? I'd beep them all out.

rhit-mahnee commented 3 years ago

So I went over the board again, haven't noticed any major differences. But I've got some more pictures for you. First, the voltage drop on the clock signal when operating. These are the serial clock, first with reset held and second with reset released: [Scope photos: serial clock with reset held; serial clock with reset released]

Second, I have pictures of pins DQ7 and A7 on the external memory. There definitely seems to be a lot going through A7. Those lower left bits on DQ7 (first picture) flip from top to bottom. [Scope photos: DQ7; A7]

Another picture of the output signal. All but the last bit between those lower two bits flip between top and bottom. [Scope photo: output signal]

And finally, I tried starting up the chip with I/E not grounded, just to see what it spits out then. This is what I get after adjusting some settings. It's certainly different, which I'll take as a good thing. [Scope photo: output with I/E not grounded]

sparkiedk commented 3 years ago

So anytime the voltage on an output pin isn't 5V or 0V (within 10%), you're getting a collision between two logic outputs (one asserts high, the other low). Looking at the clock, I'd say one of the adjacent pins is shorted to it. What's odd is that all the adjacent pins are inputs, so that shouldn't cause what we're seeing.

Some of the action on DA7 looks like it could be executing from flash.

rhit-mahnee commented 3 years ago

Yeah, I figured something along those lines, but all of the voltage fluctuations I've seen are pretty small. At best I've seen flickers to 0V, but most of the signals stay above 4V.

Any suggestions for next steps?

sparkiedk commented 3 years ago

Sorry I left this hanging for so long, it's been hectic around here.

I'd advise going over everything with a meter to see if there are shorts or bad solder joints, and perhaps going so far as to run it in external mode with all the logic disconnected, just to track down the output contentions.

You can also swap out the crystal for a much slower one if you want better visualization of the buses, and a breadboard usually isn't really great at high frequency (YMMV, but I think I used 4MHz for my work).

Your choice of flash chip is ambitious for this level of prototyping and hacking; I used the pin-through version, which strikes me as a bit more forgiving: https://www.digikey.com/en/products/detail/microchip-technology/SST39SF040-70-4C-PHE/2297835 And I did my first experiments with a legit DIP28 EPROM (27C256) that I erased with an air purifier UV lamp from the hardware store.

My SDIP64 adapter board didn't have the copper plane "fingers" between the traces, so there was less chance of a solder bridge there. IMHO the plan layer on that board should be floating; check to make sure it is. Also, dumb question: there's nothing going on on the backside of the PCB, right? If you happen to make that board again, do make sure to make your trace-to-plane spacing large; it's more forgiving for unmasked hand assembly.

Another thing to watch out for: all these boards were conformal coated back in the day (that's why it smelled like death when you desoldered the microcontroller!) and that coating is all over the pins - I have to use a fair bit of acetone and a toothbrush to remove all the conformal coat from my chips.

rhit-mahnee commented 3 years ago

You're fine, take any time you need to. I appreciate all the help you're giving.

I'll give that another go. Every time I've soldered stuff I've checked to make sure there's continuity between the chip and adapter output for that pin and also that there's no continuity with any adjacent pins. Is there anything else I should be checking? Also, if I pull the external memory off the board, what should I be expecting out of the CPU, or what should I be looking for?

I'm currently using a 4MHz crystal. I'll look into getting another flash chip to see if that's contributing to my problem.

Plan layer? And yes, it's just a single-sided board. Just the base material with one sheet of machined copper on top. I also have a printed board somewhere that I could try.

I never actually noticed any unusual smells when working on this board or any of the others that I've touched. I'll see about cleaning off any remaining coating though. While removing the chip I was much more concerned about overheating it, and there's still some excess solder on the pins I haven't managed to get off.

sparkiedk commented 3 years ago

No unusual smells, eh? Weird, I can see in the photos that there's definitely conformal coat on that board. Anyways, don't worry about overheating the chip if you're using a reasonably good iron. I have a temperature-controlled Weller set to 650 deg F and I went to town on these chips cleaning them up; they're pretty robust. I take the QFP versions off with a heat gun and pliers and they work just fine too, so you can spend some time cleaning up the pins to get better contact if you think that's an issue.

plan==plane, spell check only gets one so far it seems...

4MHz good, flash chip good, short check good, not sure what else I can offer. Some of the things you're seeing could be aliasing of successive traces on the scope? How fast is that scope?

If you run it externally without an external memory or latch logic, you'll be executing noise as instructions, potentially even the address as instructions (due to the shared bus and parasitic capacitance), so lots of gibberish and probably some interesting faults, but mostly it should just be hammering away at the bus looking for instructions and data. You'd be checking to make sure that there's no contention and everything looks about right. Then add the latch without the memory and see if it still behaves. Then add the memory, maybe with a simpler program (do you need me to write a simpler program? I can do that).

Also be mindful of your memory map: the processor will ask for bytes using its own memory addresses, and your flash chip will be running at an offset. The first 32K of the flash chip will be the last 32K of the processor's address space. This means that you'll have to offset your code in the flash so that it's where the processor expects it to be. The rest of the flash chip can be empty, provided you've properly shorted all the higher address lines to ground (which it looks like you have).
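The offset arithmetic is easy to get wrong by hand, so here is a small Python helper that places a binary into a flash image. It assumes exactly the mapping described above (CPU 0x8000 to 0xFFFF maps to flash 0x0000 to 0x7FFF, higher flash address lines grounded) and a 512 KB SST39SF040-sized image filled with 0xFF, the erased state of the flash; the function and constant names are hypothetical. For example, a 16 KB program meant to sit at 0xC000 lands at flash offset 0x4000.

```python
FLASH_WINDOW_CPU_BASE = 0x8000  # CPU address that maps to flash offset 0
FLASH_WINDOW_SIZE = 0x8000      # 32 KB window used while the upper address lines are grounded


def flash_offset(cpu_addr: int) -> int:
    """Translate a CPU address in the top 32 KB into a flash chip offset."""
    if not FLASH_WINDOW_CPU_BASE <= cpu_addr < FLASH_WINDOW_CPU_BASE + FLASH_WINDOW_SIZE:
        raise ValueError(f"{cpu_addr:#06x} is not in the flash window")
    return cpu_addr - FLASH_WINDOW_CPU_BASE


def build_flash_image(code: bytes, load_addr: int, size: int = 512 * 1024,
                      fill: int = 0xFF) -> bytes:
    """Return a full-size image with `code` placed where the CPU will fetch it."""
    start = flash_offset(load_addr)
    end = start + len(code)
    if end > FLASH_WINDOW_SIZE:
        raise ValueError("code runs past the top of the CPU address space")
    image = bytearray([fill]) * size
    image[start:end] = code
    return bytes(image)
```

The resulting bytes can be written to a .bin file and loaded into the programmer; the 0xFF fill simply matches what an erased chip would contain anyway.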

Also, I just noticed that your logic ICs don't have decoupling capacitors; that could be causing some havoc on the bus. A 0.1µF ceramic across the power pins (yeah, it's gonna be ugly) will go a long way. Even a small electrolytic will get the job done.