wader / fq

jq for binary formats - tool, language and decoders for working with binary and text formats

tzx: Add support for ZX Spectrum TZX and TAP files #975

Closed: mrcook closed this 3 months ago

mrcook commented 3 months ago

For those not familiar with the ZX Spectrum: it was a popular 1980s home micro in Europe and various other countries, and it sold in excess of 5 million units. Most software was distributed on cassette tape, and these tapes are encoded as TZX or TAP files for use in emulators.

This PR adds decoders for these formats, based on the specification and on my previous efforts at writing a TZX/TAP decoder.

The test file includes a simple BASIC program and was exported from the FUSE emulator; the TZX was then manually edited to add some "archive info" (metadata).

Perhaps in a future update I could add functionality to decode the BASIC programs.
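To give a rough idea of what decoding the BASIC programs involves, here is a minimal Go sketch. It is not code from this PR, and `listBASIC` and `basicLine` are hypothetical names. The sketch splits a tokenized ZX Spectrum BASIC program into lines. It assumes the input is only the program area of a data block: the flag and checksum bytes, and any variables area that follows the program, must already be removed. The keyword table is deliberately partial.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"strings"
)

// tokens is a small, partial subset of the ZX Spectrum BASIC keyword tokens
// (the full table runs from 0xA5 RND to 0xFF COPY).
var tokens = map[byte]string{
	0xEA: "REM",
	0xEC: "GO TO",
	0xF5: "PRINT",
	0xF7: "RUN",
}

type basicLine struct {
	Number int
	Text   string
}

// listBASIC splits a tokenized program into lines. Each line starts with a
// big-endian line number and a little-endian length (covering the text and
// the final 0x0D). Numeric literals are followed by 0x0E and a hidden 5-byte
// binary form, which this sketch skips.
func listBASIC(prog []byte) ([]basicLine, error) {
	var lines []basicLine
	for len(prog) > 0 {
		if len(prog) < 4 {
			return nil, fmt.Errorf("basic: truncated line header")
		}
		num := int(binary.BigEndian.Uint16(prog))
		n := int(binary.LittleEndian.Uint16(prog[2:]))
		if n > len(prog)-4 {
			return nil, fmt.Errorf("basic: line %d overruns program", num)
		}
		body := prog[4 : 4+n]
		var sb strings.Builder
		for i := 0; i < len(body); i++ {
			c := body[i]
			switch {
			case c == 0x0D: // end of line marker
			case c == 0x0E: // skip the hidden 5-byte number
				i += 5
			case c >= 0xA5: // keyword token
				if kw, ok := tokens[c]; ok {
					sb.WriteString(" " + kw + " ")
				} else {
					fmt.Fprintf(&sb, "[%02X]", c)
				}
			case c >= 0x20 && c < 0x80: // printable, treated as ASCII
				sb.WriteByte(c)
			}
		}
		lines = append(lines, basicLine{num, strings.TrimSpace(sb.String())})
		prog = prog[4+n:]
	}
	return lines, nil
}

func main() {
	// 10 PRINT "HI"
	l10 := []byte{0xF5, '"', 'H', 'I', '"', 0x0D}
	// 20 GO TO 10: digits "10", then 0x0E and the 5-byte small-integer form of 10
	l20 := []byte{0xEC, '1', '0', 0x0E, 0x00, 0x00, 0x0A, 0x00, 0x00, 0x0D}
	prog := append([]byte{0x00, 0x0A, byte(len(l10)), 0x00}, l10...)
	prog = append(prog, 0x00, 0x14, byte(len(l20)), 0x00)
	prog = append(prog, l20...)
	lines, err := listBASIC(prog)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, l := range lines {
		fmt.Println(l.Number, l.Text)
	}
}
```

The sketch treats bytes 0x20-0x7F as ASCII. That is close but not exact, since on the Spectrum 0x60 is £ and 0x7F is ©. It also ignores control codes, block graphics and UDG characters.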

wader commented 3 months ago

The all.fqtest failures, I think, just need the tests to be run with -update.

The failure in the zip tests, I think, is because the tap decoder is in the probe group and succeeds on empty input (for !d.End() { ... } does no iterations). So maybe remove it from the probe group if it's shaky to probe, or make sure that at least one block was decoded? But it seems there is no real signature for a block, so that might not be possible? 🤔
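One way to follow the "at least one block" suggestion is sketched below in standalone Go. This is not fq's decoder API, and `decodeTAP` and `tapBlock` are hypothetical names. Each TAP block is a little-endian 16-bit length followed by that many bytes: a flag byte, the data and an XOR checksum. TAP has no real signature, but two checks still give a probe something to fail fast on:

- requiring at least one block;
- checking that the XOR over each block's bytes is zero.

An empty TAP file is valid per the spec, so this strictness only makes sense when probing.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// tapBlock is one TAP block: a flag byte, the payload and a trailing XOR checksum.
type tapBlock struct {
	Flag     byte
	Data     []byte
	Checksum byte
}

var errNoBlocks = errors.New("tap: no blocks decoded")

// decodeTAP walks the length-prefixed blocks of a TAP image. It rejects empty
// input and bad checksums so that a probe can discard non-TAP data early.
func decodeTAP(b []byte) ([]tapBlock, error) {
	var blocks []tapBlock
	for len(b) > 0 {
		if len(b) < 2 {
			return nil, fmt.Errorf("tap: truncated length at block %d", len(blocks))
		}
		n := int(binary.LittleEndian.Uint16(b))
		b = b[2:]
		// The length covers the flag, data and checksum, so it must be at least 2.
		if n < 2 || n > len(b) {
			return nil, fmt.Errorf("tap: bad block length %d at block %d", n, len(blocks))
		}
		body := b[:n]
		var x byte
		for _, v := range body {
			x ^= v
		}
		// XOR over flag, data and checksum is zero for a valid block.
		if x != 0 {
			return nil, fmt.Errorf("tap: checksum mismatch at block %d", len(blocks))
		}
		blocks = append(blocks, tapBlock{Flag: body[0], Data: body[1 : n-1], Checksum: body[n-1]})
		b = b[n:]
	}
	if len(blocks) == 0 {
		return nil, errNoBlocks
	}
	return blocks, nil
}

func main() {
	// Flag 0xFF, data 0x01 0x02, checksum 0xFF^0x01^0x02 = 0xFC.
	img := []byte{0x04, 0x00, 0xFF, 0x01, 0x02, 0xFC}
	blocks, err := decodeTAP(img)
	fmt.Println(len(blocks), err)
	_, err = decodeTAP(nil)
	fmt.Println(err)
}
```
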

mrcook commented 3 months ago

> Is there some large archive of TAP and TZX files one could stress-test the decoders on?

Hi Matthias, the go-to place these days is https://spectrumcomputing.co.uk. I've tested over 50 tape images myself: mostly TZX, but a dozen or so TAPs too.

Most tapes seem to contain just Standard Speed Data and Turbo Speed Data blocks, and I've struggled to find many TZX files with the more interesting blocks.

There are some collections available on the Internet Archive, so I plan to do some bulk testing at some point.
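A block-ID tally is one option for finding tapes with the less common blocks during bulk testing. The Go sketch below uses hypothetical names and is not part of this PR. It checks the `ZXTape!` signature and walks the blocks, but it only knows the sizes of a handful of block IDs:

- 0x10 Standard Speed Data
- 0x11 Turbo Speed Data
- 0x20 Pause
- 0x30 Text Description
- 0x32 Archive Info
- 0x33 Hardware Type

Any other ID stops the walk with an error, so a real tool would need the full block table from the TZX specification.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// tzxSignature is the 8-byte magic at the start of a TZX file; two version
// bytes (major, minor) follow it.
var tzxSignature = []byte("ZXTape!\x1A")

// blockLen returns the body size (bytes after the ID byte) for a small subset
// of TZX block IDs. ok is false for IDs this sketch does not know how to size.
func blockLen(id byte, b []byte) (n int, ok bool) {
	switch id {
	case 0x10: // Standard Speed Data: pause WORD, length WORD, data
		if len(b) < 4 {
			return 0, false
		}
		return 4 + int(binary.LittleEndian.Uint16(b[2:])), true
	case 0x11: // Turbo Speed Data: 0x12-byte header with a 3-byte length at 0x0F
		if len(b) < 0x12 {
			return 0, false
		}
		return 0x12 + (int(b[0x0F]) | int(b[0x10])<<8 | int(b[0x11])<<16), true
	case 0x20: // Pause / Stop the Tape: duration WORD
		return 2, true
	case 0x30: // Text Description: length BYTE, text
		if len(b) < 1 {
			return 0, false
		}
		return 1 + int(b[0]), true
	case 0x32: // Archive Info: WORD length of the rest of the block
		if len(b) < 2 {
			return 0, false
		}
		return 2 + int(binary.LittleEndian.Uint16(b)), true
	case 0x33: // Hardware Type: count BYTE, then 3 bytes per entry
		if len(b) < 1 {
			return 0, false
		}
		return 1 + 3*int(b[0]), true
	}
	return 0, false
}

// countBlockIDs walks a TZX image and tallies block IDs. It stops with an
// error at the first block it cannot size or that overruns the input.
func countBlockIDs(img []byte) (map[byte]int, error) {
	if len(img) < 10 || !bytes.Equal(img[:8], tzxSignature) {
		return nil, fmt.Errorf("tzx: missing signature")
	}
	counts := map[byte]int{}
	b := img[10:] // skip the signature and the two version bytes
	for len(b) > 0 {
		id := b[0]
		n, ok := blockLen(id, b[1:])
		if !ok || n > len(b)-1 {
			return counts, fmt.Errorf("tzx: cannot walk block 0x%02X", id)
		}
		counts[id]++
		b = b[1+n:]
	}
	return counts, nil
}

func main() {
	img := append([]byte("ZXTape!\x1A"), 1, 20)              // version 1.20
	img = append(img, 0x10, 0xE8, 0x03, 0x02, 0x00, 0xAA, 0xBB) // standard speed block, 2 data bytes
	img = append(img, 0x33, 0x01, 0x00, 0x00, 0x00)             // hardware type, 1 entry
	counts, err := countBlockIDs(img)
	fmt.Println(counts, err)
}
```
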

mrcook commented 3 months ago

> The failure in the zip tests, I think, is because the tap decoder is in the probe group and succeeds on empty input (for !d.End() { ... } does no iterations). So maybe remove it from the probe group if it's shaky to probe, or make sure that at least one block was decoded? But it seems there is no real signature for a block, so that might not be possible? 🤔

Yep, and in fact the specs state: "The TAP file may be empty. Then it has a size of 0 bytes".

I hadn't looked into what that probe group does; I just started with a copy of the format/nes decoder 🙃 I'll try removing it and see if that works.

wader commented 3 months ago

> There are some collections available on the Internet Archive, so I plan to do some bulk testing at some point.

Great! If you end up writing a script, or have a snippet for it, you can include that in testdata/README.md too. I've usually tried to document test data well so that I, you or someone in the future has a reasonable chance of updating it :)

wader commented 3 months ago

> Yep, and in fact the specs state: "The TAP file may be empty. Then it has a size of 0 bytes".
>
> I hadn't looked into what that probe group does; I just started with a copy of the format/nes decoder 🙃 I'll try removing it and see if that works.

Aha, I see. Being part of the "probe group" just means that, if no format or group is specified with -d, fq will try to decode using all formats in that group, and the first format to succeed is selected. That is why a format in the probe group must not succeed on empty input, and should try to fail fast. Also, some archive formats like zip use the probe group to do nested decoding.
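The probe-group rule can be illustrated with a toy model in Go. The `decoder` type and `probe` function below are hypothetical and much simpler than fq's actual format registry. They only show why a decoder that succeeds on empty input wins the probe and shadows every format tried after it.

```go
package main

import (
	"errors"
	"fmt"
)

// decoder is a hypothetical stand-in for a format decoder; it is not fq's API.
type decoder struct {
	name   string
	decode func([]byte) error
}

// probe tries each decoder in order and returns the name of the first one
// that succeeds, i.e. the "first format to succeed is selected" rule.
func probe(group []decoder, input []byte) (string, error) {
	for _, d := range group {
		if err := d.decode(input); err == nil {
			return d.name, nil
		}
	}
	return "", errors.New("no format in the probe group matched")
}

// lenientTAP always succeeds here, like a block loop that does zero
// iterations on empty input and returns without error.
func lenientTAP(b []byte) error { return nil }

// strictTAP rejects input from which no block could be decoded.
func strictTAP(b []byte) error {
	if len(b) == 0 {
		return errors.New("tap: no blocks")
	}
	return nil
}

// raw is a catch-all that accepts anything, standing in for a later format.
func raw(b []byte) error { return nil }

func main() {
	empty := []byte{}
	first, _ := probe([]decoder{{"tap", lenientTAP}, {"raw", raw}}, empty)
	second, _ := probe([]decoder{{"tap", strictTAP}, {"raw", raw}}, empty)
	fmt.Println(first, second) // prints "tap raw"
}
```

With the lenient TAP decoder first, the empty input is claimed as "tap". The strict one returns an error, so probing falls through to the next format.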

About the nes decoder: oh, then I should probably have a look at it so it doesn't have the same issue.

wader commented 3 months ago

Just some minor comments that you might want to address; otherwise I think it looks good as it is now. We can always make more changes later on; I like to iterate and not let PRs drag on too long. Let me know when you feel you're happy with the code and testing.

BTW, what's your use case for this? Just curious how the formats work, or are you using it to do aggregations or query for ROM features etc.?

mrcook commented 3 months ago

> Just some minor comments that you might want to address; otherwise I think it looks good as it is now. We can always make more changes later on; I like to iterate and not let PRs drag on too long. Let me know when you feel you're happy with the code and testing.

I agree that iterating is better than having long-running PRs. Your new comments can certainly be considered improvements rather than requirements.

Whatever you think is best. :)

> BTW, what's your use case for this? Just curious how the formats work, or are you using it to do aggregations or query for ROM features etc.?

Exactly that: extracting metadata, features, etc. The archive info (title, publisher, etc.) is the obvious one, but there are also the more interesting blocks like Hardware Type. Possibly this could be used for verifying/updating ZXDB (https://github.com/zxdb/ZXDB).
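Extracting the archive info from a TZX is mostly a matter of reading block 0x32. Below is a minimal Go sketch that assumes the input is the block body right after the 0x32 ID byte. `parseArchiveInfo` and the label strings are made up for this example and are not fq's field names. The numeric text IDs follow the TZX specification.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// archiveLabels maps TZX Archive Info text IDs to short labels chosen for
// this sketch (the labels are not fq's field names).
var archiveLabels = map[byte]string{
	0x00: "title",
	0x01: "publisher",
	0x02: "authors",
	0x03: "year",
	0x04: "language",
	0x05: "type",
	0x06: "price",
	0x07: "loader",
	0x08: "origin",
	0xFF: "comment",
}

// parseArchiveInfo decodes the body of a 0x32 block (the bytes after the ID):
// a WORD length of the rest of the block, a BYTE string count, then entries of
// (text ID BYTE, text length BYTE, text). A repeated ID overwrites the earlier value.
func parseArchiveInfo(b []byte) (map[string]string, error) {
	if len(b) < 3 {
		return nil, fmt.Errorf("archive info: block too short")
	}
	n := int(binary.LittleEndian.Uint16(b))
	if n < 1 || n > len(b)-2 {
		return nil, fmt.Errorf("archive info: bad length %d", n)
	}
	body := b[2 : 2+n]
	count := int(body[0])
	body = body[1:]
	info := map[string]string{}
	for i := 0; i < count; i++ {
		if len(body) < 2 || int(body[1]) > len(body)-2 {
			return nil, fmt.Errorf("archive info: truncated entry %d", i)
		}
		id, l := body[0], int(body[1])
		label, ok := archiveLabels[id]
		if !ok {
			label = fmt.Sprintf("id_%02x", id)
		}
		info[label] = string(body[2 : 2+l])
		body = body[2+l:]
	}
	return info, nil
}

func main() {
	// Two sample entries: title "Demo" (ID 0x00) and year "1984" (ID 0x03).
	entries := []byte{2, 0x00, 4, 'D', 'e', 'm', 'o', 0x03, 4, '1', '9', '8', '4'}
	block := append([]byte{byte(len(entries)), 0x00}, entries...)
	info, err := parseArchiveInfo(block)
	fmt.Println(info["title"], info["year"], err)
}
```
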

I already have the core of a BASIC decoder written, so in the future I'd also like to add that.

wader commented 3 months ago

> > Just some minor comments that you might want to address; otherwise I think it looks good as it is now. We can always make more changes later on; I like to iterate and not let PRs drag on too long. Let me know when you feel you're happy with the code and testing.
>
> I agree that iterating is better than having long-running PRs. Your new comments can certainly be considered improvements rather than requirements.
>
> Whatever you think is best. :)

The smaller things maybe in this PR; the larger ones, about fragments etc., maybe later if you feel motivated.

> > BTW, what's your use case for this? Just curious how the formats work, or are you using it to do aggregations or query for ROM features etc.?
>
> Exactly that: extracting metadata, features, etc. The archive info (title, publisher, etc.) is the obvious one, but there are also the more interesting blocks like Hardware Type. Possibly this could be used for verifying/updating ZXDB (https://github.com/zxdb/ZXDB).

Sounds similar to my use case for media files: diff, aggregate and query various things. But I'd say a majority of the decoders in fq exist mostly out of curiosity :)

> I already have the core of a BASIC decoder written, so in the future I'd also like to add that.

👍 I've done some experiments with decoding ISAs into something like a "structured disassembler" (see https://github.com/wader/fq/pull/215). Maybe it can give some inspiration for how it might work.

mrcook commented 3 months ago

> The smaller things maybe in this PR; the larger ones, about fragments etc., maybe later if you feel motivated.

I should be able to take a look at that tomorrow evening.

wader commented 3 months ago

Running the tests with -update should fix the CI error.

wader commented 3 months ago

Looks good. Ready to merge, or did you want to do some more testing?

mrcook commented 3 months ago

I'd be happy with what we have here now, so yes, feel free to merge it.

mrcook commented 3 months ago

Thanks Matthias! I also wanted to thank you for all the help in making my contribution better.

Last but not least, thanks for creating this great tool!!

wader commented 3 months ago

Glad to hear that 😊 and looking forward to future contributions! If you have questions about how to use fq in general, I'm happy to help; it's always interesting and useful to know how people are using it.