jcmf / glulx-strings

extract raw text fragments from interactive fiction glulx gblorb Inform

Zblorb gives garbage strings on translation #1

andrewschultz closed this issue 8 years ago

andrewschultz commented 8 years ago

Hey, dunno if you're still updating this, but I have a z-code file that doesn't translate correctly. It's at https://www.dropbox.com/s/bpxgwto2nfpeuk6/3dop.zblorb?dl=0 (I couldn't attach it in a zip).

The GBlorb stuff works great though and has been a big help for me in proofreading my and others' work! (and I can use TXD in a pinch.) Thanks for making it available to download locally & I also enjoy looking through the source--lots to learn.

jcmf commented 8 years ago

I just tried that file and it produces results that look plausible to me. Is there a particular string that you know should be there but isn't? Maybe the noise level is higher than you were expecting?

andrewschultz commented 8 years ago

Hello, sorry for the delay! But I think I found a good few examples. I hope they provide useful data rather than just spitballing about bugs. I suspect there's probably just one stray pointer, but you may need/want as many data points as possible to find the bug or test the bug fix. And I'd be glad to provide whatever testing you want/need, as I have a few zblorb games of my own I could check quickly to see if anything was amiss.

This is a string in the source: "He mentions that old puzzle about someone going south, east and north to arrive where he started, and how there's more than one point, before making some desultory joke about remembering to forget the lutefisk."

The "lutefisk" text actually appears 3 times in the text output, the first time on line 114 of https://gist.github.com/andrewschultz/a89dc7545e8c6dc2fd973325f15ff45f

You'll need to see the full raw file to see the rest, but line 113 is also cut off. It starts midway through "understanding" in this string from my source:

"You pick up some static from the teleport device. Ed Dunn is babbling [if acro is 2]a[else]some[end if] three-letter acronym[unless acro is 2]s[end if] you feel half guilty for not understanding. Or not liking and not using. You know you've heard [if acro is 2]it[else]them[end if] before, though."

For line 3598, the source reads "What with the brutal unholy war over atheists['] best reasons to disbelieve in God..." so maybe the apostrophe is doing odd things? Lines 3653-3654 are almost duplicates:

3653: Mr. Dunn offers our services as a surprise gift to people who have impressed him. I am not surprised you have not." Whoa!
3654: Dunn offers our services as a surprise gift to people who have impressed him. I am not surprised you have not." Whoa!

Line 105: "scriptions of locations (even if you've been there before)."
Line 173: "ere are none at all available!"
(These are general library messages that get truncated.)

I noticed a lot of other repeating text like "lgfy" too. So maybe a pointer is getting misplaced or something. Let me know if you need me to try anything else.

jcmf commented 8 years ago

Thanks, that's very helpful! My #1 goal is to avoid having any text missing from the output of the program, and it looks like you've found several concrete examples of that, which is news to me, probably because I haven't paid much attention to the Z-code stuff, because everybody always uses Glulx for everything, right? Ha ha j/k, I don't know what planet I'm living on either. I mean it would be convenient if that were true; Glulx is so much more tractable. Anyway maybe while I'm in there trying to fix the blatant omissions I'll be motivated to think of other things that might reduce the amount of garbage and repetition.

andrewschultz commented 8 years ago

Glad I could help! Well, you got the main stuff done, as you mentioned. I've let details slip too.

Repetition and garbage are a bit down the list. I know when I scroll through game text I can zone that out pretty easily, so making sure no data is missing would be the big one.

Also, it's neat to follow the changes you've made already--good for learning about code and the z machine.

jcmf commented 8 years ago

I think some of the strings you quoted above are just partial strings, not the full original strings from your source. For example I think you left a whole sentence off the front of the lutefisk one -- the string you quoted doesn't align correctly on a 2-byte boundary.
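For readers unfamiliar with the 2-byte boundary mentioned here: Z-machine text is packed three 5-bit Z-characters to each 16-bit word, with the top bit set on the last word of a string, so a string can only begin at a word boundary. The Python sketch below illustrates that packing. It is not the code in glulx-strings. The names `encode` and `decode` are hypothetical, the default version 3+ alphabets are assumed, and abbreviations and ZSCII escapes are left out.

```python
# Default Z-machine alphabets (version 3+ shift behaviour assumed).
A0 = "abcdefghijklmnopqrstuvwxyz"
A1 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# In A2, index 0 (Z-char 6) is the ZSCII escape and index 1 (Z-char 7) is newline.
A2 = " \n0123456789.,!?_#'\"/\\-:()"


def encode(text):
    """Pack text into Z-machine words (hypothetical helper, no abbreviations)."""
    zchars = []
    for ch in text:
        if ch == " ":
            zchars.append(0)
        elif ch in A0:
            zchars.append(A0.index(ch) + 6)
        elif ch in A1:
            zchars += [4, A1.index(ch) + 6]  # 4 shifts the next Z-char to A1
        elif A2.find(ch) >= 1:
            zchars += [5, A2.index(ch) + 6]  # 5 shifts the next Z-char to A2
        else:
            raise ValueError(f"character {ch!r} is not handled in this sketch")
    while len(zchars) % 3 or not zchars:
        zchars.append(5)  # pad the final word with shift characters
    words = [(a << 10) | (b << 5) | c
             for a, b, c in zip(zchars[0::3], zchars[1::3], zchars[2::3])]
    words[-1] |= 0x8000  # top bit marks the last word of the string
    return b"".join(w.to_bytes(2, "big") for w in words)


def decode(data, offset=0):
    """Decode Z-characters starting at a byte offset, up to the end-bit word."""
    zchars = []
    while True:
        word = int.from_bytes(data[offset:offset + 2], "big")
        zchars += [(word >> 10) & 31, (word >> 5) & 31, word & 31]
        offset += 2
        if word & 0x8000 or offset >= len(data):
            break
    out = []
    alphabet = A0
    for z in zchars:
        if z == 0:
            out.append(" ")
        elif z == 4:
            alphabet = A1
            continue
        elif z == 5:
            alphabet = A2
            continue
        elif z < 4 or (alphabet is A2 and z == 6):
            out.append("?")  # abbreviations and ZSCII escapes are not handled
        else:
            out.append(alphabet[z - 6])
        alphabet = A0  # a shift applies to one character only
    return "".join(out)


packed = encode("There are none at all available!")
print(decode(packed))     # There are none at all available!
print(decode(packed, 2))  # ere are none at all available!
```

In this sketch, "T" costs two Z-characters (shift plus letter), so "T" and "h" fill the first word and decoding from the second word yields "ere are none...". That is consistent with the truncated lines quoted earlier in the thread, though it does not by itself show where the extractor went wrong.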

It looks like this is the same game that you've published over on https://github.com/andrewschultz/threediopolis -- at least https://github.com/andrewschultz/threediopolis/blob/master/Threediopolis.zblorb seems to match that dropbox file, so I'm assuming that https://github.com/andrewschultz/threediopolis/blob/master/story.ni must be the corresponding source. Does that sound correct?

jcmf commented 8 years ago

Okay, I reworked it to use a completely different heuristic for finding strings in Z-code. It seems to work well on your file, better than before anyway. I hope I didn't break it for anything else! Please have a look at 4.3.0 and let me know what you think.

andrewschultz commented 8 years ago

It looks very good. Nice job! I'll try and do a little more regression checking on my zcode games too.