paracrawl / b64filter

Program for operating on one document per Base 64 encoded line files
1 stars 0 forks source link

b64filter does not halt when child command fails #3

Closed jelmervdl closed 4 years ago

jelmervdl commented 4 years ago

I've been able to reproduce it with this command:

gzip -cd ~/hieu/nn/0/1/sentences.gz | b64filter parallel --halt 2 -j 16 --pipe -k -l 1024 false

parallel will indicate that false failed, but b64filter won't return and hang indefinitely.

It only happens for small sets of input sentences. If I try this with a sentences.gz from English, it will fail and stop on a pipe error.

wwaites commented 4 years ago

Well, that's strange. b64filter is actually hanging when it tries to put a line count on the queue and the queue is full, as it will be if there's been nothing to consume the input.

At this point the process has exited but has not been waited for, it still exists and is found by os.FindProcess. The cmd.ProcessState isn't filled in yet because we haven't waited. We haven't waited for it because we must wait after consuming its output, which of course we haven't done because the process isn't making any.

Oddly, writing to the input of the process when it is like this does not produce an error. I thought it would (see around https://github.com/paracrawl/b64filter/blob/master/b64filter.go#L245). That might be a Go bug.

Need to figure out how to check there that the process isn't a zombie.

jelmervdl commented 4 years ago

My initial idea was to just put the reading & queueing of docs in a goroutine as well and have the main thread only start those reading & writing goroutines and then proc.wait on the process to quit. If it does, and it wasn't happy, cancel/abort the reading and try to flush the writing and then fail appropriately?

But I am not experienced enough with Go to just go on and implement it instead of reporting the bug.

wwaites commented 4 years ago

Actually, I think it's simpler than that. Before we used an output buffer, we had to deal with the behaviour of cmd.Wait which wanted all of the output to be consumed first. That is no longer the case, according to https://golang.org/pkg/os/exec/#Cmd.Wait. So I think we can get rid of the done signalling channel on the output goroutine entirely, as well as the code that waits for the output queue to drain, and simply Wait normally.

wwaites commented 4 years ago

Try that on for size.

jelmervdl commented 4 years ago
go get -u github.com/paracrawl/b64filter
[cs-vand1@login-e-12 ~]$ gzip -cd ~/hieu/nn/0/1/sentences.gz | b64filter parallel --halt 2 -j 16 --pipe -k -l 1024 false
parallel: This job failed:
false
2020/03/11 10:04:15 b64filter.go:257: error waiting for command: exit status 1

👍

wwaites commented 4 years ago

turns out, the silly dance was there for a reason...

jelmervdl commented 4 years ago

I've used the code in #4 to translate about 2/3 of the languages in Hieu's data and it seems to hold up. Any issues I've had with it I can now relate to other causes.

Still, the patch is only half a solution. b64filter will still break if the captured program doesn't fail hard with a non-zero exit code, but quietly produces incorrect output.