Closed: jamezpolley closed this issue 4 years ago
It looks to me like the output on stdout is in the correct order and the output on stderr is in the correct order, but the two streams are getting intermingled in the wrong way.
@jamezpolley any chance that you could write a super simple test example in python (python is quicker to build with buildstep than ruby) that reproduces this problem? We could then add that to the test suite.
Thanks @jamezpolley for catching this bug!
I've made no progress on solving this today. Very annoying.
Well, this is a surprisingly tricky problem. I've spent probably a day all up working on this so far, without a clear solution but with a clear new line of enquiry. I'll do my best to summarise what I've learned so far:
Each stream (stdout and stderr) is a separate pipe that gets read concurrently, so by the time the lines reach us there's no reliable information about the relative order in which the two streams were written. So, it's just not possible to ensure a consistent ordering between the two streams. However, inside each stream the ordering should be consistent.
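To make the concurrency concrete, here's a minimal Go sketch (the command, names and error handling are illustrative assumptions, not this project's actual code) of reading a child process's two output streams through separate pipes, each in its own goroutine. Each goroutine sees its own stream in order, but nothing co-ordinates the two:

package main

import (
	"bufio"
	"fmt"
	"io"
	"os/exec"
	"sync"
)

func main() {
	// Run the scraper and grab a separate pipe for each output stream.
	cmd := exec.Command("python", "scraper.py")
	stdout, _ := cmd.StdoutPipe()
	stderr, _ := cmd.StderrPipe()
	_ = cmd.Start()

	var wg sync.WaitGroup
	read := func(name string, r io.Reader) {
		defer wg.Done()
		scanner := bufio.NewScanner(r)
		for scanner.Scan() {
			// Lines from this stream arrive in order, but they race
			// against the other goroutine's lines.
			fmt.Println(name+":", scanner.Text())
		}
	}
	wg.Add(2)
	go read("stdout", stdout)
	go read("stderr", stderr)
	wg.Wait()
	_ = cmd.Wait()
}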
We can see this behaviour with this very simple python test case:
from __future__ import print_function
import sys
# This tests that stdout and stderr are sent in the correct order
print("Line 1 (to stderr)", file=sys.stderr)
print("Line 2 (to stderr)", file=sys.stderr)
print("Line 3 (to stdout)")
print("Line 4 (to stdout)")
When you run this scraper you get the results in all kinds of different orders:
Note that the order inside each stream is preserved, but the order between the two streams appears random.
Now if you put in a slight pause:
from __future__ import print_function
import sys
import time
# This tests that stdout and stderr are sent in the correct order
print("Line 1 (to stdout)")
print("Line 2 (to stdout)")
time.sleep(0.1)
print("Line 3 (to stderr)", file=sys.stderr)
print("Line 4 (to stderr)", file=sys.stderr)
You get a consistent ordering of 1 2 3 4
If you watch closely while the scraper is running, when the output of standard output and standard error is being interleaved you'll actually notice that there is a slight pause between each pair of lines (one from stdout and one from stderr). This behaviour cannot be explained by the concurrency issues above. So, we do have a different problem on our hands.
The next step is to try to reproduce the problem with a much simpler test scraper. I think this could be challenging.
Some progress... I've been able to reproduce the problem with a fairly simple test case:
from __future__ import print_function
import sys
import time
for line in range(1, 31):
    print("Line", line, "(to stderr)", file=sys.stderr)

for line in range(1, 21):
    print("Line", line)
    time.sleep(0.5)
This produces the output (leaving out the build part):
Line 1 (to stderr)
Line 1
Line 2 (to stderr)
Line 3 (to stderr)
Line 4 (to stderr)
Line 5 (to stderr)
Line 6 (to stderr)
Line 7 (to stderr)
Line 8 (to stderr)
Line 9 (to stderr)
Line 10 (to stderr)
Line 11 (to stderr)
Line 12 (to stderr)
Line 2
Line 13 (to stderr)
Line 3
Line 14 (to stderr)
Line 15 (to stderr)
Line 4
Line 16 (to stderr)
Line 5
Line 17 (to stderr)
Line 18 (to stderr)
Line 6
Line 19 (to stderr)
Line 7
Line 20 (to stderr)
Line 21 (to stderr)
Line 8
Line 22 (to stderr)
Line 9
Line 23 (to stderr)
Line 24 (to stderr)
Line 10
Line 25 (to stderr)
Line 26 (to stderr)
Line 11
Line 27 (to stderr)
Line 12
Line 28 (to stderr)
Line 29 (to stderr)
Line 13
Line 30 (to stderr)
Line 14
Line 15
Line 16
Line 17
Line 18
Line 19
Line 20
So, it looks like if you output enough stuff at once to stderr it gets buffered (or something like that), and then only some of it gets flushed when you output something to stdout. This is definitely not the behaviour we expect or want.
The first solution to this problem was to create a single place where the http requests to the server are made. Currently the requests are made in two separate goroutines, one for each stream (stdout and stderr). To do this we basically make a simple background queue with a fixed size: as each bit of text comes in, either stream says "log this bit of text" as quickly as possible and puts it on the single queue, which then retains more of the expected order.
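As a rough sketch of that idea (the names, types and queue size here are illustrative assumptions, not the actual implementation), both stream goroutines push onto one bounded channel and a single background goroutine drains it and makes the http requests:

// Hypothetical package name for this sketch.
package logstream

// logItem is a single line of output from the scraper.
type logItem struct {
	stream string // "stdout" or "stderr"
	text   string
}

// queue is the single fixed-size buffer shared by both stream goroutines.
// The size is arbitrary here, purely for illustration.
var queue = make(chan logItem, 100)

// enqueue is called from each stream goroutine as soon as a line is read.
// Because the channel is bounded, the send blocks when the queue is full,
// which is the "back pressure" described next.
func enqueue(stream, text string) {
	queue <- logItem{stream: stream, text: text}
}

// drain runs in a single background goroutine and is the only place that
// talks to the server, so log lines go out in the order they were queued.
func drain(send func(logItem) error) {
	for item := range queue {
		_ = send(item) // one (currently slow) http request per log line
	}
}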
We run into another problem when the queue fills up. We've set things up relatively sensibly to apply "back pressure" on the stream goroutines: when one of them tries to add a log item and the queue is full, it waits. But when this happens we are basically back in the same place as when we didn't have a queue.
It turns out the queue fills up really quickly because the http requests are really slow. There are two potential solutions to this: (1) make the individual http requests fast, or (2) work around their slowness. After much toing and froing it looked like attacking (1) head on was the best approach for now. (2) is more of an optimisation, whereas (1) is fixing a problem where an http request that we expect to take a few milliseconds is taking around 500ms.
This was actually fixed by the same PR that fixed #116.
Reproduction:
./client.sh test/scrapers/multiple_icon data.sqlite
Expected outcome:
Observed outcome:
I've attached complete output from a run of the scraper. In particular, the problem was noticeable here:
I note that https://github.com/planningalerts-scrapers/icon_scraper/issues/4 seems to be similar; but in that case it appears that all the lines of an error message do appear together, just in the middle of the regular output from another scraper. Here, the error lines seem to be interspersed line-by-line with regular output from another scraper.
multiple_icon_output.txt