At present, 40-50% of wiki articles are rejected because none of their sections meets our minimum word count. This adds latency: we download additional articles one at a time until a suitable one turns up.
We should instead download pages concurrently in batches, so we pay roughly one round-trip time per batch rather than per article; a sketch follows.
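Here is a minimal sketch of the batched approach using a thread pool. The endpoint, the batch size, and the `fetch_article`/`has_suitable_section` helpers are assumptions standing in for our existing downloader and filter, not the actual implementation.

```python
# A minimal sketch of concurrent batch fetching, assuming requests is
# available. fetch_article() and has_suitable_section() are hypothetical
# placeholders for the project's existing downloader and filter.
import concurrent.futures

import requests

BATCH_SIZE = 8
RANDOM_SUMMARY_URL = "https://en.wikipedia.org/api/rest_v1/page/random/summary"


def fetch_article() -> dict:
    """Download one random article (placeholder for the existing downloader)."""
    response = requests.get(RANDOM_SUMMARY_URL, timeout=10)
    response.raise_for_status()
    return response.json()


def has_suitable_section(article: dict, min_words: int = 100) -> bool:
    """Placeholder filter: accept articles whose text meets the minimum word count."""
    return len(article.get("extract", "").split()) >= min_words


def fetch_suitable_article(batch_size: int = BATCH_SIZE) -> dict:
    """Download batches of pages concurrently; return the first suitable one."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=batch_size) as pool:
        while True:
            # Launch the whole batch at once instead of downloading sequentially.
            futures = [pool.submit(fetch_article) for _ in range(batch_size)]
            for future in concurrent.futures.as_completed(futures):
                article = future.result()
                if has_suitable_section(article):
                    # Note: exiting the `with` block waits for the remaining
                    # in-flight downloads to finish before returning.
                    return article
            # Entire batch was rejected; fetch another batch.
```

With a 40-50% rejection rate, the chance that an entire batch of 8 is rejected is at most 0.5^8 ≈ 0.4%, so a single batch almost always yields a suitable article in one round trip.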