Open ttytm opened 1 year ago
How large is the http response? I remember seeing a similar issue that arose when strings of about 100k+ characters were parsed.
Thanks for taking the time to confirm this @Casper64.
And yes, responses in the used example above are usually more than 100k characters.
Edit: After some exploratory testing, I'm noticing that limiting the length of the response reduces the failure rate, but the error still occurs when the length is about 15k characters. It turns out there is already an earlier out-of-bounds error when just assigning the response body, prior to parsing:
```v
_ := resp.body
```
This has the potential for the same error with large response bodies.
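As an aside, for code that only needs a prefix of a possibly short or empty body, V's gated slice operator `#[..]` (the same one the reproduction below uses as `resp.body#[..20]`) avoids the panic that a plain range slice would raise. A minimal illustration:

```v
body := 'short'
// a plain slice would panic here: body[..20] => `substr() out of bounds`
safe := body#[..20] // gated slice: clamped to the valid range, no panic
println(safe)
```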
Error output with V 4.2 (ticker names vary per run):

```
Initial request failed for: inab response does not start with HTTP/, line: ``
Initial request failed for: tela invalid chunksize
Initial request failed for: tsla invalid chunksize
```
Try with `-d net_blocking_sockets -d use_openssl`. For me, this:
```v
import net.http
import sync.pool

fn get_news(mut pp pool.PoolProcessor, idx int, wid int) voidptr {
	ticker := pp.get_item[string](idx)
	resp := http.get('https://finance.yahoo.com/quote/${ticker}/press-releases') or {
		eprintln('Initial request failed for: ${ticker} ${err}')
		return pool.no_result
	}
	if resp.status_code != 200 {
		eprintln('Failed requesting press release: ${resp.status}')
		return pool.no_result
	}
	eprintln(resp.body#[..20])
	return pool.no_result
}

fn main() {
	mut pp := pool.new_pool_processor(callback: get_news)
	pp.work_on_items(['inab', 'cyto', 'top', 'tsla', 'tela'])
}
```
produces:
```
#0 11:33:36 ᛋ master /v/vnew❱v -d net_blocking_sockets -d use_openssl run yahoo.v
<!DOCTYPE html><html
<!DOCTYPE html><html
<!DOCTYPE html><html
<!DOCTYPE html><html
<!DOCTYPE html><html
#0 11:33:56 ᛋ master /v/vnew❱
```
Awesome, thanks @spytheman !
@ttytm, take a look at it when you're free, thanks. It looks like it can be closed :)
Yes, the issue is solved. Thank you! Although the time the requests take is imho still a dealbreaker for using the stdlib for this.
Thanks for your feedback!
Yes, at the moment it is only based on fixing existing behavior, not on structural and essential improvements such as "partial requests" and so on, which are not yet implemented. I would like to find time to improve it further.
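For context, a "partial request" in HTTP terms means sending a `Range` header so a large resource can be fetched in slices. A hedged sketch of what that looks like with the existing `http.fetch` (the URL is a placeholder, and the technique only helps when the server honors ranges):

```v
import net.http

fn main() {
	// Ask the server for only the first KiB of the resource.
	// A 206 Partial Content status means the server honored the range;
	// a plain 200 means it ignored the header and sent everything.
	resp := http.fetch(
		url:    'https://example.com/large-resource' // placeholder URL
		header: http.new_header(key: .range, value: 'bytes=0-1023')
	) or {
		eprintln('request failed: ${err}')
		return
	}
	println(resp.status_code)
}
```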
### Describe the bug
Although this bug isn't always present, its probability of occurring rises with the number of items we add. In the case of 5 requests in the reproduction, the likelihood of encountering this bug is very high. So you probably won't need to run it repeatedly.
### Expected Behavior

Parsing works.

### Current Behavior

### Reproduction Steps

### Possible Solution
A Go-like reader interface for response bodies.
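A rough sketch of the idea in V; the `BodyStream` type and its `read` method are purely hypothetical (nothing like this exists in `net.http` today), they only illustrate how a caller could consume a body in fixed-size chunks instead of materializing one huge string:

```v
// Hypothetical sketch of a Go-style reader for response bodies.
struct BodyStream {
mut:
	data []u8 // stand-in for bytes arriving from the socket
	pos  int
}

// read fills `buf` with the next chunk and reports how many bytes were
// written, or an error once the stream is exhausted.
fn (mut s BodyStream) read(mut buf []u8) !int {
	if s.pos >= s.data.len {
		return error('EOF')
	}
	mut n := 0
	for n < buf.len && s.pos < s.data.len {
		buf[n] = s.data[s.pos]
		n++
		s.pos++
	}
	return n
}

fn main() {
	mut stream := BodyStream{
		data: 'a large response body, consumed in fixed-size chunks'.bytes()
	}
	mut buf := []u8{len: 8}
	for {
		n := stream.read(mut buf) or { break }
		// the caller only ever holds `buf.len` bytes at a time
		print(buf[..n].bytestr())
	}
	println('')
}
```

With such an interface, parsing and bounds errors on very large bodies would be confined to small, fixed-size buffers rather than one giant substring operation.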
No response
### Additional Information/Context
The above code has been reduced to not contain the headers and workarounds that help us avoid other common issues we encounter with concurrent requests: `SSL handshake failed` or `response does not start with HTTP/, line:`. So running the above reproduction is likely to trigger such errors as well. As a workaround, we usually add custom headers and wrap the requests in an extended `try_request` function that retries requests that fail due to SSL handshake errors. This is a problem that occurs not only when scraping, but also when working with, for example, the GitHub API. But out of these, the main bug of this issue, `substr() out of bounds`, is the most fatal and not recoverable.

### V version
0.3.4
### Environment details (OS name and version, etc.)
Linux, Arch.