nptscot / npt

Data processing code for the Network Planning Tool; this repo is also used for issue tracking. See https://nptscot.github.io for the development version.
https://www.npt.scot/
GNU Affero General Public License v3.0

Try with 250 routes and new routes #409

Closed: atumscott closed this issue 1 month ago

Robinlovelace commented 4 months ago

Heads-up @wangzhao0217: it worked with a minimal build with the max set to 250. Now going to try a full build.

Robinlovelace commented 4 months ago


Robinlovelace commented 4 months ago

Issue confirmed:

✖ errored target rs_school_fastest
✖ errored pipeline [15.835 seconds]
Error:
! Error running targets::tar_make()
Error messages: targets::tar_meta(fields = error, complete_only = TRUE)
Debugging guide: https://books.ropensci.org/targets/debugging.html
How to ask for help: https://books.ropensci.org/targets/help.html
Last error message:
    Timeout was reached: [api.cyclestreets.net] SSL connection timeout
Last error traceback:
    get_routes(od = od_school %>% slice_max(n = parameters$max_to_route,    ...
    cyclestreets::batch(desire_lines = od, wait = TRUE, maxDistance = 30000,...
    batch_routes(desire_lines, name, serverId, strategies, bothDirections,  ...
    httr::POST(url = batch_url, body = body, httr::timeout(600))
    request_perform(req, hu$handle$handle)
    request_fetch(req$output, req$url, handle)
    request_fetch.write_memory(req$output, req$url, handle)
    curl::curl_fetch_memory(url, handle = handle)
    .handleSimpleError(function (condition)  {     state$error <- build_mess...
    h(simpleError(msg, call))

Robinlovelace commented 4 months ago

@mvl22 we're seeing this message a lot. Any ideas? Is it most likely an R-side or a CycleStreets-side issue?

    Timeout was reached: [api.cyclestreets.net] SSL connection timeout

Robinlovelace commented 4 months ago

It fails intermittently; the 60k job is now done:

● completed target rs_school_fastest [19.172 minutes]
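Because the SSL timeout is intermittent, one R-side mitigation is to retry the failing call a few times with a growing delay before giving up. The sketch below is a hypothetical helper: `with_retries()` is not part of cyclestreets or targets, and in the pipeline `fn` would be a function wrapping the `cyclestreets::batch()` call. (httr also provides `RETRY()`, which applies a similar backoff to a single request.)

```r
# Hypothetical helper, not part of cyclestreets or targets:
# call fn() up to max_attempts times, waiting base_delay * 2^(attempt - 1)
# seconds after each failure, and re-throw the last error if every attempt fails.
with_retries <- function(fn, max_attempts = 3, base_delay = 1) {
  last_error <- NULL
  for (attempt in seq_len(max_attempts)) {
    result <- tryCatch(
      list(ok = TRUE, value = fn()),
      error = function(e) list(ok = FALSE, error = e)
    )
    if (result$ok) {
      return(result$value)
    }
    last_error <- result$error
    message("Attempt ", attempt, " failed: ", conditionMessage(last_error))
    if (attempt < max_attempts) {
      Sys.sleep(base_delay * 2^(attempt - 1))
    }
  }
  stop(last_error)
}

# Usage with a stand-in for the flaky API call:
# it fails twice with a timeout-like error, then succeeds.
calls <- 0
flaky_request <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("Timeout was reached")
  "routes"
}
result <- with_retries(flaky_request, max_attempts = 3, base_delay = 0)
```

Retries only help with transient network errors; they do not fix a corrupt results file, which needs the download to be repeated.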
mvl22 commented 4 months ago

Why are you using that endpoint rather than your dedicated one?

Robinlovelace commented 4 months ago

> Why are you using that endpoint rather than your dedicated one?

I think it is using the dedicated server.

Robinlovelace commented 4 months ago

Latest error :( This is not very fail-safe. cc @mem48

✖ errored target rs_commute_fastest
✖ errored pipeline [3.924 hours]
Warning message:
6 targets produced warnings. Run targets::tar_meta(fields = warnings, complete_only = TRUE) for the messages. 
Error:
! Error running targets::tar_make()
Error messages: targets::tar_meta(fields = error, complete_only = TRUE)
Debugging guide: https://books.ropensci.org/targets/debugging.html
How to ask for help: https://books.ropensci.org/targets/help.html
Last error message:
    UNCLOSED_STRING: A string is opened, but never closed.
Last error traceback:
    get_routes(od = od_commute_subset, plans = "fastest", purpose = "commute...
    cyclestreets::batch(desire_lines = od, wait = TRUE, maxDistance = 30000,...
    get_routes(url = res_joburls$dataGz, desire_lines, filename,      direct...
    batch_read(filename_local, cols_to_keep = cols_to_keep, segments = segme...
    json2sf_cs(results_raw = res$json, id = res$route_number, segments = seg...
    RcppSimdJson::fparse(results_raw, query = "/marker", query_error_ok = TR...
    .deserialize_json(json = json, query = query, empty_array = empty_array,...
    .handleSimpleError(function (condition)  {     state$error <- build_mess...
    h(simpleError(msg, call))

Robinlovelace commented 4 months ago

Thinking out loud while typing: it's a bit high-stakes to do all of Scotland in one go. Latest thought: regionalising, i.e. routing county by county, would be a logical solution, I think.
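A minimal base R sketch of the county-by-county idea: split the desire lines by a region column, route each region separately, and carry on when one region fails, so only that region needs re-running. The `region` column, `route_by_region()` and the `route_fn` argument are assumptions for illustration; in the real pipeline `route_fn` would wrap the CycleStreets batch call.

```r
# Route desire lines region by region so one failure doesn't lose all results.
# `od` is assumed to be a data frame with a `region` column (hypothetical);
# `route_fn` is a stand-in for whatever sends one batch for routing.
# Note: a route_fn that legitimately returns NULL would be treated as a failure.
route_by_region <- function(od, route_fn) {
  chunks <- split(od, od$region)
  results <- lapply(names(chunks), function(region) {
    tryCatch(
      route_fn(chunks[[region]]),
      error = function(e) {
        message("Region ", region, " failed: ", conditionMessage(e))
        NULL
      }
    )
  })
  names(results) <- names(chunks)
  failed <- names(results)[vapply(results, is.null, logical(1))]
  list(results = results[!names(results) %in% failed], failed = failed)
}

# Usage with made-up data and a fake router that fails for one region.
od <- data.frame(
  region = c("Fife", "Fife", "Highland", "Glasgow"),
  trips = c(10, 20, 5, 40)
)
fake_route <- function(chunk) {
  if (any(chunk$region == "Highland")) stop("SSL connection timeout")
  sum(chunk$trips)
}
out <- route_by_region(od, fake_route)
```

Here `out$failed` lists the regions to retry, while the results for the other regions are kept.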

mvl22 commented 4 months ago

Ah, sorry, I see that's the batch data API request, not internal journey API requests. I'll check. As far as I'm aware, we're not seeing downtime pings at this end.

mem48 commented 4 months ago

I have pointed this out before (#261). There should be one target for sending jobs to CycleStreets, another for downloading the results, and a third for reading them in. That way you can recover if any step fails.
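The three-step split can be sketched in base R as stages that each save their output to disk and are skipped on re-runs when that output already exists, so a failure while reading results does not force jobs to be re-submitted. `run_stage()` and the stage bodies below are hypothetical stand-ins; in the targets pipeline each stage would be its own target, with targets doing the caching.

```r
# Run `fn` only if its output file doesn't exist yet; otherwise reuse the saved
# value. This mimics what separate targets provide: each completed step is cached.
run_stage <- function(path, fn) {
  if (file.exists(path)) {
    message("Skipping, already done: ", basename(path))
    return(readRDS(path))
  }
  value <- fn()
  saveRDS(value, path)
  value
}

workdir <- tempfile("batch_")
dir.create(workdir)
submissions <- 0

# Stage 1: submit the job (stand-in: returns a fake job id).
job <- run_stage(file.path(workdir, "1_job.rds"), function() {
  submissions <<- submissions + 1
  list(id = 123)
})
# Stage 2: download the results for that job (stand-in: a character vector).
raw <- run_stage(file.path(workdir, "2_raw.rds"), function() {
  paste0("route_", job$id, "_", 1:3)
})
# Stage 3: read/parse the downloaded results (stand-in: count them).
parsed <- run_stage(file.path(workdir, "3_parsed.rds"), function() {
  length(raw)
})

# Re-running stage 1 reuses the saved job instead of submitting a new one.
job_again <- run_stage(file.path(workdir, "1_job.rds"), function() {
  submissions <<- submissions + 1
  list(id = 456)
})
```

If stage 3 errors, fixing the parser and re-running only repeats stage 3, because the job id and the raw download are already on disk.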

Robinlovelace commented 4 months ago

@mem48 can you try the following and let us know if you get an error (download the file manually from the releases if you don't have the CLI tool set up)?

    # requires the gh cli tool to be installed and set up:
    gh release download v2024-02-11-test-batch

And then, in R:

    cyclestreets:::batch_read("test.csv.gz")

The issue is that the batches are too big. We do download the files (currently to a generic location in tempdir() rather than to a well-named file), but that is not much help if part of a file is corrupt for whatever reason.
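One way to keep batches small and their files traceable is to split the desire lines into fixed-size chunks, each with a descriptive file name, so a corrupt download only affects one chunk that can be requested again. A base R sketch; the chunk size, the `make_batches()` name and the file-naming scheme are assumptions, not existing cyclestreets behaviour.

```r
# Split row indices 1..n_rows into chunks of at most `chunk_size` rows and give
# each chunk a descriptive, zero-padded file name (hypothetical naming scheme).
make_batches <- function(n_rows, chunk_size, prefix = "od_commute") {
  chunk_id <- ceiling(seq_len(n_rows) / chunk_size)
  rows <- split(seq_len(n_rows), chunk_id)
  files <- sprintf("%s_batch_%03d.csv.gz", prefix, seq_along(rows))
  data.frame(
    file = files,
    first_row = unname(vapply(rows, min, integer(1))),
    last_row = unname(vapply(rows, max, integer(1))),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Example: 60,000 desire lines in chunks of 25,000 gives three batch files.
batches <- make_batches(n_rows = 60000, chunk_size = 25000)
batches
```

Each row of `batches` could then drive one submit/download/read cycle, so a bad file means re-requesting one chunk rather than the whole of Scotland.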