un-fao / calipseo-shiny

Calipseo is an FAO web-based platform for national Fisheries Authorities to streamline management of fisheries data and the production, analysis and reporting of fishery statistics.

[CALR-51] vesselFindeR doesn't work anymore #51

Closed eblondel closed 6 months ago

eblondel commented 6 months ago

Issue migrated from JIRA: https://sdlc.review.fao.org/jira/browse/CALR-51
Creator/Reporter: Emmanuel Blondel @eblondel
Assignee: Emmanuel Blondel @eblondel
Priority: Highest
Status: Done
Date of creation: 2021-12-21T09:03:38.000+0000

vesselFindeR was broken by the task done under CALR-46.

Please reconsider the original vesselFindeR function, which works:

{code:java}
#vesselFindeR
vesselFindeR <- function(name, flag_iso2){
  html = httr::content(httr::GET(sprintf("https://www.vesselfinder.com/vessels?name=%s&flag=%s", name, flag_iso2), httr::add_headers("User-Agent" = "vesselFindeR")))
  tbl = xml2::xml_find_all(html, ".//table")
  if(length(tbl)==0) return(NULL)
  tbl = tbl[[1]]
  df = rvest::html_table(tbl)

  #links
  alinks = xml2::xml_find_all(html, ".//a")
  alinks = alinks[sapply(alinks, function(x){
    if(!xml2::xml_has_attr(x,"class")) return(FALSE)
    xml2::xml_attr(x, "class") == "ship-link"
  })]
  if(length(alinks)>0){
    df$link <- paste0("https://www.vesselfinder.com", sapply(alinks, function(x){xml2::xml_attr(x,"href")}))
  }

  #keep first one
  df = df[1L,]
  df = as.list(df)

  #go to pick up details and image
  html2 = httr::content(httr::GET(df$link, httr::add_headers("User-Agent" = "vesselFindeR")))
  imgs = xml2::xml_find_all(html2, ".//img")
  imgs = imgs[sapply(imgs, function(x){
    if(!xml2::xml_has_attr(x, "class")) return(FALSE)
    xml2::xml_attr(x, "class") == "main-photo"
  })]
  if(length(imgs)>0){
    df$img_href = xml2::xml_attr(imgs[[1]],"src")
  }
  return(df)
}
{code}

Please make sure that the output remains a named "list": it can be extended to carry extra vessel characteristics, but without altering the way the above function returns its fields, in particular the image link.
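One way to meet the "extend without altering" requirement could look like the sketch below. It is only an illustration, not the code that was committed: `extend_vessel` and the `length_m` field are hypothetical names, and the URLs are placeholders. Only `link` and `img_href` come from the function above.

```r
# Hypothetical helper (not part of the original code): append extra vessel
# characteristics to the named list returned by vesselFindeR, never
# overwriting a field that the original function already returns.
extend_vessel <- function(vessel, extra) {
  stopifnot(is.list(vessel), is.list(extra), !is.null(names(extra)))
  # keep only the extras whose names are not already present
  new_fields <- extra[setdiff(names(extra), names(vessel))]
  c(vessel, new_fields)
}

# Placeholder values standing in for a real vesselFindeR result
vessel <- list(
  link = "https://www.vesselfinder.com/vessels/placeholder",
  img_href = "https://example.org/placeholder.jpg"
)

# 'length_m' is an invented extra characteristic; the conflicting
# 'img_href' is dropped so the original image link is preserved
extended <- extend_vessel(vessel, list(length_m = 24.5, img_href = "other"))
```

With this approach existing callers that read `img_href` or `link` keep getting the same values, and new characteristics only ever appear as additional names at the end of the list.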

eblondel commented 6 months ago

Author: Emmanuel Blondel @eblondel
Date of creation: 2021-12-21T11:45:41.844+0000
Last update: 2021-12-21T11:45:41.844+0000

Fixed by reverting vesselFindeR to rely on httr::GET instead of xml2::read_html.
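For readers unfamiliar with the pattern, here is a minimal sketch of the httr-based fetch that the fix reverts to. The helper name `fetch_html` and the `stop_for_status` check are assumptions added for illustration; the thread does not record why the `xml2::read_html` variant stopped working, and this sketch does not try to explain it.

```r
# Hypothetical helper (not in the repository): fetch a page with httr::GET
# and an explicit User-Agent header, as the original function does inline.
fetch_html <- function(url, agent = "vesselFindeR") {
  resp <- httr::GET(url, httr::add_headers("User-Agent" = agent))
  # assumption: fail early on HTTP errors; the original code has no such check
  httr::stop_for_status(resp)
  # for text/html responses, httr::content() returns a parsed xml2 document
  httr::content(resp)
}
```

The returned document can then be queried with `xml2::xml_find_all`, exactly as in the function quoted in the issue description.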