tidyverse / rvest

Simple web scraping for R
https://rvest.tidyverse.org

How to use rvest to submit a form when the page URL does not change on submission #319

Open biopig opened 3 years ago

biopig commented 3 years ago
library(rvest)
sequence<-as.data.table(c(">aaaa","ATCGATCGATCG"))
place<-read_html("https://www.dna.affrc.go.jp/PLACE/?action=newplace")
place_session<-html_session("https://www.dna.affrc.go.jp/PLACE/?action=newplace")
place_search<-html_form(read_html("https://www.dna.affrc.go.jp/PLACE/?action=newplace"))[[1]]
place_search<-set_values(place_search,query_seq=sequence)
place_test<-submit_form(place_session,form = place_search,submit = NULL)

and I get this error:

error: `form` doesn't contain a `action` attribute
Run `rlang::last_error()` to see where the error occurred.

I noticed that the page's URL does not change when I submit the form. Can you help me?

epiben commented 3 years ago

You need to give the form an action attribute; perhaps rvest should set action to the current URL if not otherwise specified by the form itself? This seems related to HTML5 vs. previous versions of HTML (e.g. https://stackoverflow.com/q/1131781).

This works on my end; took the liberty of tidying things up a bit including fixing the sequences input, under the assumption that the text field in the form accepts sequences separated by newlines:

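library(rvest)

# Join the FASTA header and sequence into a single newline-separated string
sequence <- paste(c(">aaaa", "ATCGATCGATCG"), collapse = "\n")
url <- "https://www.dna.affrc.go.jp/PLACE/?action=newplace"
place_session <- session(url) # html_session() is deprecated as of rvest 1.0.0
place_search <- html_form(place_session)[[1]] %>%
    html_form_set(query_seq = sequence)
place_search$action <- url # slight hack: the form has no action attribute

# Submit the form, then extract the text of the <section> elements
result <- html_form_submit(place_search) %>%
    read_html() %>%
    html_elements("section") %>%
    html_text()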

Then, cat(result) matches the output on the webpage:

Wed Aug  4 14:43:29 JST 2021

>aaaa
ATCGATCGATCG

RESULTS OF YOUR SIGNAL SCAN SEARCH REQUEST

This result is the output of the new signal scan program which was completely rewritten
from a scratch by Akio Miyao ($Id: 649.pl,v 1.11 2016/04/20 08:43:39 miyao Exp $).

The original program of signal scan was reported in
Prestridge, D.S. (1991) SIGNAL SCAN: A computer program that scans DNA sequences for
eukaryotic transcriptional elements. CABIOS 7, 203-206.

>aaaa
12 base pairs

 (+) = Current Strand
 (-) = Opposite Strand

1      ATCGATCGATCG

    Factor or Site Name      Loc.(Str.)       Signal Sequence        SITE #
____________________________________________________________________________
//
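
If this comes up for more than one site, the "slight hack" above could be wrapped in a small helper. This is only a sketch: `ensure_form_action()` and its `fallback_url` argument are hypothetical names, not part of rvest, and the plain list in the demo merely stands in for the object returned by `html_form()`. It assumes a missing action shows up as `NULL`, `NA`, or an empty string.

```r
# Hypothetical helper (not part of rvest): if the form has no usable
# `action`, fall back to the URL the form was loaded from.
ensure_form_action <- function(form, fallback_url) {
  action <- form$action
  # Treat NULL / zero-length, NA, and "" as "no action given"
  missing_action <- length(action) == 0 ||
    is.na(action[[1]]) ||
    !nzchar(action[[1]])
  if (missing_action) {
    form$action <- fallback_url
  }
  form
}

# Offline demo with a plain list standing in for a parsed form
demo_form <- list(name = "demo", method = "POST", action = NULL)
demo_form <- ensure_form_action(demo_form, "https://example.org/search")
print(demo_form$action)
```

With a real form, the call would presumably look like `place_search <- ensure_form_action(place_search, url)` in place of the direct `place_search$action <- url` assignment.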