hangnoh / flybaseR


input the fly stock number and get the plain genotype using R? #2

Open rmd13 opened 5 years ago

rmd13 commented 5 years ago

Dear hangnoh, is it possible to write a function that takes a fly stock number as input and returns the plain genotype using R? I have a list of fly stock numbers; the majority are from Bloomington and some are from Tokyo, and I can find information on FlyBase for all of them. For example, inputting stock ID 11572 should return the genotype: P{ry[+t7.2]=PZ}frc[02619] ry[506]/TM3, ry[RK] Sb[1] Ser[1]

Thanks

Originally posted by @rmd06 in https://github.com/hangnoh/flybaseR/issues/1#issuecomment-449247772

hangnoh commented 5 years ago

Hi! I think I can easily make one for the Bloomington stock center. Let me do that over the weekend!

hangnoh commented 5 years ago

I just recalled that Bloomington stock center provides a very neat list of their stocks with corresponding genotypes. I think it is better to use the list rather than causing network traffic. Please check out the link below.

https://bdsc.indiana.edu/stocks/stockdata.html
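A local lookup against the downloaded list could be sketched as follows. This is a minimal sketch, not the stock center's or flybaseR's method: `lookup_genotype`, the column names `stock_id` and `genotype`, and the file name in the comment are hypothetical and should be adjusted to the actual headers of the downloaded file. The toy table reuses the example stock from the original question.

```r
# Hypothetical helper: look up genotypes in a locally saved stock list.
# id_col and genotype_col are assumed column names, not the real CSV headers.
lookup_genotype <- function(stock_table, stock_ids,
                            id_col = "stock_id", genotype_col = "genotype") {
  # match() returns NA for stock numbers that are not in the table
  idx <- match(as.character(stock_ids), as.character(stock_table[[id_col]]))
  data.frame(
    stock_id = as.character(stock_ids),
    genotype = as.character(stock_table[[genotype_col]])[idx],
    stringsAsFactors = FALSE
  )
}

# Toy table standing in for the downloaded list. In practice it would be read
# from disk, e.g. (file name is a placeholder):
# stock_table <- read.csv("stock_list.csv", check.names = FALSE,
#                         stringsAsFactors = FALSE)
stock_table <- data.frame(
  stock_id = "11572",
  genotype = "P{ry[+t7.2]=PZ}frc[02619] ry[506]/TM3, ry[RK] Sb[1] Ser[1]",
  stringsAsFactors = FALSE
)

res <- lookup_genotype(stock_table, c("11572", "99999999"))
print(res)
```

Stock numbers missing from the table come back with `NA` as the genotype, so they can be picked out afterwards and looked up on FlyBase instead.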

rmd13 commented 5 years ago

Yes, I got the list as a CSV file.

I found that http://flybase.org/ has a quick search item called data class; if I select Stock and enter a stock number, it returns a page with the correct entry. This can be used to search any stock center, and it is really powerful. I have just been studying your R code and am trying to do this in R. But I am a beginner, so it may take a long time to finish.

rmd13 commented 5 years ago

    session <- html_session("http://flybase.org")
    form.original <- html_form(session)[10]
    # [[10]]:
    # 'dataclass_form' (POST /search/)
    #   'fld': fbxx-?
    #   'tab': dataType_tab
    #   'caller': quicksearch
    #   'species': Dmel

rmd13 commented 5 years ago

I finally got it working. The full code is below:

    library(rvest)
    session <- html_session("http://flybase.org")
    # This page contains two query forms: the first is j2g_search_form (uniquery.pl) at the top,
    # and the other is converter (export2batch.pl) in the middle.

    form.original <- html_form(session)[10][[1]] # or html_form(session)[[10]]
        # [[10]]: the form we want is dataclass_form
        # <form> 'dataclass_form' (POST /search/)
        # <input hidden> 'fld': fbxx-?
        # <input hidden> 'tab': dataType_tab
        # <input hidden> 'caller': quicksearch
        # <input hidden> 'species': Dmel
        # <button submit> '<unnamed>'
        # <input radio> 'field': SYM     # radio option: symbol/ID?
        # <input radio> 'field': ALLTEXT # radio option: all text?
        # <select> 'data_class' [0/33]  # select Stock here
        # <input text> 'query': # enter the ID to search for

    stock_IDsQuery = c("CH321-94A02","7568","6367","24343","28827","150337","2363")
    Stock_PlainGenoTypeAss <- rep_len("", length(stock_IDsQuery))
    Stock_shortGenotypeAss <- rep_len("", length(stock_IDsQuery))
    Stock_IDechoAss <- rep_len("", length(stock_IDsQuery))
    i = 0;
    for (aStock in stock_IDsQuery) {
      i = i + 1;
      form <- set_values(form.original, field = "SYM", data_class = "Stock", query = aStock)
      result_raw <- submit_form(session, form)[[6]][[6]];
      result <-as.character(rawToChar(result_raw));

      pattern = "FBst\\w+\""; # an FBst ID followed by a closing double quote (finally got it working!)
      gregout <- gregexpr(pattern,result,ignore.case = F,perl = F,fixed = F)
      if (!identical(-1L,  gregout[[1]][1])) {
        aHit1st = gregout[[1]]
        aHit1stLen = gregout[[1]] + attr(gregout[[1]],'match.length') - 2
        aStock_ID = substr(result,aHit1st,aHit1stLen)
        aStock_Http = paste("http://flybase.org/reports/",aStock_ID, sep = "")
        aStock_Html <- read_html(aStock_Http)
        # Take only the first matching node so that a single value is assigned to slot i
        Stock_PlainGenoTypeAss[i] <- (aStock_Html %>% html_nodes(".row:nth-child(6) .col-sm-9") %>% html_text())[1]
        #  "w[1118]; Dp(3;2)GV-CH321-94A02, PBac{y[+mDint2] w[+mC]=GV-CH321-94A02}VK00037"
        aStock_IDre_ <- aStock_Html %>% html_nodes(".row:nth-child(4) .field_label+ .col-sm-height") %>% html_text()
        Stock_IDechoAss[i] = aStock_IDre_[[1]]
        # [1] "FBst0550356"
        Stock_shortGenotypeAss[i] <- (aStock_Html %>% html_nodes(".row:nth-child(7) .col-sm-9") %>% html_text())[1]
        # "w1118; Dp(3;2)GV-CH321-94A02, PBac{GV-CH321-94A02}VK00037"
      }
    }
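
The ID-extraction step in the loop above (`gregexpr` plus `substr` with manual offsets) can also be sketched with base R's `regexpr` and `regmatches`. This is only an illustrative alternative, not the code from the post: `extract_first_fbst` is a hypothetical helper, the pattern assumes an FBst ID is `FBst` followed by digits, and the HTML snippet is a made-up stand-in for the search result page.

```r
# Hypothetical helper: return the first FBst ID found in a block of text,
# or NA if there is none. Assumes IDs look like "FBst" followed by digits.
extract_first_fbst <- function(page_text) {
  hit <- regmatches(page_text, regexpr("FBst[0-9]+", page_text))
  if (length(hit) == 0) NA_character_ else hit[[1]]
}

# Made-up snippet standing in for the text of a search result page
snippet <- '<a href="/reports/FBst0550356">stock report</a>'

stock_id <- extract_first_fbst(snippet)
report_url <- paste0("http://flybase.org/reports/", stock_id)
print(stock_id)
print(report_url)
```

Returning `NA` for pages without a match keeps the result aligned with the input vector when the helper is applied to many stock numbers.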
hangnoh commented 5 years ago

OMG, this is awesome. Would you mind if I incorporate your method into the original script somehow? I am not very used to GitHub, so I don't know how to invite you to contribute or edit.

Also, I would like to write code that accesses the FlyBase "Batch Download" tool as well. Please stay tuned.

rmd13 commented 5 years ago

No problem, you can incorporate it into your code.