Add user seqID to bold_identify results

ropensci / bold

Interface to the Bold Systems barcode webservice

https://docs.ropensci.org/bold

Add user seqID to bold_identify results #38

Closed (tdivoll closed this issue 7 years ago)

tdivoll commented 7 years ago

I am trying to batch-process sequences with bold_identify and then write the results to an xlsx file. This works, but the original sequence IDs are lost because only the sequences are passed to bold_identify. Would it be possible to add an option to include those IDs in the data that bold_identify returns from the BOLD API?

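    library(bold)
    # write.xlsx() is provided by an xlsx-writing package such as openxlsx or xlsx
    output <- bold_identify(mydata$seqs, db = "COX1", response = FALSE)
    # Keep the top 20 matches per sequence, then stack them into one data frame
    out20 <- lapply(output, head, n = 20)
    outframe <- do.call("rbind", lapply(out20, data.frame))
    write.xlsx(outframe, "outframe.xlsx")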

The bold package is great, as I prefer de novo clustering with post hoc taxonomy assignment over clustering against a reference database.

Thank you for any help with this, Tim

tdivoll commented 7 years ago

My data file looks like so:

seqID	seqs
denovo0	ATGCGTACCTA..
denovo1	ATGCTAGTCAC...

sckott commented 7 years ago

hi @tdivoll - thanks for opening the issue!

I'll have a look and get back to you asap

sckott commented 7 years ago

If you make your input a named list, the names of each sequence should be retained.

Try as.list(setNames(mydata$seqs, mydata$seqID)), which should give you a named list.

e.g.

    mydata <- data.frame(seqID = letters[1:3], seqs = c('AADDF', 'ADDSFSDFD', 'ADFSDF'),
                         stringsAsFactors = FALSE)
    as.list(setNames(mydata$seqs, mydata$seqID))
    #> $a
    #> [1] "AADDF"
    #>
    #> $b
    #> [1] "ADDSFSDFD"
    #>
    #> $c
    #> [1] "ADFSDF"
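
To see how the names can follow the results into a single data frame, here is a minimal sketch in base R. It assumes bold_identify returns a list of data frames named after the input list. The actual call needs network access, so it is left commented out and replaced by a mock `output`. The sequences, the mock column values, and the helper name `add_id` are placeholders for illustration, not real BOLD results.

```r
# Placeholder input in the same shape as the data file described earlier
mydata <- data.frame(seqID = c("denovo0", "denovo1"),
                     seqs = c("ATGCGTACCTA", "ATGCTAGTCAC"),
                     stringsAsFactors = FALSE)
named_seqs <- as.list(setNames(mydata$seqs, mydata$seqID))

# Real call (requires network access), shown for reference only:
# output <- bold::bold_identify(named_seqs, db = "COX1")

# Mock stand-in: one small data frame per input name (made-up values)
output <- lapply(named_seqs, function(s) {
  data.frame(taxonomicidentification = c("Taxon A", "Taxon B"),
             similarity = c(0.99, 0.95),
             stringsAsFactors = FALSE)
})

# Hypothetical helper: prepend the list name as an explicit seqID column
add_id <- function(df, id) data.frame(seqID = id, df, stringsAsFactors = FALSE)

out20 <- lapply(output, head, n = 20)
outframe <- do.call(rbind, Map(add_id, out20, names(out20)))
rownames(outframe) <- NULL
```

Storing the ID in its own column avoids relying on the row names that rbind generates, which get a numeric suffix whenever a sequence has more than one match.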

tdivoll commented 7 years ago

Thank you for this solution, it worked great! I had been missing stringsAsFactors = FALSE when reading in my file; once I added it, setNames() picked up my seqIDs correctly.

I was able to add the row names as a column with rownames_to_column() from the 'tibble' package and filter() out low-percentage matches in the final data frame with the 'dplyr' package before writing to xlsx.
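
A rough sketch of that post-processing step, assuming the combined data frame has row names derived from the sequence IDs and a numeric `similarity` column. The data frame below is a mock with made-up values, and the 0.95 cutoff and the suffix-stripping step are illustrative assumptions rather than what was actually run.

```r
library(tibble)
library(dplyr)

# Mock of an rbind-ed result: rbind appends ".1", ".2", ... to the list name
# when a sequence has several matches (placeholder values throughout)
outframe <- data.frame(
  taxonomicidentification = c("Taxon A", "Taxon B", "Taxon C"),
  similarity = c(0.99, 0.85, 0.97),
  row.names = c("denovo0.1", "denovo0.2", "denovo1"),
  stringsAsFactors = FALSE
)

min_similarity <- 0.95  # hypothetical cutoff for a "good" match

filtered <- outframe %>%
  rownames_to_column(var = "seqID") %>%              # row names become a column
  mutate(seqID = sub("\\.[0-9]+$", "", seqID)) %>%   # drop the rbind suffix
  filter(similarity >= min_similarity)               # keep high-similarity rows
```

Note that the suffix-stripping pattern would also truncate an ID that itself ends in a dot followed by digits, so it only suits IDs like the ones shown here.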

Tim

sckott commented 7 years ago

Great, glad it worked!