tdivoll closed this issue 7 years ago.
My data file looks like so:
seqID | seqs |
---|---|
denovo0 | ATGCGTACCTA.. |
denovo1 | ATGCTAGTCAC... |
Hi @tdivoll, thanks for opening the issue!
I'll have a look and get back to you ASAP.
If you make your input a named list, the name of each sequence should be retained.
Try `as.list(setNames(mydata$seqs, mydata$seqID))`, which should give you a named list, e.g.:
```r
mydata <- data.frame(seqID = letters[1:3], seqs = c('AADDF', 'ADDSFSDFD', 'ADFSDF'),
                     stringsAsFactors = FALSE)
as.list(setNames(mydata$seqs, mydata$seqID))
#> $a
#> [1] "AADDF"
#>
#> $b
#> [1] "ADDSFSDFD"
#>
#> $c
#> [1] "ADFSDF"
```
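If the identification results come back as a list named after the input sequences (an assumption here; check the structure your installed version of bold actually returns), those names can be turned into a column before the per-sequence tables are stacked, so every hit stays tied to its seqID. The sketch below uses only base R. The toy `results` list, its `ID` and `similarity` column names, and its values are hypothetical placeholders standing in for real `bold_identify()` output.

```r
# HYPOTHETICAL stand-in for per-sequence identification results:
# a list named by seqID, each element a data frame of hits.
# Column names and values are placeholders, not real BOLD output.
results <- list(
  denovo0 = data.frame(ID = c("hitA", "hitB"), similarity = c(0.99, 0.91),
                       stringsAsFactors = FALSE),
  denovo1 = data.frame(ID = "hitC", similarity = 0.97,
                       stringsAsFactors = FALSE)
)

# Prefix each table with the name of the sequence it belongs to;
# Map() walks the tables and their names in parallel.
with_ids <- Map(function(tbl, id) data.frame(seqID = id, tbl, stringsAsFactors = FALSE),
                results, names(results))

# Stack everything into one data frame and reset the row names
# that rbind() builds from the list names.
combined <- do.call(rbind, with_ids)
rownames(combined) <- NULL
combined
```

Carrying the ID as an ordinary column, rather than only as list names or row names, means it survives later filtering, sorting, and export to a spreadsheet.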
Thank you for this solution, it worked great! I was missing `stringsAsFactors = FALSE` when reading in my file; once I added it, the `setNames()` command worked with my seqIDs.
I was able to add the row names as a column with `rownames_to_column()` from the 'tibble' package, and to `filter()` out low-percentage matches in the final data frame with the 'dplyr' package before writing to xlsx.
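For readers without tibble or dplyr, the same two steps (row names to a column, then dropping low-similarity matches) can be sketched in base R. The table, its `similarity` column name, and the 0.95 cutoff are hypothetical placeholders, not values taken from BOLD.

```r
# HYPOTHETICAL result table whose row names carry the sequence IDs
# (placeholder values, not real BOLD output).
res <- data.frame(taxon = c("taxonA", "taxonB", "taxonC"),
                  similarity = c(0.99, 0.85, 0.97),
                  row.names = c("denovo0", "denovo1", "denovo2"),
                  stringsAsFactors = FALSE)

# Base-R counterpart of tibble::rownames_to_column(res, "seqID"):
# copy the row names into a first column, then drop the row names.
res <- data.frame(seqID = rownames(res), res, stringsAsFactors = FALSE)
rownames(res) <- NULL

# Base-R counterpart of dplyr::filter(res, similarity >= min_similarity).
# ASSUMPTION: 0.95 is an arbitrary illustrative cutoff.
min_similarity <- 0.95
kept <- res[res$similarity >= min_similarity, , drop = FALSE]
kept
```

Writing the filtered table to xlsx still needs an add-on package; `writexl::write_xlsx(kept, "results.xlsx")` is one option (the file name is hypothetical).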
Tim
Great, glad it worked!
I am trying to batch process sequences with `bold_identify` and then write the results to xlsx. I have completed this task, but I lose the original sequence IDs because only the sequences are passed to `bold_identify`. Would it be possible to add an option to include those IDs in the data `bold_identify` returns from the BOLD API?
This bold package is great, as I prefer de novo clustering with post hoc taxonomy assignment over clustering against a reference database.
Thank you for any help with this,
Tim