ropensci / restez

:sleeping: :open_file_folder: Create and Query a Local Copy of GenBank in R
https://docs.ropensci.org/restez
Other
25 stars 5 forks source link

getting spliced coding sequences instead of the gene sequence #17

Open terrimporter opened 4 years ago

terrimporter commented 4 years ago
Session Info ```r When I use BioPerl to parse GenBank records there is a way to target coding sequences (CDS) and retrieve the spliced sequence (without introns). I would like to switch to restez for this task, but I can't seem to find a way to do this? See for example GenBank record X55026.1 where the COI gene range is 38501..63007 but the CDS is join(38501..38669,41209..41297,42536..42576,43837..43868, 46369..46451,47959..48055,49512..49633,51885..51978, 53386..53396,54384..54470,55738..55797,56925..56927, 58328..58357,59363..59519,60813..60862,62103..62120, 62525..63007) Thanks ```
DomBennett commented 4 years ago

Hi!

Unfortunately there's no readily available method to retrieve the spliced regions. The package does contain record extraction tools which can be coupled to do what you want.


# for example, download single whole sequence using rentrez
record <- rentrez::entrez_fetch(db='nuccore', id='X55026', rettype='gb')
# to check, run
# cat(record)
# eqv. to  restez::gb_record_get() if restez database existed

# manually pull out spliced sequence
# 1/ look up exons
features <- restez::gb_extract(record, 'features')
are_exons <- vapply(features, function(x) tolower(x[['type']]) == 'exon', logical(1))
exons <- features[are_exons]
length(exons)
# 2/ get locations
exon_locations <- vapply(exons, '[[', character(1), 'location')
exon_locations <- strsplit(x = exon_locations, split = '\\.\\.')
# 3/ assemble sequence
sequence <- restez::gb_extract(record, 'sequence')
sequence <- strsplit(x = sequence, split = '')[[1]]
exon_sequences <- unlist(lapply(X = exon_locations, function(x) {
  sequence[x[[1]]:x[[2]]]
}))
assembled_sequence <- paste0(exon_sequences, collapse = '')
nchar(assembled_sequence)

^This code is untested plus I'm not 100% sure it does exactly what you want.

Also, if you do come up with a good function(s), I welcome pull-requests!

maelle commented 2 years ago

For info, we're looking for a new maintainer / a new maintainer team for this package, see #23. If you read this and are a restez user, feel free to volunteer, we'd be happy to help.

joelnitta commented 2 years ago

I am the new maintainer. A PR would still be welcomed for this issue.