db.rnacentral contains code to mirror the data from RNAcentral releases, as well as pointers to the location of the data.
For each supported version of RNAcentral, two files are available:
id_mapping.tsv: "Tab-separated file with RNAcentral ids, corresponding external ids, NCBI taxon ids, RNA types (according to INSDC classification), and gene names"¹.
rnacentral_species_specific_ids.fasta: "Current set of sequences that are present in at least one expert database using the species specific URS ID's"².
All the data in db.rnacentral is versioned following the RNAcentral release numbering scheme.
Each of these versions is encoded as an object that extends the sealed class Version.
The Set Version.all contains all the releases supported and maintained through db.rnacentral.
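The versioning scheme described above can be pictured with a minimal sketch. It is not the library's actual source: the wrapper object VersionSketch, the name field, and the member names _9_0 and _10_0 are hypothetical, and the sketch uses a sealed abstract class where the library may use a plain sealed class. Release 10.0 is taken from the footnote URLs; 9.0 is only an illustrative assumption.

```scala
object VersionSketch {
  // Hypothetical stand-in for the library's Version type; the real definition may differ.
  sealed abstract class Version(val name: String) {
    override def toString: String = name
  }

  object Version {
    // One object per release. 10.0 appears in the footnote URLs; 9.0 is an assumption.
    case object _9_0  extends Version("9.0")
    case object _10_0 extends Version("10.0")

    // The set of all releases modelled in this sketch.
    val all: Set[Version] = Set(_9_0, _10_0)
  }
}
```

Because the class is sealed, every release has to be declared in the same file, so a set like Version.all can be kept in sync with the declared objects by inspection.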
The module db.rnacentral.data contains the pointers to the S3 objects where the actual files are stored. The paths of the S3 objects corresponding to the id mappings and the sequence data can be obtained by evaluating the following functions over a Version object:
idMappingTSV : Version => S3Object
speciesSpecificFASTA : Version => S3Object
A convenient value grouping both files can be accessed (again parametrized by the version) through the function:
everything : Version => S3Object
The paths to the S3 objects returned by those functions look something like the following:
s3://resources.ohnosequences.com/ohnosequences/db/rnacentral/<version>/id_mapping.tsv
s3://resources.ohnosequences.com/ohnosequences/db/rnacentral/<version>/rnacentral_species_specific_ids.fasta
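As a rough illustration of how the two functions could map a version to the paths above, here is a self-contained sketch. It does not reproduce the library's implementation: the wrapper object DataSketch, the S3Object case class with its uri method, the prefix helper, and the _10_0 object are hypothetical stand-ins, and it assumes that <version> is filled in with a release name such as 10.0.

```scala
object DataSketch {
  // Hypothetical stand-ins; the library's own Version and S3Object types may differ.
  sealed abstract class Version(val name: String)
  case object _10_0 extends Version("10.0")

  final case class S3Object(bucket: String, key: String) {
    // Renders the object location as an s3:// URI.
    def uri: String = s"s3://$bucket/$key"
  }

  private val bucket = "resources.ohnosequences.com"

  // Common key prefix for a release, following the path pattern shown above.
  // Assumption: <version> is replaced by the release name, e.g. "10.0".
  private def prefix(version: Version): String =
    s"ohnosequences/db/rnacentral/${version.name}"

  val idMappingTSV: Version => S3Object =
    version => S3Object(bucket, s"${prefix(version)}/id_mapping.tsv")

  val speciesSpecificFASTA: Version => S3Object =
    version => S3Object(bucket, s"${prefix(version)}/rnacentral_species_specific_ids.fasta")
}
```

Under these assumptions, DataSketch.idMappingTSV(DataSketch._10_0).uri evaluates to a path of the first form above with 10.0 in place of <version>.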
1: ftp://ftp.ebi.ac.uk/pub/databases/RNAcentral/releases/10.0/id_mapping/readme.txt
2: ftp://ftp.ebi.ac.uk/pub/databases/RNAcentral/releases/10.0/id_mapping/readme.txt