thelovelab / tximeta

Transcript quantification import with automatic metadata detection
https://thelovelab.github.io/tximeta/

Check if GENCODE TxDb exists in AnnotationHub #40

Closed: mikelove closed this issue 3 years ago

jma1991 commented 3 years ago

Maybe this is helpful? The function below returns a logical value for each matching record, so the user can explicitly check whether a record exists in AnnotationHub and, if it does, read the relevant metadata from the returned object's attributes.

library(AnnotationHub)

checkTxDbInHub <- function(source, organism, release, genome) {

    # Set up the AnnotationHub service
    hub <- AnnotationHub()

    # Search data provider by source (grepl, ignoring capitalization)
    by_source <- grepl(source, hub$dataprovider, ignore.case = TRUE)

    # Search species by organism (exact match required)
    by_organism <- hub$species == organism

    # Search title by release (grepl, because the release is embedded in the title)
    by_release <- grepl(release, hub$title)

    # Search genome by genome (exact match required)
    by_genome <- hub$genome == genome

    # Search data class for TxDb (exact match required)
    by_class <- hub$rdataclass == "TxDb"

    # Determine matching records; metadata fields can be NA, which would
    # propagate through `&` and make sum() and the if () below fail
    records <- Reduce(`&`, list(by_source, by_organism, by_release, by_genome, by_class))
    records[is.na(records)] <- FALSE

    # Count number of matching records
    counter <- sum(records)

    # Handle single, multiple, and missing records
    if (counter == 1) {

        message("found single matching TxDb via 'AnnotationHub'")

        output <- TRUE

        attr(output, "ah_id") <- hub$ah_id[records]

        attr(output, "title") <- hub$title[records]

    } else if (counter > 1) {

        message("found multiple matching TxDb via 'AnnotationHub'")

        output <- rep(TRUE, times = counter)

        attr(output, "ah_id") <- hub$ah_id[records]

        attr(output, "title") <- hub$title[records]

    } else {

        output <- FALSE

        message("did not find matching TxDb via 'AnnotationHub'")

    }

    # Return one TRUE per matching record (with metadata attributes), or FALSE
    return(output)

}

# Function usage
checkTxDbInHub(
    source = "GENCODE",
    organism = "Homo sapiens",
    release = "v23",
    genome = "GRCh37"
)
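To try the matching rules without a network connection, here is a minimal sketch that applies the same combined filters to a plain data.frame standing in for the hub metadata. The helper name `matchHubRecords`, the mock table, and its IDs and titles are hypothetical; only the column names mirror the AnnotationHub fields used above. The second query hits a record whose species is NA and shows it being treated as a non-match.

```r
# Hypothetical offline helper: same rules as checkTxDbInHub(), applied to a
# data.frame instead of a live AnnotationHub object
matchHubRecords <- function(meta, source, organism, release, genome,
                            rdataclass = "TxDb") {
    masks <- list(
        grepl(source, meta$dataprovider, ignore.case = TRUE),  # provider, any case
        meta$species == organism,                              # exact species
        grepl(release, meta$title),                            # release in title
        meta$genome == genome,                                 # exact genome
        meta$rdataclass == rdataclass                          # object class
    )
    records <- Reduce(`&`, masks)
    records[is.na(records)] <- FALSE  # missing metadata counts as "no match"
    n <- sum(records)
    if (n == 0) return(FALSE)
    output <- rep(TRUE, times = n)
    attr(output, "ah_id") <- meta$ah_id[records]
    attr(output, "title") <- meta$title[records]
    output
}

# Mock metadata (hypothetical IDs and titles)
meta <- data.frame(
    ah_id        = c("AH_A", "AH_B", "AH_C"),
    title        = c("TxDb for GENCODE v23 (mock)",
                     "TxDb for GENCODE v24 (mock)",
                     "EnsDb for Ensembl 84 (mock)"),
    dataprovider = c("Gencode", "GENCODE", "Ensembl"),
    species      = c("Homo sapiens", "Homo sapiens", NA),
    genome       = c("GRCh37", "GRCh38", "GRCh38"),
    rdataclass   = c("TxDb", "TxDb", "EnsDb"),
    stringsAsFactors = FALSE
)

res <- matchHubRecords(meta, "GENCODE", "Homo sapiens", "v23", "GRCh37")
if (any(res)) print(attr(res, "ah_id"))

res2 <- matchHubRecords(meta, "Ensembl", "Homo sapiens", "84", "GRCh38", "EnsDb")
print(res2)
```

With the real hub, a caller would branch on `any(res)` and pass an ID from `attr(res, "ah_id")` to `hub[[...]]` to retrieve the object.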
mikelove commented 3 years ago

Thanks! I already have code for grabbing ensembldb databases from AnnotationHub, and I think I just need to extend it to the GENCODE branch of the if/else statements. I'll add it ASAP.
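The plan is to add a GENCODE case next to the existing Ensembl lookup. A minimal sketch of that branching, assuming (hypothetically) that Ensembl sources map to `EnsDb` records and GENCODE sources to `TxDb` records in AnnotationHub's `rdataclass` field. The function name is illustrative only; it is not tximeta's code (the actual change is in cb71533).

```r
# Hypothetical dispatch: which AnnotationHub object class to search for,
# depending on where the transcriptome came from
hubClassForSource <- function(source) {
    switch(tolower(source),
        ensembl = "EnsDb",  # Ensembl transcriptomes: EnsDb (ensembldb)
        gencode = "TxDb",   # GENCODE transcriptomes: TxDb
        stop("no AnnotationHub lookup defined for source: ", source)
    )
}

print(hubClassForSource("GENCODE"))
```

Using `tolower()` before `switch()` keeps the lookup case-insensitive, matching the `ignore.case = TRUE` provider search above.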

mikelove commented 3 years ago

Resolved with cb71533