Open ivirshup opened 1 year ago
Notes on filtering behaviour: in bioconductor the filtering is done at the level of transcripts, before the promoter sequences are defined.
Example: if promoter is within filtered range, but the transcript is not, then the promoter is not returned:
> transcripts(EnsDb.Hsapiens.v86, filter=GRangesFilter(GRanges('1:9000-12000'), type='any'))
GRanges object with 1 range and 6 metadata columns:
seqnames ranges strand | tx_id tx_biotype tx_cds_seq_start tx_cds_seq_end gene_id tx_name
<Rle> <IRanges> <Rle> | <character> <character> <integer> <integer> <character> <character>
ENST00000456328 1 11869-14409 + | ENST00000456328 processed_transcript <NA> <NA> ENSG00000223972 ENST00000456328
> promoters(EnsDb.Hsapiens.v86, filter=GRangesFilter(GRanges('1:9000-12000'), type='any'))
GRanges object with 1 range and 6 metadata columns:
seqnames ranges strand | tx_id tx_biotype tx_cds_seq_start tx_cds_seq_end gene_id tx_name
<Rle> <IRanges> <Rle> | <character> <character> <integer> <integer> <character> <character>
ENST00000456328 1 9869-12068 + | ENST00000456328 processed_transcript <NA> <NA> ENSG00000223972 ENST00000456328
-------
seqinfo: 1 sequence from GRCh38 genome
With within filtering:
> transcripts(EnsDb.Hsapiens.v86, filter=GRangesFilter(GRanges('1:9000-12000'), type='within'))
GRanges object with 0 ranges and 6 metadata columns:
seqnames ranges strand | tx_id tx_biotype tx_cds_seq_start tx_cds_seq_end gene_id tx_name
<Rle> <IRanges> <Rle> | <character> <character> <integer> <integer> <character> <character>
-------
seqinfo: no sequences
> promoters(EnsDb.Hsapiens.v86, filter=GRangesFilter(GRanges('1:9000-12000'), type='within'))
GRanges object with 0 ranges and 6 metadata columns:
seqnames ranges strand | tx_id tx_biotype tx_cds_seq_start tx_cds_seq_end gene_id tx_name
<Rle> <IRanges> <Rle> | <character> <character> <integer> <integer> <character> <character>
-------
seqinfo: no sequences
Description of feature
Retrieve transcription start sites for genes for use with ATAC data.
This could be similar to the
GenomicFeatures::promoters
function (description available in this vignette) which:This could be done using
bioframe.expand
(though that is currently not strand aware: https://github.com/open2c/bioframe/issues/144). This could instead by done withibis
with:ifelse