Open jiangshan529 opened 1 year ago
Hi there,
You can export the insulation score as a bedgraph for viewing in IGV as follows. The {rtracklayer} package has several convenient export functions.
For reproducibility purposes, let's presume we have an insulation score object calculated by GENOVA.
library(GENOVA)
library(rtracklayer)
exp <- get_test_data("40k")
ins <- insulation_score(exp)
To export as a bedgraph, we need to convert the data.frame that is inside ins
to a GenomicRanges object. Note that we should typically add +1 to starts, as GENOVA tends to use the bedfile convention for intervals (starts at 0) whereas GenomicRanges use 1-based starts.
# Extract data: rename and drop 'bins' column
data <- setNames(
ins$insula_score[-4],
c("chrom", "start", "end", "value")
)
data <- data[is.finite(data$value),]
# Convert to GRanges
gr <- with(data, GRanges(chrom, IRanges(start + 1, end)))
score(gr) <- data$value
# Export to bedgraph
export.bedGraph(gr, here::here("my_bedgraph.bedGraph"))
If we want to export to bigwig instead, we need to include the sequence lengths of every chromosome as well. The {rtracklayer} also has a convenient function for this.
# Export to bigwig
gr <- with(
data,
GRanges(
chrom,
IRanges(start + 1, end),
seqinfo = SeqinfoForUCSCGenome("hg19")
),
)
score(gr) <- data$value
export.bw(gr, here::here("my_bigwig.bw"))
Hi there,
You can export the insulation score as a bedgraph for viewing in IGV as follows. The {rtracklayer} package has several convenient export functions.
For reproducibility purposes, let's presume we have an insulation score object calculated by GENOVA.
library(GENOVA) library(rtracklayer) exp <- get_test_data("40k") ins <- insulation_score(exp)
To export as a bedgraph, we need to convert the data.frame that is inside
ins
to a GenomicRanges object. Note that we should typically add +1 to starts, as GENOVA tends to use the bedfile convention for intervals (starts at 0) whereas GenomicRanges use 1-based starts.# Extract data: rename and drop 'bins' column data <- setNames( ins$insula_score[-4], c("chrom", "start", "end", "value") ) data <- data[is.finite(data$value),] # Convert to GRanges gr <- with(data, GRanges(chrom, IRanges(start + 1, end))) score(gr) <- data$value # Export to bedgraph export.bedGraph(gr, here::here("my_bedgraph.bedGraph"))
If we want to export to bigwig instead, we need to include the sequence lengths of every chromosome as well. The {rtracklayer} also has a convenient function for this.
# Export to bigwig gr <- with( data, GRanges( chrom, IRanges(start + 1, end), seqinfo = SeqinfoForUCSCGenome("hg19") ), ) score(gr) <- data$value export.bw(gr, here::here("my_bigwig.bw"))
Hi, this is so wonderful. There's a small issue, when I run
gr <- with(
data,
GRanges(
chrom,
IRanges(start + 1, end),
seqinfo = SeqinfoForUCSCGenome("hg38")
),
)
it said "Error in .normarg_seqnames2(seqnames, seqinfo) : 'seqnames' contains sequence names with no entries in 'seqinfo'". How to deal with this? Thanks!
It might be that your data has chromosome names that don't match the UCSC style for naming chromosomes. For example if your chromosome names are "1"
, "2"
, "3"
and not "chr1"
, "chr2"
, "chr3"
.
Alternatively, it might be that one of the scaffold names mismatch. Those are the chromosomes with the weird names like "chr22_KN196486v1_alt"
.
In any case, you can always manually construct a sequence information object manually if you have the lengths of chromosomes and chromosome names. You can do this by using the GenomeInfoDb::Seqinfo()
constructor function, and provide that result to the seqinfo
argument in the GRanges()
call.
It should be constructed such that all unique(data$chrom)
are chromosome names in the sequence information object.
It might be that your data has chromosome names that don't match the UCSC style for naming chromosomes. For example if your chromosome names are
"1"
,"2"
,"3"
and not"chr1"
,"chr2"
,"chr3"
.Alternatively, it might be that one of the scaffold names mismatch. Those are the chromosomes with the weird names like
"chr22_KN196486v1_alt"
.In any case, you can always manually construct a sequence information object manually if you have the lengths of chromosomes and chromosome names. You can do this by using the
GenomeInfoDb::Seqinfo()
constructor function, and provide that result to theseqinfo
argument in theGRanges()
call.It should be constructed such that all
unique(data$chrom)
are chromosome names in the sequence information object.
Yes, thanks! That's right. My chrom is '1' instead of 'chr1'.
As shown in the title, I would like to get a bedgraph file of insulation score. Thanks!