rcastelo / gDNAx

Diagnostics for assessing genomic DNA contamination in RNA-seq data

Zero intergenic alignments in bulk RNA-seq #4

Open rcastelo opened 2 days ago

rcastelo commented 2 days ago

I'm also getting a similar error with bulk RNA-seq data:

> bam_files
                        MCAO1                         MCAO3                         MCAO4                        Shame1 
 "../sortbam//MCAO1.sort.bam"  "../sortbam//MCAO3.sort.bam"  "../sortbam//MCAO4.sort.bam" "../sortbam//Shame1.sort.bam" 
                       Shame3                        Shame4 
"../sortbam//Shame3.sort.bam" "../sortbam//Shame4.sort.bam" 
> class(txdb)
[1] "TxDb"
attr(,"package")
[1] "GenomicFeatures"
> gdnax <- gDNAdx(bam_files, txdb=txdb,verbose = F,strandMode = NA,singleEnd=F)
Error in .Hub_get1(x[i], force = force, verbose = verbose) : 
  no records found for the given index

But when I checked and debugged the source code of the gDNAdx() function, I found that this error is raised by gDNAx:::.fetchIGCandINTrng. So I suspected that the TxDb I generated with makeTxDbFromGFF is incomplete, and I tried adding other parameters to skip the relevant analysis, for example useRMSK = F:

gdnax <- gDNAdx(bam_files, txdb = txdb, verbose = FALSE, strandMode = NA, singleEnd = FALSE, exonsBy = "gene", useRMSK = FALSE)

It worked, but I doubt the accuracy of the results: the IGC (intergenic) values are all 0.


Do you have any better suggestions, or more tests based on TxDb objects built from GTF files?

Originally posted by @BioLaoXu in https://github.com/rcastelo/gDNAx/issues/3#issuecomment-2456195681

rcastelo commented 2 days ago

Dear @BioLaoXu, I have moved your comment to a new issue; it's important to avoid mixing different problems in the same issue. To identify the problem I need a bit more information. Specifically, I need the version of the genome to which your RNA-seq reads were aligned and, if possible, access to the GTF or GFF file that you are using to build the TxDb object.

BioLaoXu commented 12 hours ago

@rcastelo Thank you for your reply. My reference genome and GTF both come from Ensembl, not from UCSC. The species and version are Mus_musculus.GRCm38.94. Sorry, the file is too large to upload.

By the way, according to the article describing CleanUpRNAseq, gDNAx does not remove reads that fall in exonic regions. If that is true, then gDNAx should not be regarded as performing true gDNA contamination removal. Is this correct?