kidsneuro-lab / RNA_splice_tool

0 stars 1 forks source link

Error in set(x, j = name, value = value) #1

Closed ChiaraF32 closed 1 year ago

ChiaraF32 commented 1 year ago

Hi Rhett,

I managed to successfully install cortar by first installing R version 4.1.3 compiled for Intel (rather than Arm64, which wouldn't work - I have Apple Silicon). Just for reference, I had to use Biomanager to install the following dependencies, prior to running your installation script: devtools::install_github("kidsneuro-lab/RNA_splice_tool") This was to prevent an error message suggesting the dependencies were not available for cortar.

#install dependencies
if (!require("BiocManager", quietly = TRUE))
  install.packages("BiocManager")

BiocManager::install("GenomicRanges")
BiocManager::install("GenomicFeatures")
BiocManager::install("GenomicAlignments")
BiocManager::install("BSgenome.Hsapiens.UCSC.hg38")
BiocManager::install("BSgenome.Hsapiens.UCSC.hg19")
BiocManager::install("BSgenome.Hsapiens.1000genomes.hs37d5")

I am now having issues with running cortar.

I made a sample annotation file, which I have uploaded for reference.

I then ran the following:

cortar(
  file = "./sample.tsv",
  mode = "research",
  assembly = "hg38",
  annotation = "UCSC",
  paired = TRUE,
  stranded = 2,
  output_dir = "./output",
  genelist = c("TRDN")
)

Which returned the following output and error message:

Running cortar 
        file: ./sample.tsv
        mode: research
    assembly: hg38
  annotation: UCSC
      paired: TRUE
    stranded: 2
      output: ./output

Selecting genes and transcripts...

Extracting and counting reads...
    TAM_M6932
[W::hts_idx_load2] The index file is older than the data file: /Users/00104561/Library/CloudStorage/OneDrive-TheUniversityofWesternAustralia/PERKINS/RNAseq/cortar/TAM-M6932.markdup.sorted.bam.bai
    IVCT_53Y_M
[W::hts_idx_load2] The index file is older than the data file: /Users/00104561/Library/CloudStorage/OneDrive-TheUniversityofWesternAustralia/PERKINS/RNAseq/cortar/IVCT-53Y-M.markdup.sorted.bam.bai

Annotating and quantifying events...
Error in set(x, j = name, value = value) : 
  Supplied 10 items to be assigned to 11 items of column 'event'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
In addition: Warning messages:
1: In .infer_intron_strand(unoriented_intron_motif) :
  For some junctions, the dinucleotides found at the intron boundaries don't
  match any of the natural intron motifs stored in predefined character vector
  'NATURAL_INTRON_MOTIFS'. For these junctions, the intron_motif and
  intron_strand metadata columns were set to NA and *, respectively.
2: In .make_GAlignmentPairs_from_GAlignments(gal, strandMode = strandMode,  :
    3 alignments with ambiguous pairing were dumped.
    Use 'getDumpedAlignments()' to retrieve them from the dump environment.
3: In .infer_intron_strand(unoriented_intron_motif) :
  For some junctions, the dinucleotides found at the intron boundaries don't
  match any of the natural intron motifs stored in predefined character vector
  'NATURAL_INTRON_MOTIFS'. For these junctions, the intron_motif and
  intron_strand metadata columns were set to NA and *, respectively.

If you could help me solve this issue that would be great!

Thanks, Chiara :) samples.txt

rhettmarchant commented 1 year ago

Hi Chiara,

Sorry for not replying to this!

Can you confirm that the assembly, annotation, paired and stranded parameters are all correct for your samples?

I have seen this error before but I don't recall exactly how I fixed it, I'd have to do some digging.

Thanks, Rhett

ChiaraF32 commented 1 year ago

Hi Rhett,

I used NCBI's GRCh38 FASTA and GTF files as genome references during alignment, available here, not UCSC or 1000 Genomes.

So I assume this is the issue. What would be my options for solving this without re-aligning?

Cheers, Chiara

rhettmarchant commented 1 year ago

Hi Chiara,

I can't say for sure that that is the issue here but I suspect it is. I've added a test branch to the repo which includes support for the NCBI_GRCh38 genome.

If you could run the following and let me know how it goes, that would be great.

remove.packages("cortar")

devtools::install_github("kidsneuro-lab/RNA_splice_tool@NCBI_GRCh38")

cortar(
  file = "./sample.tsv",
  mode = "research",
  assembly = "hg38",
  annotation = "NCBI",
  paired = TRUE,
  stranded = 2,
  output_dir = "./output",
  genelist = c("TRDN")
)

Hope this helps,

Rhett :)

rhettmarchant commented 1 year ago

Hi Chiara,

On second thoughts, the fix I have made is unlikely to work.

I'll work on fixing this problem today.

Thanks, Rhett

rhettmarchant commented 1 year ago

Hi Chiara,

Sorry for the delay,

Could you try the aforementioned code?

Thanks, Rhett

ChiaraF32 commented 1 year ago

Hi Rhett,

I ran the code and got the following output and error:

Running cortar 
        file: ./sample.tsv
        mode: research
    assembly: hg38
  annotation: NCBI
      paired: TRUE
    stranded: 2
      output: ./output

Selecting genes and transcripts...

Extracting and counting reads...
    TAM_M6932
[W::hts_idx_load2] The index file is older than the data file: /Users/00104561/Library/CloudStorage/OneDrive-TheUniversityofWesternAustralia/PERKINS/RNAseq/cortar/TAM-M6932.markdup.sorted.bam.bai
Error in .io_bam(.scan_bamfile, file, reverseComplement, yieldSize(file),  : 
  seqlevels(param) not in BAM header:
    seqlevels: ‘NC_000006.12’
    file: /Users/00104561/Library/CloudStorage/OneDrive-TheUniversityofWesternAustralia/PERKINS/RNAseq/cortar/TAM-M6932.markdup.sorted.bam
    index: /Users/00104561/Library/CloudStorage/OneDrive-TheUniversityofWesternAustralia/PERKINS/RNAseq/cortar/TAM-M6932.markdup.sorted.bam
rhettmarchant commented 1 year ago

Hi Chiara,

I think I've finally worked out the issue. I was able to recreate your error and have at least a temporary fix for this problem.

cortar was unable to process when a bamfile had a read that wasn't assigned a strand ( instead of + or -). The fix should let your samples run as normal and flag any -strand reads in the final report output.

Could you run this and see if it fixes the issue?

remove.packages("cortar")

devtools::install_github("kidsneuro-lab/RNA_splice_tool@unknown-strand-error")

cortar(
  file = "./sample.tsv",
  mode = "research",
  assembly = "hg38",
  annotation = "UCSC",
  paired = TRUE,
  stranded = 2,
  output_dir = "./output",
  genelist = c("TRDN")
)

Also going forward, just use UCSC as the annotation as it is only used to determine which chromosome name style to use for investigating the bamfile (e.g. "chr1" [UCSC/NCBI] vs "1" [1000genomes]). The coordinates should be consistent across NCBI_GRCh38 and UCSC_GRCh38 alignments.

Thanks for your patience, let me know how you go :)

Cheers, Rhett

ChiaraF32 commented 1 year ago

Hi Rhett,

Your solution worked! Thanks so much for your help, the outputs look great!

Cheers, Chiara