im3sanger / dndscv

dN/dS methods to quantify selection in cancer and somatic evolution
GNU General Public License v3.0

Error when trying to use buildref() : Error in new_codon[1, ]: incorrect number of dimensions #85

Closed tly0505 closed 1 year ago

tly0505 commented 1 year ago

Hello,

Thanks very much for the software. I seem to be having an issue when trying to build my own reference (in this case for the mm9 assembly) with buildref(). I get the following output:

[1/3] Preparing the environment...

[2/3] Building the RefCDS object...

[3/3] Calculating the impact of all possible coding changes...

Error in new_codon[1, ]: incorrect number of dimensions
Traceback:

1. buildref(cdsfile = transr_path, genomefile = ref_fa_path, outfile = "tes_chr1.rda", 
 .     excludechrs = "MT")
2. paste(new_codon[1, ], new_codon[2, ], new_codon[3, ], sep = "")
3. standardGeneric("paste")
4. eval(quote(list(...)), env)
5. eval(quote(list(...)), env)
6. eval(quote(list(...)), env)

I've subset everything to just one chromosome, and found two transcripts that seem to be causing the issue, but I can't quite tell how... This is one of them:

| Ensembl Gene ID | Associated Gene Name | Ensembl Protein ID | Chromosome Name | Exon Chr Start (bp) | Exon Chr End (bp) | CDS Start | CDS End | CDS Length | Strand |
|---|---|---|---|---|---|---|---|---|---|
| ENSMUSG00000038594 | AC153524.3-1 | ENSMUSP00000093356 | 10 | 53068886 | 53068936 | 1 | 51 | 1344 | -1 |
| ENSMUSG00000038594 | AC153524.3-1 | ENSMUSP00000093356 | 10 | 53068278 | 53068883 | 52 | 657 | 1344 | -1 |
| ENSMUSG00000038594 | AC153524.3-1 | ENSMUSP00000093356 | 10 | 53039404 | 53039525 | 658 | 779 | 1344 | -1 |
| ENSMUSG00000038594 | AC153524.3-1 | ENSMUSP00000093356 | 10 | 53032314 | 53032431 | 780 | 897 | 1344 | -1 |
| ENSMUSG00000038594 | AC153524.3-1 | ENSMUSP00000093356 | 10 | 53021308 | 53021487 | 898 | 1077 | 1344 | -1 |
| ENSMUSG00000038594 | AC153524.3-1 | ENSMUSP00000093356 | 10 | 53017859 | 53018011 | 1078 | 1230 | 1344 | -1 |
| ENSMUSG00000038594 | AC153524.3-1 | ENSMUSP00000093356 | 10 | 53016060 | 53016174 | 1231 | 1344 | 1344 | -1 |

By trial and error I found that buildref() runs just fine if I remove either the first or the last exon, and also if I alter the coordinates of the first or last exon in any way (either exon start/end or CDS start/end). Any help would be much appreciated, thank you!

tly0505 commented 1 year ago

I've fixed this now. The issue was that I had used exon start and end coordinates rather than coding region start/end (those did not seem to be available for mm9 from BioMart), so the intervals also contained UTRs, and the resulting lengths were sometimes not divisible by 3. The software does not seem to check that the spans of CDS start/end and coding region start/end agree, which led to the issue with codon extraction.
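
For anyone hitting the same error, a pre-check along these lines may help find the offending rows before calling buildref(). This is a minimal base-R sketch, not part of dndscv: the helper name `check_cds_spans` and the column names are made up for illustration, and the numbers are the seven exons of ENSMUSP00000093356 posted above. It compares each row's genomic span with its CDS span and tests whether the total genomic length is a multiple of 3. Whether this exact mismatch is what triggered the error is not confirmed in this thread, but it is consistent with the diagnosis above.

```r
# Exon table for ENSMUSP00000093356, copied from the report in this thread.
# Column names are illustrative, not the names buildref() expects.
exons <- data.frame(
  protein_id = "ENSMUSP00000093356",
  exon_start = c(53068886, 53068278, 53039404, 53032314,
                 53021308, 53017859, 53016060),
  exon_end   = c(53068936, 53068883, 53039525, 53032431,
                 53021487, 53018011, 53016174),
  cds_start  = c(1, 52, 658, 780, 898, 1078, 1231),
  cds_end    = c(51, 657, 779, 897, 1077, 1230, 1344),
  cds_length = 1344
)

# Hypothetical helper (not a dndscv function): adds the genomic span,
# the CDS span, and a flag for rows where the two disagree.
check_cds_spans <- function(tab) {
  tab$genomic_span  <- tab$exon_end - tab$exon_start + 1
  tab$cds_span      <- tab$cds_end - tab$cds_start + 1
  tab$span_mismatch <- tab$genomic_span != tab$cds_span
  tab
}

checked       <- check_cds_spans(exons)
bad_rows      <- checked[checked$span_mismatch, ]
total_genomic <- sum(checked$genomic_span)

# Rows whose genomic interval does not match the CDS interval
print(bad_rows[, c("exon_start", "exon_end", "genomic_span", "cds_span")])

# A coding sequence should have a length that is a multiple of 3
cat("Total genomic length:", total_genomic,
    "| multiple of 3:", total_genomic %% 3 == 0, "\n")
```

On this table the check flags only the last row (genomic span 115 against a CDS span of 114), and the summed genomic length is 1345 rather than the stated CDS length of 1344, so it is not a multiple of 3. Running the same comparison over the full input table, grouped by protein ID, should list every transcript that needs coding-region coordinates instead of exon coordinates.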