im3sanger / dndscv

dN/dS methods to quantify selection in cancer and somatic evolution
GNU General Public License v3.0

Error in RefCDS[[j]] : subscript out of bounds #55

Open CraigJAnderson opened 4 years ago

CraigJAnderson commented 4 years ago

Hi Inigo,

Firstly, thanks for dndscv, it's excellent. While trying to use your software with some alternative mouse strains, I've come up against some undocumented issues with buildref that are worth mentioning.

buildref fails with the following error if the length of a given CDS isn't divisible by 3, and also if there are Ns in the reference sequence: Error in RefCDS[[j]] : subscript out of bounds
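As a rough pre-check for the first of these two cases, the transcript table can be screened for CDS lengths that are not a multiple of 3 before calling buildref. The sketch below uses a toy table in base R; the column names (cds.id, chr.coding.start, chr.coding.end) are assumed to match the layout of the buildref tutorial table, and the values are made up for illustration. It does not check for Ns, which would need the reference genome itself.

```r
# Toy transcript table, one row per coding exon.
# Column names are an assumption based on the buildref tutorial layout.
cds_table <- data.frame(
  gene.name        = c("GeneA", "GeneA", "GeneB", "GeneC"),
  cds.id           = c("T1", "T1", "T2", "T3"),
  chr.coding.start = c(101, 301, 1001, 5001),
  chr.coding.end   = c(160, 360, 1100, 5090),
  stringsAsFactors = FALSE
)

# Exon length, counting both the start and end coordinate.
cds_table$exon_len <- cds_table$chr.coding.end - cds_table$chr.coding.start + 1

# Total coding length per transcript.
cds_len <- tapply(cds_table$exon_len, cds_table$cds.id, sum)

# Transcripts whose total length is not a multiple of 3.
bad_cds <- names(cds_len)[cds_len %% 3 != 0]
print(bad_cds)
```

With a real table, cds_table would instead come from something like read.table() on the same file passed to buildref.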

buildref won't necessarily complain about genes it has thrown out, so it can be useful to check against a given gene set. To run dndscv for a given gene set, pass the output of readLines("genes.txt") to the gene_list argument, where genes.txt lists gene IDs one per line. dndscv will then give you a list of the offending genes.
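The same comparison can also be done by hand, without running dndscv. This is a minimal sketch that assumes RefCDS is a list with one element per gene, each carrying a gene_name field; the RefCDS object here is a small mock, and the gene names are only examples.

```r
# Mock RefCDS object: a list with one entry per gene.
# The gene_name field is an assumption about the real RefCDS structure.
RefCDS <- list(
  list(gene_name = "Trp53"),
  list(gene_name = "Kras")
)

# In practice this would be: wanted <- readLines("genes.txt")
wanted <- c("Trp53", "Kras", "Tmem235")

# Gene names that survived buildref.
available <- vapply(RefCDS, function(g) g$gene_name, character(1))

# Genes requested but absent from RefCDS, i.e. likely dropped by buildref.
missing_genes <- setdiff(wanted, available)
print(missing_genes)
```

Any gene that shows up in missing_genes is a candidate for the length or N problems described above.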

When compounding Gene stable ID and Gene name with useids = T, the format is [Gene stable ID]:[Gene name], e.g. MGP_C3HHeJ_G0019206:Tmem235.
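For anyone preparing a gene list in that format, here is a small base R sketch that builds the compound IDs and splits them again. The variable names are only illustrative, and it assumes neither part contains a colon.

```r
# Example stable ID and gene name taken from the format described above.
gene_ids   <- c("MGP_C3HHeJ_G0019206")
gene_names <- c("Tmem235")

# Build compound IDs of the form [Gene stable ID]:[Gene name].
compound <- paste(gene_ids, gene_names, sep = ":")

# Split them back into their two parts.
parts     <- strsplit(compound, ":", fixed = TRUE)
stable_id <- vapply(parts, `[`, character(1), 1)
gene_name <- vapply(parts, `[`, character(1), 2)
```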

Hopefully this is useful to anyone else who trips up on these issues.

bhywong commented 3 years ago

Hi,

I got the same error when trying to run buildref for the mouse genome. Could you suggest how it can be fixed, please?

[1/3] Preparing the environment...
[2/3] Building the RefCDS object...
Error in gene_split[[j]] : subscript out of bounds

im3sanger commented 3 years ago

First, thanks @CraigJAnderson for the useful tip for other users. I should issue a warning listing all genes excluded when running buildref. Until this is implemented, I recommend that users follow Craig's suggestion.

@bhywong: thanks for your interest in dndscv. I suspect that there may be a problem with your input transcript table and that buildref may be excluding all genes. Could you compare your table with that in the buildref tutorial to see if you can spot any obvious problems? You can also download a precomputed RefCDS for the GRCm38 mouse genome from this link. Let me know if this does not solve your problems.