suchestoncampbelllab / gwasurvivr

GWAS Survival Package in R

Error in genotypes[!blankSNPs, cox.params$ids] : subscript out of bounds #4

Closed hamishinnes closed 4 years ago

hamishinnes commented 5 years ago

Hi, I'm trying to use gwasurvivr to run a survival analysis on PLINK data files.

The code I'm using is as follows:

    plinkCoxSurv(bed.file = "/users/ptb17163/lustre/EgaDemoClient_2.2.2/plink/top20variants.bed",
                 covariate.file = df,
                 covariates = c("age"),
                 id.column = "eid",
                 time.to.event = "X_t",
                 event = "fail",
                 out.file = "test",
                 inter.term = NULL,
                 print.covs = "only",
                 chunk.size = 10000,
                 verbose = TRUE,
                 clusterObj = NULL)

However, when I run this, I get the following error message:

    Error in genotypes[!blankSNPs, cox.params$ids] : subscript out of bounds

Do you have any ideas as to what may be causing this, and how I can resolve it?

aarizvi commented 5 years ago

@hamishinnes -- can you verify that the sample IDs in your genotype columns match the sample IDs in your covariate file?

Are your sample IDs in your covariate file class character?

hamishinnes commented 5 years ago

Problem solved!! My sample ID variable in my covariate was in a numeric format. I just converted it to character format, and it is now working perfectly. Thank you so much!!!! Really great.

hamishinnes commented 5 years ago

Sorry, that should have read: my sample ID variable in my covariate file was in a numeric format. I just converted it to character format, and it is now working perfectly. Thank you so much!!!!
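
A minimal sketch of the fix described here, assuming the covariate data frame is called `df` and its ID column is `eid`, as in the original call. The values are invented for illustration.

```r
# Hypothetical covariate data with a numeric sample ID column
df <- data.frame(
  eid  = c(1001, 1002, 1003),
  age  = c(54, 61, 47),
  X_t  = c(12.5, 30.1, 8.2),
  fail = c(1, 0, 1)
)

class_before <- class(df$eid)  # "numeric"

# Convert the ID column to character so it can be matched against
# the character sample IDs used as genotype column names
df$eid <- as.character(df$eid)

class_after <- class(df$eid)   # "character"
```

One caveat: `as.character()` can render round numeric IDs such as 100000 in scientific notation, so `format(x, scientific = FALSE, trim = TRUE)` may be the safer conversion for IDs of that kind.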

aarizvi commented 5 years ago

Excellent. Let us know if you have any other issues. Closing this issue.

karaesmen commented 5 years ago

Great to hear! But we need to add a more sensible error message for this, so I'd keep this issue open.

syy525 commented 4 years ago

Hi, I'm trying to use gwasurvivr to run a survival analysis on impute2 data files.

The code I'm using is as follows:

    impute2CoxSurv(impute.file = impute.file,
                   sample.file = sample.file,
                   chr = 1,
                   covariate.file = covariate.file,
                   id.column = "ID_1",
                   sample.ids = sample.ids,
                   time.to.event = "timetoCRPC",
                   event = "Event1",
                   covariates = c("age"),
                   inter.term = NULL,
                   print.covs = "only",
                   out.file = "C:/Users/tomato/Desktop/chr1",
                   chunk.size = 100,
                   maf.filter = 0.005,
                   exclude.snps = NULL,
                   flip.dosage = TRUE,
                   verbose = TRUE,
                   clusterObj = NULL,
                   keepGDS = FALSE)

However, when I run this, I get the following error message:

    * Compression time **
    User:232.41 System: 18.85 Elapsed: 280.87

    Analyzing part 1/17... Error in genotypes[, cox.params$ids] : subscript out of bounds

I read this article and I am sure my ID_1 and sample.ids are character; timetoCRPC, Event1, and age are numeric. One possible cause is that there are a lot of blanks (NA) in my covariate.file. I tried filling them with numbers, filling them with NA, and leaving them blank, but the error is the same.

Do you have any ideas as to what may be causing this - and how can I resolve it?

syy525 commented 4 years ago

I would like to add that I tried the example from your website, n100_p1000_chr18 (.impute), n100.impute_sample (.txt), and simulated_pheno (.txt), and got the same error.

However, when I changed n100.impute_sample (.txt) to impute_example.IMPUTE2_SAMPLE, it ran perfectly.

I wonder whether the problem is that the sample file cannot be .txt and should instead be .sample or .IMPUTE2_SAMPLE?

aarizvi commented 4 years ago

Very interesting indeed. Those files should be fine as .txt or any extension as long as they are tab separated.

And although you said it's not the case -- the error from your first post -- looks like a sampleID issue. Can you do class() on that column and paste it here?

syy525 commented 4 years ago

Below is the class() data:

    class(sample.ids)
    [1] "character"
    class(covariate.file$ID_1)
    [1] "character"
    class(covariate.file$ID_2)
    [1] "character"
    class(covariate.file$Event1)
    [1] "numeric"
    class(covariate.file$timetoCRPC)
    [1] "numeric"
    class(covariate.file$age)
    [1] "numeric"

By the way, there are 1037 objects in my study, and 422 of them were picked up in sample.ids. The error showed:

    Analyzing part 1/1613... Error in genotypes[, cox.params$ids] : subscript out of bounds

Why are there 1613 parts in my study? Shouldn't it be 1/422?

Many thanks.

aarizvi commented 4 years ago

Because these data are large and R may be overloaded by very large matrices, we split the genotype matrix into chunks. That's why you have 1613 parts. It doesn't have anything to do with your sample IDs.
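
As a rough illustration of why the part count tracks the number of SNPs rather than the number of samples, here is a sketch that assumes the parts are simply consecutive blocks of `chunk.size` SNPs. The SNP count is hypothetical, chosen only so that the arithmetic lands on 1613.

```r
# Hypothetical numbers: total SNPs in the file and SNPs per chunk
n_snps     <- 161250
chunk_size <- 100

# Number of chunks needed to cover every SNP; the last chunk may be partial.
# The number of samples (e.g. 422) does not enter the calculation.
n_parts <- ceiling(n_snps / chunk_size)
n_parts
```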

The next step I would suggest is to load the genotype data into R (perhaps just the first 5 rows) and confirm that the column headers of your genotype matrix are the same as your sample IDs.

Something like ...

table(colnames(genotype_matrix) %in% covariate.file$ID_1)
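
A slightly fuller sketch of that check, using made-up toy objects in place of the real data (`genotype_matrix`, the IDs, and the SNP names are all hypothetical). It also shows that indexing a matrix by a column name that does not exist is enough to produce the "subscript out of bounds" error.

```r
# Hypothetical toy data standing in for the real objects
genotype_matrix <- matrix(
  0, nrow = 2, ncol = 3,
  dimnames = list(c("snp1", "snp2"), c("samp1", "samp2", "samp3"))
)
covariate.file <- data.frame(ID_1 = c("samp1", "samp2", "samp9"),
                             stringsAsFactors = FALSE)

# How many genotype columns have a matching covariate ID?
table(colnames(genotype_matrix) %in% covariate.file$ID_1)

# Which covariate IDs are absent from the genotype columns?
missing_ids <- setdiff(covariate.file$ID_1, colnames(genotype_matrix))
missing_ids

# Subsetting by an ID that is not a column name fails
err <- tryCatch(
  genotype_matrix[, covariate.file$ID_1],
  error = function(e) conditionMessage(e)
)
err
```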

syy525 commented 4 years ago

My genotype file is from IMPUTE2 and has no header, so the code "table(colnames(genotype_matrix) %in% covariate.file$ID_1)" returned FALSE 3116.

When my genotype file is imported into R using read.table, it looks like:

    V1  V2           V3    V4 V5 V6    V7    V8    V9    V10   V11   V12   V13 ...
    --- 1:10177:A:AC 10177 A  AC 0.379 0.482 0.138 0.208 0.651 0.141 0.672 0.303 ...

The sample.file looks like:

      ID_1    ID_2 missing sex
    1    0       0       0   D
    2    2 38509_2       0   1
    3    3 38201_2       0   1
    4    4 38728_2       0   1
    5    5 38371_2       0   1
    6    6 38397_2       0   1

The covariate.file looks like:

      ID_1    ID_2 Event1 timetoCRPC age Mstage GSabove7
    1    2 38509_2      0        125  67      0        0
    2    3 38201_2      0        125  67      0        0
    3    4 38728_2      0        125  67      0        0
    4    5 38371_2      0        125  67      0        0
    5    6 38397_2      0        125  67      0        0
    6    7 38414_2      0        125  67      0        0

Is there anything wrong with these?

aarizvi commented 4 years ago

Everything arguably looks good besides the genotype file.

You need to add a header to the genotype file. The first 7 columns should be something like

    first_seven <- c("snpID", "TYPED", "RSID", "POS", "A0", "A1", "CHR")

From column 8 on, the column names need to be the same as your sample IDs. So in your case it would be

    colnames(genotypes) <- c(first_seven, 1:(ncol(genotypes) - 7))

However, I would suggest changing your sample names to something like paste0("samp", 1:nrow(sample.file)). But keep in mind, you'd have to follow the same schema for sample.file, covariate.file, and the genotype file.

The reason for changing the names is so that you have characters as column names for the genotype file instead of numerics. That makes it less prone to error, but it should work either way.

syy525 commented 4 years ago

Interestingly, my team found the problem. The error was not due to my genotype file, which is in the same format as the gwasurvivr example n100_p1000_chr18.impute. It really does not need a header:

    snp_0 18-221 221 C T 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 ... (n100_p1000_chr18.impute)

The problem was still in the sample file. My sample.file looked like:

    ID_1    ID_2 missing sex
       0       0       0   D
       2 38509_2       0   1
       3 38201_2       0   1
       4 38728_2       0   1
       5 38371_2       0   1
       6 38397_2       0   1

ID_1 is different from ID_2. When we made them the same, it ran perfectly:

    ID_1 ID_2 missing sex
       0    0       0   D
       2    2       0   1
       3    3       0   1
       4    4       0   1
       5    5       0   1
       6    6       0   1

Going back to the script, we guess it is due to this command, which uses ID_2:

    # and sample ID as columns to genotype file
    dimnames(genotypes) <- list(paste(snp$TYPED, snp$RSID, sep=";"),
                                scanAnn$ID_2)
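
A quick way to spot this kind of mismatch before running the analysis might look like the sketch below. It assumes a sample file in the layout shown above, where the first data row is the column-type row; the inline text and the `covariate_ids` vector are hypothetical stand-ins for the real files.

```r
# Hypothetical sample file contents; the second line is the
# column-type row ("0 0 0 D"), not a sample
sample_lines <- c(
  "ID_1 ID_2 missing sex",
  "0 0 0 D",
  "2 38509_2 0 1",
  "3 38201_2 0 1",
  "4 38728_2 0 1"
)
sample.file <- read.table(text = sample_lines, header = TRUE,
                          colClasses = "character")
samples <- sample.file[-1, ]  # drop the column-type row

# Hypothetical covariate IDs, as used with id.column = "ID_1"
covariate_ids <- c("2", "3", "4")

# Do ID_1 and ID_2 agree for every sample?
ids_agree <- all(samples$ID_1 == samples$ID_2)

# Which ID column do the covariate IDs actually match?
match_id1 <- all(covariate_ids %in% samples$ID_1)
match_id2 <- all(covariate_ids %in% samples$ID_2)

c(ids_agree = ids_agree, match_id1 = match_id1, match_id2 = match_id2)
```

If the genotype columns are named from ID_2, as the command above suggests, covariate IDs that match only ID_1 will not be found among them.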

Thank you for all of your replies. This is great software for running GWAS survival analysis!!!!

Qiaolan commented 4 years ago

The same issue happened to me. FID did not work, but IID worked well.
