statgen / EPACTS

GNU General Public License v3.0

EPACTS single --interval-list #22

Open ana-stankovic opened 4 years ago

ana-stankovic commented 4 years ago

Hello, I am having trouble setting the --interval-list parameter for the epacts single command. In the documentation it is defined as "List of intervals as a unit to perform association in standard BED format (0-based-inclusive-start, 0-based-exclusive-end)", but every string format I have tried results in the same error: Can't locate object method "new" via package "FileHandle" (perhaps you forgot to load "FileHandle"?) at /usr/local/bin/epacts.pm line 653, <PED> line 659. Passing a BED file gives the same error. Without this parameter, epacts single finishes without any problems. I am using the EPACTS test files, and the EPACTS version is v3.4.2.

Thank you for the help!
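For reference, the half-open BED convention quoted from the documentation can be sketched in Python. This only illustrates the coordinate arithmetic and is not code from EPACTS; the function name `vcf_pos_in_bed_interval` is hypothetical.

```python
def vcf_pos_in_bed_interval(pos: int, start: int, end: int) -> bool:
    """Return True if a 1-based VCF position lies inside a BED interval.

    BED uses a 0-based inclusive start and a 0-based exclusive end, so the
    1-based position `pos` corresponds to the 0-based coordinate `pos - 1`.
    """
    return start <= pos - 1 < end


# The first base of a contig (VCF POS 1) is covered by an interval starting at 0.
print(vcf_pos_in_bed_interval(1, 0, 64444167))
# The last base of a 64444167 bp contig is covered because the end is exclusive.
print(vcf_pos_in_bed_interval(64444167, 0, 64444167))
# One position past the end is not covered.
print(vcf_pos_in_bed_interval(64444168, 0, 64444167))
```

Under this convention, an interval of 0 to N covers every 1-based position from 1 to N, which is why a whole-chromosome record starts at 0 and ends at the chromosome length.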

ana-stankovic commented 4 years ago

After including the FileHandle package in the epacts.pm script, a new error is shown: ERROR: Interval size mismatch. I have tried different BED files, for example covering only chromosome 20 or chromosomes 1..22, X, Y (with and without MT), and different region sizes: whole chromosomes, or a region that encompasses the variants in the test VCF. I have also tried running with the --chr parameter set to 20, and without it.

I have found that the intervalsByBED function always returns -1, so the same error is always shown.

This is the command line:

epacts single --vcf 1000G_exome_chr20_example_softFiltered_grch38.calls.vcf.gz --ped 1000G_dummy_pheno.ped --out intevals_test --test b.wald --pheno DISEASE  --cov AGE --cov SEX  --min-maf 0.001 --run 1 --ref GRCh38ERCC.ensembl95.fasta.gz --interval-list chr20.bed

Can you tell me which regions the BED file should cover and exactly what format it should be in?

Thank you

jonathonl commented 4 years ago

Can you provide the first few records in your BED file?

ana-stankovic commented 4 years ago

Of course. I have tried with just chromosome 20 (20 0 64444167), and with all the chromosomes as well:

1   0   248956422
2   0   242193529
3   0   198295559
4   0   190214555

As I wrote, I have tried several options just to get it to run. Here the BED file covers entire chromosomes; the sizes match the reference FASTA I am using.
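A basic sanity check of BED records against reference contig lengths could look like the Python sketch below. It is an independent illustration, not part of EPACTS, and `check_bed_records` is a hypothetical helper name. Passing this check does not mean EPACTS will accept the file; it only rules out malformed records as the cause.

```python
from typing import Dict, Iterable, List


def check_bed_records(lines: Iterable[str], contig_lengths: Dict[str, int]) -> List[str]:
    """Return a list of human-readable problems found in BED lines."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        # Skip blank lines and the header-style lines that BED permits.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if len(fields) < 3:
            problems.append(f"line {number}: expected at least 3 fields, got {len(fields)}")
            continue
        chrom, start_text, end_text = fields[:3]
        if not (start_text.isdecimal() and end_text.isdecimal()):
            problems.append(f"line {number}: start and end must be non-negative integers")
            continue
        start, end = int(start_text), int(end_text)
        if start >= end:
            problems.append(f"line {number}: start {start} is not below end {end}")
        if chrom not in contig_lengths:
            problems.append(f"line {number}: contig {chrom!r} is not in the reference")
        elif end > contig_lengths[chrom]:
            problems.append(f"line {number}: end {end} exceeds contig length {contig_lengths[chrom]}")
    return problems


# Lengths copied from the records quoted in this thread; a real check would
# read them from the FASTA index (.fai) of the reference actually in use.
lengths = {"1": 248956422, "2": 242193529, "3": 198295559, "4": 190214555}
bed = ["1\t0\t248956422", "2\t0\t242193529", "3\t0\t198295559", "4\t0\t190214555"]
print(check_bed_records(bed, lengths))
```

The records quoted above produce no problems under this check, which suggests the file itself is well formed and the mismatch arises elsewhere.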

vladimirkovacevic commented 4 years ago

Hi @jonathonl! @ana-stankovic and I are working on wrapping EPACTS tools in CWL to make them available in Data Stage and other Seven Bridges platforms. This issue is one of the last ones keeping us from finalizing the publishing process. Here is the Docker image that Ana created; it might be helpful to you while debugging.

jonathonl commented 4 years ago

I suspect you need to use the chr prefix in the BED file (eg, chr1 <beg> <end>) since you are using build 38. If that doesn't fix it, I can look into it further. Also, it looks like you are using the entire chromosome length as an interval. I don't see the benefit in doing this. Is there a particular use case for which you are incorporating this option?
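One way to test the prefix hypothesis is to compare the contig names used in the BED file with those in the VCF. The Python sketch below does this on in-memory lines; the helper names are hypothetical and the sample lines are made up for illustration, since the thread does not show which naming the test VCF uses.

```python
from typing import Iterable, Set


def vcf_contigs(lines: Iterable[str]) -> Set[str]:
    """Collect contig names from the CHROM column of VCF data lines."""
    return {
        line.split("\t", 1)[0]
        for line in lines
        if line.strip() and not line.startswith("#")
    }


def bed_contigs(lines: Iterable[str]) -> Set[str]:
    """Collect contig names from the first column of BED records."""
    names = set()
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith(("#", "track", "browser")):
            names.add(stripped.split()[0])
    return names


def unmatched_bed_contigs(bed_lines: Iterable[str], vcf_lines: Iterable[str]) -> Set[str]:
    """Return BED contig names that never occur in the VCF."""
    return bed_contigs(bed_lines) - vcf_contigs(vcf_lines)


# Made-up example: the VCF uses "chr20" while the BED file uses "20".
vcf = ["##fileformat=VCFv4.2", "#CHROM\tPOS\tID\tREF\tALT", "chr20\t68303\t.\tT\tC"]
bed = ["20\t0\t64444167"]
print(unmatched_bed_contigs(bed, vcf))
```

For a real bgzipped VCF, the lines could be read with `gzip.open(path, "rt")`, since bgzip output is valid multi-member gzip. An empty result means every BED contig name also appears in the VCF, so naming would not explain the error.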

ana-stankovic commented 4 years ago

Yes, I have tried with the chr prefix as well, and the same message appears. I do not have a specific use case; I am testing the usage of this parameter. I have tried several BED files, with smaller or larger regions, making sure that the interval captures the variants in the test VCF. However, it always returns ERROR: Interval size mismatch. A BED file with whole chromosomes that match the FASTA reference was one of the tests I ran, on the premise that such an interval should not produce this message.

jonathonl commented 4 years ago

Ok, I'll look into this further. It will likely take a few weeks for me to get to it though. Are you able to exclude options like this from your CWL workflow? I'm not sure how often it would actually be used.

vladimirkovacevic commented 4 years ago

@jonathonl, yes, we'll exclude it for now. Thank you for looking into this.