nf-core / sarek

Analysis pipeline to detect germline or somatic variants (pre-processing, variant calling and annotation) from WGS / targeted sequencing
https://nf-co.re/sarek
MIT License

ASCAT executing R files #125

Closed ggabernet closed 4 years ago

ggabernet commented 4 years ago

Hi, when running Sarek with multiple variant callers, it seems that the first one is picked and the rest are ignored. I ran it with Strelka and ASCAT, and ASCAT was simply ignored.


nextflow run ggabernet/nf-core-sarek -r v2.5.2-branch \
--outdir 's3://qbic-bucket-virginia/resultsdirsarekicgc1' \
-w 's3://qbic-bucket-virginia/workdirsarekicgc1' \
--tracedir 's3://qbic-bucket-virginia/tracesarekicgc1' \
--input 's3://qbic-bucket-virginia/icgc-sarek/input-icgc-1.tsv' \
--genome 'GRCh38' \
--tools 'Strelka,ASCAT,snpEff' \
-c awsbatch.config \
--awsregion 'us-east-1' \
--igenomes_base 's3://qbic-bucket-virginia/references' \
--awscli '/home/ec2-user/miniconda/bin/aws' -resume

maxulysse commented 4 years ago

I'll look at it right away.

ggabernet commented 4 years ago

Hi Maxime, sorry my bad. The convertAlleleCounts process failed and that is why ASCAT was not triggered

ggabernet commented 4 years ago

so that's not the issue

maxulysse commented 4 years ago

OK, good to know, any idea what the problem was?

ggabernet commented 4 years ago

Fatal error: cannot open file '/home/ec2-user/.nextflow/assets/ggabernet/nf-core-sarek/bin/convertAlleleCounts.r': No such file or directory

ggabernet commented 4 years ago

I had to make a fork to fix the SamToFastq issue, but all the rest of the code is the same as the 2.5.2 release

maxulysse commented 4 years ago

Did it work before? Or do you think it was already an issue? Maybe we shouldn't call the R script like this:

Rscript ${workflow.projectDir}/bin/convertAlleleCounts.r ...

ggabernet commented 4 years ago

yes that could be the issue, shouldn't it work directly with convertAlleleCounts.r as the bin is added to the path?
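
To illustrate the point about the bin directory being on the PATH, here is a minimal shell sketch. It is not taken from Sarek: `demo_bin`, `hello_tool.sh` and `run_with_bin` are hypothetical names, and a temporary directory stands in for the pipeline's bin/ folder. It only shows that an executable script with a valid shebang can be called by name once its directory is on the PATH.

```shell
# Hypothetical stand-in for the pipeline's bin/ directory.
demo_bin=$(mktemp -d)

# A tiny executable script; its shebang points at an interpreter that exists.
cat > "$demo_bin/hello_tool.sh" <<'EOF'
#!/bin/sh
echo "called as: $(basename "$0")"
EOF
chmod +x "$demo_bin/hello_tool.sh"

# Run a command in a subshell with demo_bin prepended to PATH.
run_with_bin() {
    ( PATH="$demo_bin:$PATH"; export PATH; "$@" )
}

run_with_bin hello_tool.sh
```

Called this way, the script needs no absolute path such as ${workflow.projectDir}/bin/..., which is the path that did not exist in the error above.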

ggabernet commented 4 years ago

I can test it out and let you know

maxulysse commented 4 years ago

I'll try it out on our cluster as well.

maxulysse commented 4 years ago

By the way, if you're using ASCAT, the current dev has some good improvements. You can now specify purity and ploidy.

ggabernet commented 4 years ago

I've tried both Rscript convertAlleleCounts.r and calling convertAlleleCounts.r directly (as it has the Rscript shebang line). Neither works:

.command.sh: //nextflow-bin/convertAlleleCounts.r: /bin/env: bad interpreter: No such file or directory

It's a bit weird, as the last option worked for me in Bcellmagic.

ggabernet commented 4 years ago

Ah, I just saw the shebang line was missing /usr/. I'll try with this now.
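
A "bad interpreter" error like the one above means the path on the script's first line does not exist where the task runs. As a rough sketch (the helper name `check_shebang` is made up for illustration and is not part of Sarek or Nextflow), a script's shebang can be checked like this:

```shell
# Report whether the interpreter named on a script's shebang line exists
# and is executable on the current machine.
check_shebang() {
    first_line=$(head -n 1 "$1")
    case "$first_line" in
        '#!'*) ;;
        *) echo "no shebang line"; return 1 ;;
    esac
    # Drop the leading "#!" and keep the first word: the interpreter path.
    interpreter=$(printf '%s\n' "${first_line#??}" | awk '{print $1}')
    if [ -x "$interpreter" ]; then
        echo "ok: $interpreter"
    else
        echo "bad interpreter: $interpreter"
        return 1
    fi
}
```

Running this inside the same container or instance as the failing task shows whether /bin/env or /usr/bin/env is the one that exists there. The result depends on the image, so it has to be checked where the job actually runs.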

maxulysse commented 4 years ago

It seems there's a typo in the shebang for run_ascat.R as well.

ggabernet commented 4 years ago

yes, I fixed both now, let's see

maxulysse commented 4 years ago

You're trying on AWS?

ggabernet commented 4 years ago

yes, I have to set it up there so we can run ASCAT on ICGC data

ggabernet commented 4 years ago

looks good now, the job was immediately killed before. But to make sure I'll post it when it runs through

maxulysse commented 4 years ago

Good, I made the same changes, and I'm trying it out on our server. You can make a PR, and if it works for everyone we can merge

ggabernet commented 4 years ago

perfect, will do!

ggabernet commented 4 years ago

This is solved now, but I am having another issue with ASCAT. This time I had already switched to the dev branch as suggested:

[1] Reading Tumor LogR data...
[1] Reading Tumor BAF data...
[1] Reading Germline LogR data...
[1] Reading Germline BAF data...
[1] Registering SNP locations...
[1] Splitting genome in distinct chunks...
Error in names(x) <- value : 
  'names' attribute [2] must be the same length as the vector [1]
Calls: ascat.GCcorrect -> colnames<-
In addition: Warning message:
In read.table(file = GCcontentfile, header = TRUE, as.is = TRUE) :
  incomplete final line found by readTableHeader on 'input.5'
Execution halted

I love that R does not print the line number in errors...

ggabernet commented 4 years ago

skipping ascat.GCcorrect works, there must be a problem in the GCcontentfile, but it's super hard to debug on AWS, will try on the cluster tomorrow
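
When a table file is suspected, two quick checks can be done from the shell before going back to R: whether the file ends with a newline (the "incomplete final line" warning) and how many tab-separated columns its header has. This is a generic debugging sketch, not something run in this thread; the helper names `ends_with_newline` and `header_columns` are hypothetical.

```shell
# Succeed only if the file is non-empty and its last byte is a newline.
# Command substitution strips a trailing newline, so an empty result from
# tail -c 1 means the final byte was "\n".
ends_with_newline() {
    [ -s "$1" ] && [ -z "$(tail -c 1 "$1")" ]
}

# Print the number of tab-separated fields on the first line of the file.
header_columns() {
    head -n 1 "$1" | awk -F '\t' '{print NF}'
}
```

A file that fails the first check, or whose header has a single column, is probably not the GC content table that ascat.GCcorrect expects, for example because a placeholder was staged instead of the real file.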

maxulysse commented 4 years ago

No problem executing R as we planned. But I had an issue with the GC file that wasn't recognized. I'll try to fix that.

maxulysse commented 4 years ago

OK, I found why you're currently having this bug with #127. I made a mistake with #107 and forgot to fully snake-case the params for the ASCAT GC file in conf/igenomes.config. Since you already have a PR open, I'll let you correct ac_lociGC to ac_loci_gc.
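
For anyone hitting the same mismatch, the rename can be checked with grep and sed. The sketch below works on a made-up two-line excerpt, since the real contents of conf/igenomes.config are not shown in this thread; `demo_config`, `rename_gc_key` and the paths are placeholders.

```shell
# Hypothetical excerpt; the real conf/igenomes.config is not shown in this thread.
demo_config=$(mktemp)
cat > "$demo_config" <<'EOF'
ac_loci   = "/path/to/loci"
ac_lociGC = "/path/to/loci.gc"
EOF

# List lines that still use the camelCase key.
grep -n 'ac_lociGC' "$demo_config"

# Print the file with the key renamed to snake case (the file itself is not modified).
rename_gc_key() {
    sed 's/ac_lociGC/ac_loci_gc/g' "$1"
}

rename_gc_key "$demo_config"
```

Running the same grep over the whole repository would also show any place that still reads the old key.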

ggabernet commented 4 years ago

great, let's hope this solves the issues