vplagnol / ExomeDepth

ExomeDepth R package for the detection of copy number variants in exomes and gene panels using high throughput DNA sequencing data.
63 stars 26 forks source link

VCF output #36

Open fgvieira opened 3 years ago

fgvieira commented 3 years ago

Dear all,

is it possible to convert ExomeDepth output to a VCF? I looked in the manual, but there is not reference to a VCF...

thanks,

vplagnol commented 3 years ago

I have not implemented that, no. It does seem like a fairly manageable task I presume, probably cleaner than csv. Any favourite R VCF writer I should be using?

fgvieira commented 3 years ago

Not really... once used vcfR and it was fine but no real strong opinion.

fgvieira commented 3 years ago

Any idea of when it would be available? thanks,

jamigo commented 3 years ago

It would be great to have it implemented inside ExomeDepth's R code, but here's what we're using meanwhile:

REFERENCE=/path/to/reference.fa
for FILE in $FILENAME*cnv*tab; do
 SAMPLE=${FILE/.cnv.*}
 if [[ $FILE == *".cnv.X."* ]] && [ -f ${FILE/.cnv.X./.cnv.} ]; then HEADER=; else HEADER=1; fi
 if [ $HEADER ]; then echo '##fileformat=VCFv4.2
##source=ExomeDepth
##reference='$REFERENCE'
##ALT=<ID=DEL,Description="Deletion">
##ALT=<ID=DUP,Description="Duplication">
##FILTER=<ID=PASS,Description="All filters passed">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=BF,Number=1,Type=Float,Description="Bayes Factor">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Difference in length between REF and ALT alleles">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=EXONS,Number=.,Type=Integer,Description="Number of exons affected">
##INFO=<ID=READSEXP,Number=.,Type=Integer,Description="Number of reads expected">
##INFO=<ID=READSOBS,Number=.,Type=Integer,Description="Number of reads observed">
##INFO=<ID=READSRATIO,Number=.,Type=Integer,Description="Ratio expected vs. observed">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  '$SAMPLE > $SAMPLE.ExomeDepth.vcf
 fi
 tail -n +2 $FILE | sed 's/"//g' | perl -lane '
if ($F[2] =~ /dup/) { $F[2] = "DUP" } elsif ($F[2] =~ /del/) { $F[2] = "DEL" } else { $F[2] = "UNK" };
if ($F[11] > 0.5 && $F[11] < 1.5) { $gt = "0/1" } else { $gt = "1/1" }
$info = "END=$F[5];SVLEN=".($F[5]-$F[4]+1).";EXONS=$F[3];READSEXP=$F[9];READSOBS=$F[10];READSRATIO=$F[11];BF=$F[8];SVTYPE=$F[2]";
$F[8] = 0 if $F[8] < 0; print join "\t", $F[6], $F[4], ".", "N", "<$F[2]>", $F[8], "PASS", $info, "GT", $gt
' >> $SAMPLE.ExomeDepth.vcf
done
halessi commented 3 years ago

A vcf output to facilitate annotation by something like AnnotSV would be spectacular.

@jamigo, how does one use this?

edit: still confused. have tried calling your bash script after defining $FILENAME as the .csv file, but that doesn't seem appropriate and doesn't work.

jamigo commented 3 years ago

We internally name all ExomeDepth's output file as sample.cnv.tab and sample.cnv.X.tab (this is not mandatory, although if your output files are .csv you should modify the script accordingly), and this script merges both .tab pairs into one single .vcf file

halessi commented 3 years ago

@jamigo thank you so much. I am trying to use your script after reformatting the csvs as .tab, however I am getting this error:

syntax error at -e line 4, near "1)" Execution of -e aborted due to compilation errors. syntax error at -e line 4, near "1)" Execution of -e aborted due to compilation errors.

Your help would be greatly appreciated thank you so much. I am just trying to do it exactly as you did -- my capabilities with bash are pretty limited.

EDIT: fixed it. I think the script you posted here may be missing a parenthesis @ SVLEN=".$F[5]-$F[4]+1)."

(it worked as SVLEN="(.$F[5]-$F[4]+1).")

jamigo commented 3 years ago

That line where the INFO column is defined should in fact be the following: $info = "END=$F[5];SVLEN=".($F[5]-$F[4]+1).";EXONS=$F[3];READSEXP=$F[9];READSOBS=$F[10];READSRATIO=$F[11];BF=$F[8];SVTYPE=$F[2]"; I don't know why that left parenthesis was lost, but it should be right after the dot rather than before.

kasia-te commented 2 years ago

Hi,

I am trying to convert the ExomeDepth result (result@CNV.calls) to VCF inside my R script. Could you tell me if the variant coordinates are 0 or 1-based? If I understand the above bash script, it assumes that variant start (s) and end (e) positions are 0-based and inclusive, (like on the picture). Is it right? - - - + + + + + + - - - - - - s . . . . e - - -

Best, Kasia