Open fgvieira opened 3 years ago
I have not implemented that, no. It does seem like a fairly manageable task I presume, probably cleaner than csv. Any favourite R VCF writer I should be using?
Not really... once used vcfR
and it was fine but no real strong opinion.
Any idea of when it would be available? thanks,
It would be great to have it implemented inside ExomeDepth's R code, but here's what we're using meanwhile:
REFERENCE=/path/to/reference.fa
for FILE in $FILENAME*cnv*tab; do
SAMPLE=${FILE/.cnv.*}
if [[ $FILE == *".cnv.X."* ]] && [ -f ${FILE/.cnv.X./.cnv.} ]; then HEADER=; else HEADER=1; fi
if [ $HEADER ]; then echo '##fileformat=VCFv4.2
##source=ExomeDepth
##reference='$REFERENCE'
##ALT=<ID=DEL,Description="Deletion">
##ALT=<ID=DUP,Description="Duplication">
##FILTER=<ID=PASS,Description="All filters passed">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=BF,Number=1,Type=Float,Description="Bayes Factor">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Difference in length between REF and ALT alleles">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=EXONS,Number=.,Type=Integer,Description="Number of exons affected">
##INFO=<ID=READSEXP,Number=.,Type=Integer,Description="Number of reads expected">
##INFO=<ID=READSOBS,Number=.,Type=Integer,Description="Number of reads observed">
##INFO=<ID=READSRATIO,Number=.,Type=Integer,Description="Ratio expected vs. observed">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT '$SAMPLE > $SAMPLE.ExomeDepth.vcf
fi
tail -n +2 $FILE | sed 's/"//g' | perl -lane '
if ($F[2] =~ /dup/) { $F[2] = "DUP" } elsif ($F[2] =~ /del/) { $F[2] = "DEL" } else { $F[2] = "UNK" };
if ($F[11] > 0.5 && $F[11] < 1.5) { $gt = "0/1" } else { $gt = "1/1" }
$info = "END=$F[5];SVLEN=".($F[5]-$F[4]+1).";EXONS=$F[3];READSEXP=$F[9];READSOBS=$F[10];READSRATIO=$F[11];BF=$F[8];SVTYPE=$F[2]";
$F[8] = 0 if $F[8] < 0; print join "\t", $F[6], $F[4], ".", "N", "<$F[2]>", $F[8], "PASS", $info, "GT", $gt
' >> $SAMPLE.ExomeDepth.vcf
done
A vcf output to facilitate annotation by something like AnnotSV would be spectacular.
@jamigo, how does one use this?
edit: still confused. have tried calling your bash script after defining $FILENAME as the .csv file, but that doesn't seem appropriate and doesn't work.
We internally name all ExomeDepth's output file as sample.cnv.tab and sample.cnv.X.tab (this is not mandatory, although if your output files are .csv you should modify the script accordingly), and this script merges both .tab pairs into one single .vcf file
@jamigo thank you so much. I am trying to use your script after reformatting the csvs as .tab, however I am getting this error:
syntax error at -e line 4, near "1)" Execution of -e aborted due to compilation errors. syntax error at -e line 4, near "1)" Execution of -e aborted due to compilation errors.
Your help would be greatly appreciated thank you so much. I am just trying to do it exactly as you did -- my capabilities with bash are pretty limited.
EDIT: fixed it. I think the script you posted here may be missing a parenthesis @ SVLEN=".$F[5]-$F[4]+1)."
(it worked as SVLEN="(.$F[5]-$F[4]+1)."
)
That line where the INFO column is defined should in fact be the following:
$info = "END=$F[5];SVLEN=".($F[5]-$F[4]+1).";EXONS=$F[3];READSEXP=$F[9];READSOBS=$F[10];READSRATIO=$F[11];BF=$F[8];SVTYPE=$F[2]";
I don't know why that left parenthesis was lost, but it should be right after the dot rather than before.
Hi,
I am trying to convert the ExomeDepth result (result@CNV.calls
) to VCF inside my R script.
Could you tell me if the variant coordinates are 0 or 1-based?
If I understand the above bash script, it assumes that variant start (s) and end (e) positions are 0-based and inclusive, (like on the picture). Is it right?
- - - + + + + + + - - -
- - - s . . . . e - - -
Best, Kasia
Dear all,
is it possible to convert
ExomeDepth
output to a VCF? I looked in the manual, but there is not reference to a VCF...thanks,