Open xiucz opened 6 years ago
Hi @xiucz, thanks for this report!
According to the documentation, CNVkit outputs its copy number segments in a 1-based format: https://cnvkit.readthedocs.io/en/stable/fileformats.html
On the page you linked, we run the following command to convert the 1-based coordinates from CNVkit to 0-based so that they match the BED file:
tail -n +2 WGS_Tumor_merged_sorted_mrkdup_bqsr.cns | awk '{print $1"\t"$2-1"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8}' > WGS_Tumor_merged_sorted_mrkdup_bqsr.2.cns
So at this point WGS_Tumor_merged_sorted_mrkdup_bqsr.cns remains 1-based, but WGS_Tumor_merged_sorted_mrkdup_bqsr.2.cns is now 0-based.
I often refer to this Biostars post when doing these coordinate conversions: https://www.biostars.org/p/84686/
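For anyone following along, here is a minimal sketch of that conversion on a single toy record. The helper name `to_zero_based` and the record itself are made up for illustration and are not part of the course material. It assumes a tab-separated file with the start coordinate in column 2, and that the input really is 1-based and closed.

```shell
# Hypothetical helper: subtract 1 from the start column (column 2) of a
# tab-separated interval file, turning a 1-based closed interval into a
# BED-style 0-based half-open interval. The end column is left unchanged.
to_zero_based() {
  awk 'BEGIN { FS = OFS = "\t" } { $2 = $2 - 1; print }'
}

# Toy record (not real CNVkit output): bases 101..200 in 1-based closed form.
printf 'chr1\t101\t200\t-0.25\n' | to_zero_based
```

In the 0-based half-open form the same 100 bases are written as start 100, end 200, which is why only the start column changes.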
We then run bedtools intersect on the 0-based BED file and the 0-based segment file:
bedtools intersect -wa -wb -b /workspace/inputs/references/transcriptome/gene_annotation.bed -a WGS_Tumor_merged_sorted_mrkdup_bqsr.2.cns > WGS_Tumor_merged_sorted_mrkdup_bqsr.2.annotated.cns
So at this point bedtools intersect is working on two 0-based files, and I think everything should be fine.
Let me know if you disagree or if I've misunderstood the issue you've presented.
Hi,
> we then run bedtools intersect on the 0-based bed file and the 0-based segment file.
I agree with you on this step, and the result file 2.annotated.cns is still 0-based. So if we want to use the result file for further analysis, would it be better to convert it back to 1-based?
I also have one more suggestion: rename ".2.annotated.cns" to ".annotated.bed". This would make the coordinate system of the file clearer to newcomers.
Thank you.
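As a rough sketch of the reverse conversion asked about above, the start column can be shifted back by adding 1. The helper name `to_one_based` and the toy record are hypothetical. It assumes the start coordinate is in column 2 of a tab-separated file and that the input is 0-based half-open.

```shell
# Hypothetical helper: add 1 to the start column (column 2) of a
# tab-separated interval file, turning a BED-style 0-based half-open
# interval back into a 1-based closed interval. The end column is unchanged.
to_one_based() {
  awk 'BEGIN { FS = OFS = "\t" } { $2 = $2 + 1; print }'
}

# Toy record (not real pipeline output): 0-based half-open 100..200.
printf 'chr1\t100\t200\t-0.25\n' | to_one_based
```

Note that with `bedtools intersect -wa -wb` the columns of the `-b` BED file are appended to each output line, so a sketch like this would only shift the segment start and would leave the appended BED coordinates 0-based.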
Hi, in this part, it says:
I know that the BED file is 0-based, but the cns file is now also 0-based (start minus 1). It seems that we should add 1 to the start of every record in the result cns file, because the CNS format is 1-based.
Thanks for your reply.