ryanlayer / samplot

Plot structural variant signals from many BAMs and CRAMs
MIT License
533 stars 67 forks

Issues with using the samplot vcf module #187

Open karen916 opened 1 year ago

karen916 commented 1 year ago

Dear Samplot Development Team,

I would like to express my gratitude for developing such a valuable tool as Samplot. Your efforts have contributed significantly to the research community, and I truly appreciate the hard work you've put into this software.

I've encountered some issues while using the samplot vcf module and hope you can provide some guidance. When I ran samplot vcf, there were no errors, but no images were generated. Additionally, the HTML file displays 'No data available in table' when opened.

My SV file was generated from second-generation 10x resequencing data, processed through Lumpy, Manta, and Delly, and then merged using SURVIVOR. Initially, I executed the following command:

`(py37) [chenzhaojin@lfpara survivor]$ samplot vcf \

--filter "SVTYPE == 'DEL' & SU >= 5" \
--filter "SVTYPE == 'INV' & SU >= 3" \
--vcf filtered_1-30_fixed.vcf \
-d test_1/ \
-O png \
--important_regions bed2.bed \
-b /home/chenzhaojin/expansion/sv/mid75/1-30/delly/alignment_1_sorted.bam \
--sample_ids sample1-30 > samplot_command.sh
(py37) [chenzhaojin@lfpara survivor]$ cd test_1/
(py37) [chenzhaojin@lfpara test_1]$ ll
total 1
-rw-r--r-- 1 chenzhaojin lfpara 24507 Oct 9 11:15 index.html`

Subsequently, I reviewed the filtering criteria and attempted to modify the conditions. I discovered that there is an issue with 'SU' in my VCF file, because the second command below produced no output.

`(py37) [chenzhaojin@lfpara survivor]$ grep -v "^#" filtered_1-30_fixed.vcf | grep "SVTYPE=DEL" | wc -l
747
(py37) [chenzhaojin@lfpara survivor]$ grep -v "^#" filtered_1-30_fixed.vcf | grep "SVTYPE=DEL" | awk -F';' '{for(i=1;i<=NF;i++) if ($i ~ /SU=/) print $i}' | sort | uniq -c
(py37) [chenzhaojin@lfpara survivor]$ grep -v "^#" filtered_1-30_fixed.vcf | grep "SVTYPE=DEL" | head -1
NC_010443.5 5537541 MantaDEL:640:0:1:0:0:0 CTTGGATTCCGTCGTGGCAGTGTAACAATCGATAGACATGAGGTTGCGGGTTCGATCTGCCTTGCTCATGGTTAACGATCCGCATGGCGTGAGCTGTGGTAGGTGCAGACGCGGCTCGATCCGAGTTGCTGTGCTCTGGCGTAGGCAGTGCTACAGCTCCGATTCGACCCTAGCCTGGAACTCATATGCCGGAGCGCCAAAAATAGCAAAAAAAAAAAAATAAAAAAAATAACCTTCATACAGAAACTACTAAATAAAAATAGTTAAGACTACAAGTTCAGGAGTTCCCGTCGTGGCGCAGTGGTTAACGAATCCGACTAGAACATGAGGTTGCGGTTCGTCCTGCCTTGCTCAGTGGTTAAGACCGGCGTTGCGTGAGCTGTGGTGTAGGTGCAGACGCGGCTCGGATTCCGCGTTGCTGTGGCTCTGGCGAGGCGTGATACAGCTCGATTCAACCCTAGCTGGAACTCCATATGCGCGGAGCGCCAAGAAATAGCAACAATAACAACAACAGAAAGACAAAAAAAAAAAAAAA CAGAAACTACTAAATAAAAATAGTT 551 PASS SUPP=3;SUPP_VEC=111;SVLEN=-449;SVTYPE=DEL;SVMETHOD=SURVIVOR1.0.7;CHR2=NC_010443.5;END=5538075;CIPOS=-1,279;CIEND=-535,10;STRANDS=+- GT:PSV:LN:DR:ST:QV:TY:ID:RAL:AAL:CO ./.:NA:279:0,7:+-:.,.,.,.:DEL,INV,INV,DEL:28:NA:NA:NC_010443.5_5537541-NC_010443.5_5537765,NC_010443.5_5537540-NC_010443.5_5537819,NC_010443.5_5537819-NC_010443.5_5537540,NC_010443.5_5537820-NC_010443.5_5538075 1/1:NA:534:0,19:+-:551,59:DEL,DEL:MantaDEL_640_0_1_0_0_0:CTTGGATTCCGTCGTGGCAGTGTAACAATCGATAGACATGAGGTTGCGGGTTCGATCTGCCTTGCTCATGGTTAACGATCCGCATGGCGTGAGCTGTGGTAGGTGCAGACGCGGCTCGATCCGAGTTGCTGTGCTCTGGCGTAGGCAGTGCTACAGCTCCGATTCGACCCTAGCCTGGAACTCATATGCCGGAGCGCCAAAAATAGCAAAAAAAAAAAAATAAAAAAAATAACCTTCATACAGAAACTACTAAATAAAAATAGTTAAGACTACAAGTTCAGGAGTTCCCGTCGTGGCGCAGTGGTTAACGAATCCGACTAGAACATGAGGTTGCGGTTCGTCCTGCCTTGCTCAGTGGTTAAGACCGGCGTTGCGTGAGCTGTGGTGTAGGTGCAGACGCGGCTCGGATTCCGCGTTGCTGTGGCTCTGGCGAGGCGTGATACAGCTCGATTCAACCCTAGCTGGAACTCCATATGCGCGGAGCGCCAAGAAATAGCAACAATAACAACAACAGAAAGACAAAAAAAAAAAAAAA:CAGAAACTACTAAATAAAAATAGTT:NC_010443.5_5537541-NC_010443.5_5538075,NC_010443.5_5537814-NC_010443.5_5538084 1/1:NA:534:6,5:+-:360,180:DEL,DEL:DEL00000089:NA:NA:NC_010443.5_5537542-NC_010443.5_5538076,NC_010443.5_5537814-NC_010443.5_5538085`

I then lowered the filtering threshold: instead of using the --filter parameter, I filtered with awk, keeping deletions with SUPP >= 2. After that, I ran samplot vcf again without any --filter argument, but it still did not generate images or an HTML file with data.

`(py37) [chenzhaojin@lfpara survivor]$ awk -F'\t' '($0 ~ /^#/ || ($8 ~ /SVTYPE=DEL/ && $8 ~ /SUPP=[2-9][0-9]*|SUPP=[1-9][0-9]+/)) {print}' filtered_1-30_fixed.vcf > supp2_1-30_DEL.vcf
samplot vcf \
--vcf supp2_1-30_DEL.vcf \
-d test_2/ \
-O png \
--important_regions bed2.bed \
-b /home/chenzhaojin/expansion/sv/mid75/1-30/delly/alignment_1_sorted.bam \
--sample_ids sample1-30 > samplot_command.sh
(py37) [chenzhaojin@lfpara survivor]$ cat bed2.bed
NC_010443.5 532 11200`

When I run the samplot plot command with my BAM files, it generates images as expected.

`time samplot plot \
-n 1-1 1-2 1-30 \
-b /home/chenzhaojin/expansion/sv/mid75/1-1/delly/alignment_1_sorted.bam \
/home/chenzhaojin/expansion/sv/mid75/1-2/delly/alignment_1_sorted.bam \
/home/chenzhaojin/expansion/sv/mid75/1-30/delly/alignment_1_sorted.bam \
-o 1_105_274330532.png \
-c NC_010443.5 \
-s 105 \
-e 274330532 \
-t DEL`
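The empty result of the `SU=` search can be cross-checked by tallying which INFO keys the records actually carry. The sketch below is only an illustration: the helper names are made up, and the sample record keeps the INFO string from the post while replacing REF and ALT with short placeholders. For that record it reports `SUPP` but no `SU`, which would explain why a filter on `SU` matches nothing.

```python
from collections import Counter


def info_keys(vcf_line):
    """Return the set of INFO keys in one tab-separated VCF record."""
    fields = vcf_line.rstrip("\n").split("\t")
    info = fields[7]  # column 8 of a VCF record is INFO
    return {entry.split("=", 1)[0] for entry in info.split(";") if entry}


def count_info_keys(lines):
    """Count how many records carry each INFO key, skipping header lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        counts.update(info_keys(line))
    return counts


# INFO copied from the posted record; REF ("N") and ALT ("<DEL>") are placeholders.
record = "\t".join([
    "NC_010443.5", "5537541", "MantaDEL:640:0:1:0:0:0", "N", "<DEL>", "551", "PASS",
    "SUPP=3;SUPP_VEC=111;SVLEN=-449;SVTYPE=DEL;SVMETHOD=SURVIVOR1.0.7;"
    "CHR2=NC_010443.5;END=5538075;CIPOS=-1,279;CIEND=-535,10;STRANDS=+-",
])

counts = count_info_keys([record])
print("SU" in counts, "SUPP" in counts)  # False True
```

To run it on the real file, pass an open file handle, for example `count_info_keys(open("filtered_1-30_fixed.vcf"))`.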

There were no error messages, so I'm at a loss on how to proceed. I would appreciate any assistance or guidance you can provide.
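One more thing worth checking is whether the variants fall inside the region listed in `bed2.bed`. The sketch below assumes that `--important_regions` limits output to variants overlapping the BED intervals, which should be confirmed against the samplot documentation. The function names are hypothetical, and the coordinates are taken from the posted record and BED line. It shows only that this one deletion lies far outside `532-11200`; it says nothing about the other 746 deletions.

```python
def info_end(info):
    """Return the integer END value from a VCF INFO string, or None if absent."""
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        if key == "END":
            return int(value)
    return None


def overlaps_bed(chrom, pos, end, bed_chrom, bed_start, bed_end):
    """True if a 1-based inclusive VCF interval overlaps a 0-based half-open BED interval."""
    return chrom == bed_chrom and (pos - 1) < bed_end and bed_start < end


# INFO string and BED line as posted in the issue.
info = ("SUPP=3;SUPP_VEC=111;SVLEN=-449;SVTYPE=DEL;SVMETHOD=SURVIVOR1.0.7;"
        "CHR2=NC_010443.5;END=5538075;CIPOS=-1,279;CIEND=-535,10;STRANDS=+-")
bed_chrom, bed_start, bed_end = "NC_010443.5\t532\t11200".split("\t")

hit = overlaps_bed("NC_010443.5", 5537541, info_end(info),
                   bed_chrom, int(bed_start), int(bed_end))
print(hit)  # False
```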

pontushojer commented 1 year ago

Which version of samplot are you using? There have been quite a few additions since the latest release. I would recommend installing the development version for now, for example with `pip install git+https://github.com/ryanlayer/samplot.git`.

It would be good if you could rerun your samplot vcf command with the added argument --debug and post the output here. This will hopefully provide some more information on why no SV is being plotted.

karen916 commented 1 year ago

Thank you for your assistance. I recently updated from version 1.3.0 to 1.3.1. After the update, I encountered the following error:

`(py37) [chenzhaojin@lfpara survivor]$ samplot vcf \
--filter "SVTYPE == 'DEL' & SU >= 5" \
--filter "SVTYPE == 'INV' & SU >= 3" \
--vcf filtered_1-30_fixed.vcf \
-d test_3/ \
-O png \
--important_regions bed2.bed \
--sample_ids 1-30 \
--debug \
-b /home/chenzhaojin/expansion/sv/mid75/1-30/delly/alignment_1_sorted.bam > samplot_commands.sh
samplot_vcf - ERROR: No RG field in alignment file /home/chenzhaojin/expansion/sv/mid75/1-30/delly/alignment_1_sorted.bam
samplot_vcf - ERROR: Include ordered list of sample IDs to avoid this error
'RG'`

I've checked the header of my BAM file and it indeed lacks an @RG line. Here is a snippet from the header:

`(py37) [chenzhaojin@lfpara survivor]$ samtools view -H /home/chenzhaojin/expansion/sv/mid75/1-30/delly/alignment_1_sorted.bam
@HD VN:1.3 SO:coordinate
@SQ SN:NC_010443.5 LN:274330532
@SQ SN:NC_010444.4 LN:151935994
@SQ SN:NC_010445.4 LN:132848913
@SQ SN:NC_010446.5 LN:130910915
@SQ SN:NC_010447.5 LN:104526007
@SQ SN:NC_010448.4 LN:170843587
@SQ SN:NC_010449.5 LN:121844099
@SQ SN:NC_010450.4 LN:138966237
@SQ SN:NC_010451.4 LN:139512083
@SQ SN:NC_010452.4 LN:69359453
@SQ SN:NC_010453.5 LN:79169978`

I look forward to your response. Thank you very much for your assistance.
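The header check can also be scripted, which helps when several BAM files need inspecting. The sketch below works on header text such as the output of `samtools view -H` and collects the `SM` tag of every `@RG` line. The function name is hypothetical, and the second header is an invented example of what a read group line could look like, not something from this thread.

```python
def read_group_samples(header_text):
    """Return the SM (sample) value of every @RG line in SAM header text."""
    samples = []
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs after the record type.
        tags = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        if "SM" in tags:
            samples.append(tags["SM"])
    return samples


# First lines of the header as posted (tabs restored): no @RG line present.
posted_header = (
    "@HD\tVN:1.3\tSO:coordinate\n"
    "@SQ\tSN:NC_010443.5\tLN:274330532\n"
    "@SQ\tSN:NC_010444.4\tLN:151935994\n"
)

# Hypothetical header with a read group, for comparison only.
hypothetical_header = posted_header + "@RG\tID:rg1\tSM:1-30\n"

print(read_group_samples(posted_header))        # []
print(read_group_samples(hypothetical_header))  # ['1-30']
```

An empty list matches the "No RG field" message in the log. In that situation the sample name has to come from somewhere else, which appears to be the purpose of `--sample_ids`.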

pontushojer commented 1 year ago

You need to provide a sample ID using the `--sample_ids` argument. This sample ID should match the sample name in your VCF.
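To see which sample names the VCF defines, read the columns after `FORMAT` on the `#CHROM` header line. The sketch below does that and reports any requested ID that is missing. The helper names and the three column names are hypothetical; the posted record has three genotype columns, but the thread does not show their names. Whether samplot compares IDs by exact string match is an assumption drawn from the reply above.

```python
def vcf_sample_names(lines):
    """Return the sample column names from the #CHROM header line of a VCF."""
    for line in lines:
        if line.startswith("#CHROM"):
            # The first nine columns are fixed; everything after FORMAT is a sample.
            return line.rstrip("\n").split("\t")[9:]
    return []


def missing_sample_ids(requested, vcf_names):
    """Return the requested IDs that do not appear among the VCF sample names."""
    return [s for s in requested if s not in vcf_names]


# Hypothetical header: the three sample column names are placeholders.
header = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tcaller_a\tcaller_b\tcaller_c\n",
]

names = vcf_sample_names(header)
print(names)                                # ['caller_a', 'caller_b', 'caller_c']
print(missing_sample_ids(["1-30"], names))  # ['1-30']
```

Running `vcf_sample_names(open("filtered_1-30_fixed.vcf"))` on the real file would show the names to compare against the value passed to `--sample_ids`.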