mcmero / SVclone

A computational method for inferring the cancer cell fraction of tumour structural variation from whole-genome sequencing data.
BSD 3-Clause "New" or "Revised" License

svclone count --> ValueError: multiple tag sizes BAM file #32

handoko12u closed this issue 8 months ago

handoko12u commented 8 months ago

Dear @mcmero, I have WGS data from a human cancer sample. I called SVs with Sniffles2, then annotated them with SnpEff. I want to estimate the CCF with SVclone. I assume I do not need to run the annotate step, since my VCF file has already been annotated with SnpEff. Is that correct?

I went on to run svclone count, but it failed with the error shown below: [image]

How can I solve this? Thank you.

mcmero commented 8 months ago

The count step does not support VCF input, so you will have to run the annotate step first. However, the error message indicates that your BAM contains reads of different lengths, which SVclone does not support. If you have a good idea of your average read length, you can specify it in your config file to avoid this error.
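To get a rough idea of the average read length mentioned above, one option is to summarise the SEQ column of the alignments. The following is only a sketch: `read_length_summary` is a hypothetical helper (not part of SVclone), and it assumes it is fed SAM-format text lines such as those produced by `samtools view`.

```python
from collections import Counter


def read_length_summary(sam_lines):
    """Summarise read lengths from SAM-format text lines.

    Hypothetical helper, not part of SVclone. Header lines (starting
    with '@') and records without a stored sequence ('*') are skipped.
    Returns None if no usable reads are found.
    """
    lengths = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # a SAM record has 11 mandatory fields
            continue
        seq = fields[9]  # 10th field is SEQ
        if seq == "*":
            continue
        lengths[len(seq)] += 1

    if not lengths:
        return None

    total = sum(lengths.values())
    mean_len = sum(length * n for length, n in lengths.items()) / total
    return {
        "n_reads": total,
        "distinct_lengths": len(lengths),
        "mean_length": mean_len,
    }
```

If `distinct_lengths` is greater than 1, the BAM contains mixed read lengths, and the rounded `mean_length` is one candidate value for `read_len` in the config file. Passing `sys.stdin` to the function lets it read from a `samtools view` pipe.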

handoko12u commented 8 months ago

Hello @mcmero

I have set the average read length and standard deviation in the config file as follows:

# SV processing-related options
[BamParameters]
# read length of BAM file; -1 = infer dynamically.
read_len: -1
# Mean fragment length (also known as insert length); -1 = infer dynamically.
insert_mean: 2500
# Standard deviation of insert length; -1 = infer dynamically.
insert_std: 10000
# mean coverage of the bam
# used as parameter in cluster number initialisation
# informs max read depth we consider when extracting reads from SV loci
mean_cov: 50
# maximum considered copy-number
# informs max read depth we consider when extracting reads from SV loci
max_cn: 10

Then I tried to run svclone annotate; this is the result:

(svclone) sysadmin@sysadmin:/var/lib/minknow/data/20230907_LSK114_KNF_DNA_Long/4562949_ReRun1$ svclone annotate -i /home/sysadmin/Downloads/R_practice/sv_vcf/s1_phased.vcf -b /var/lib/minknow/data/20230907_LSK114_KNF_DNA_Long/4562949_ReRun1/bam_final/merged_sort.bam -s s1_annotate_svclone -cfg /home/sysadmin/Downloads/svclone_config.ini
Loading SV calls...
Traceback (most recent call last):
  File "/home/sysadmin/mambaforge/envs/svclone/bin/svclone", line 10, in <module>
    sys.exit(main())
  File "/home/sysadmin/mambaforge/envs/svclone/lib/python3.10/site-packages/SVclone/cli.py", line 187, in main
    args.func(args)
  File "/home/sysadmin/mambaforge/envs/svclone/lib/python3.10/site-packages/SVclone/SVprocess/annotate.py", line 541, in preproc_svs
    svs = svp_load_data.load_input_vcf(svin, class_field, use_dir)
  File "/home/sysadmin/mambaforge/envs/svclone/lib/python3.10/site-packages/SVclone/SVprocess/svp_load_data.py", line 20, in load_input_vcf
    for sv in sv_vcf:
  File "/home/sysadmin/mambaforge/envs/svclone/lib/python3.10/site-packages/vcf/parser.py", line 586, in next
    samples = self._parse_samples(row[9:], fmt, record)
  File "/home/sysadmin/mambaforge/envs/svclone/lib/python3.10/site-packages/vcf/parser.py", line 485, in _parse_samples
    sampdat[i] = float(vals)

Please advise on what is wrong. Thanks.

mcmero commented 8 months ago

SVclone's input format follows the VCF specification for SVs, so it is possible your SV VCF does not follow this format. It would help if you posted a sample of your VCF input file. You could also try converting your SVs into the tab-delimited format described in the README.
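Since the traceback stops at `float(vals)` while parsing the sample columns, one quick way to look for format problems is to scan the VCF for per-sample values that are declared as Float in the header but cannot be parsed as numbers. This is a sketch under that assumption only; `find_bad_float_values` is a hypothetical helper, not part of SVclone or PyVCF, and the actual cause of the failure may be something else.

```python
import re

# Matches header lines such as:
# ##FORMAT=<ID=AF,Number=1,Type=Float,Description="...">
FORMAT_RE = re.compile(r"##FORMAT=<ID=([^,]+),Number=[^,]+,Type=([^,>]+)")


def find_bad_float_values(vcf_lines):
    """Return (line_number, key, value) for unparseable Float sample values.

    Hypothetical diagnostic: only FORMAT keys declared as Type=Float in
    the header are checked; '.' is treated as a missing value.
    """
    float_keys = set()
    problems = []
    for line_no, line in enumerate(vcf_lines, start=1):
        line = line.rstrip("\n")
        if line.startswith("##"):
            match = FORMAT_RE.match(line)
            if match and match.group(2) == "Float":
                float_keys.add(match.group(1))
            continue
        if line.startswith("#") or not line:
            continue
        cols = line.split("\t")
        if len(cols) < 10:  # no FORMAT/sample columns on this record
            continue
        keys = cols[8].split(":")
        for sample in cols[9:]:
            for key, value in zip(keys, sample.split(":")):
                if key not in float_keys or value == ".":
                    continue
                for item in value.split(","):
                    if item == ".":
                        continue
                    try:
                        float(item)
                    except ValueError:
                        problems.append((line_no, key, value))
                        break
    return problems
```

Calling it on an open file handle, for example `find_bad_float_values(open("s1_phased.vcf"))`, lists the records worth inspecting by hand.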

handoko12u commented 8 months ago

Here is a sample of my VCF file: [image]

mcmero commented 8 months ago

These look like SNV calls (not SVs). This VCF will not work as SV input, but you can still run SNVs through the pipeline from the filter step onwards, or run standalone ccube.
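A rough way to tell the two kinds of VCF apart without opening them by eye is to count records that carry SV markers. The sketch below is a heuristic, not SVclone's own logic: it treats a record as an SV if its INFO column has an `SVTYPE` key, or its ALT is a symbolic allele such as `<DEL>`, or its ALT uses breakend bracket notation. Both function names are hypothetical.

```python
def looks_like_sv(record_line):
    """Heuristically decide whether one VCF data line describes an SV."""
    cols = record_line.rstrip("\n").split("\t")
    if len(cols) < 8:  # need at least CHROM..INFO
        return False
    alt, info = cols[4], cols[7]
    info_keys = {item.split("=", 1)[0] for item in info.split(";")}
    # Symbolic alleles look like <DEL>, <DUP>, <INV>, ...
    symbolic = any(a.startswith("<") and a.endswith(">") for a in alt.split(","))
    # Breakend notation uses square brackets, e.g. G]chr17:198982]
    breakend = "[" in alt or "]" in alt
    return "SVTYPE" in info_keys or symbolic or breakend


def count_sv_records(vcf_lines):
    """Return (sv_like, other) record counts, skipping header lines."""
    sv_like = other = 0
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        if looks_like_sv(line):
            sv_like += 1
        else:
            other += 1
    return sv_like, other
```

A file where nearly every record falls in the `other` count is most likely a small-variant (SNV/indel) call set rather than SV input. Note that symbolic alleles used by some gVCF outputs would also be counted as SV-like by this check.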

handoko12u commented 8 months ago

OK, I will try ccube.

How about this VCF: [image]. Is it an SV VCF file? Thanks.

mcmero commented 8 months ago

Yes, that looks like an SV VCF.