tjiangHIT / cuteSV

Long read based human genomic structural variation detection with cuteSV
MIT License
253 stars 36 forks source link

force recall option #14

Closed proinde closed 3 years ago

proinde commented 4 years ago

One of the things that is likely to be a heavily used workflow in the future is to look at inheritance patterns of SVs in clinical samples for pathogenic SVs. Doing so requires a merging of several individuals' variant sets and merging of associated genotypes and such. One way this can be done currently is to merge SVs between samples, then to force the SV caller to make an assessment for each SV in the resulting superset of SVs because this merging step is quite messy. Such a two-step approach is described here: SV calling for a population using sniffles + SURVIVOR to perform SV calling and merging respectively.

Would it be possible to add a --forceall option so that this kind of two-step workflow could be performed using CuteSV rather than sniffles?

tjiangHIT commented 4 years ago

Hello @proinde This is a good suggestion which is also proposed by @AJ211 in #6. I have some tricky things to deal with and I'll update this feature after that, so stay tuned. Thank you for your continued advice.

Best, Tao

proinde commented 4 years ago

Ahh yes! I hadn't connected the title of his request to the operation for some reason. Looking forward to this!

tjiangHIT commented 4 years ago

Hello @proinde Finally, cuteSV has the ability to perform force calling. Thanks to @Meltpinkg for her great work. Let me give a short description for running cuteSV to complete population-based SV calling:

  1. Run cuteSV for each sample to generate sample specific SV callsets.
  2. Perform SURVIVOR to merge every single vcf into merged.vcf.
  3. Rerun cuteSV for each sample with -Ivcf merged.vcf (force calling step).
  4. Perform SURVIVOR again to merge every force called single vcf into _finalmerged.vcf. Wish for hearing your feedbacks!

Best, Tao

Chenglin20170390 commented 3 years ago

Many thanks for the population calling pipeline of cuteSV.

Following this pipeline, I get an error.

  1. for each sample, I running different sv calling software(cuteSV,pbsv,sniffles,svim). and then merge them for individual sample
  2. perform SURVIVOR to merge every single vcf into merged.vcf based on sniffles population pipeline
  3. Rerun cuteSV for each sample with -Ivcf merged.vcf (force calling step). However, I get error in this step my command show belows ls 04_one_gt/*.vcf > vcf_files_raw_calls.txt
    SURVIVOR merge vcf_files_raw_calls.txt 1000 1 1 -1 -1 -1 merged_SURVIVOR_1kbpdist_typesave.vcf
    cuteSV 01_bam/$i.ngmlr.sorted.bam $ref $i.cuteSV2.vcf ./ --threads 20 --max_cluster_bias_INS 1000 --diff_ratio_merging_INS 0.9 --max_cluster_bias_DEL 1000 --diff_ratio_merging_DEL 0.5 -Ivcf merged_SURVIVOR_1kbpdist_typesave.vcf --genotype

the part of error file

021-04-15 22:22:32,424 [INFO] Finished chr12:0-10000000.
2021-04-15 22:22:36,289 [INFO] Finished chr12:40000000-50000000.
2021-04-15 22:22:40,943 [INFO] Finished chr12:50000000-59670755.
2021-04-15 22:22:50,387 [INFO] Finished chr12:30000000-40000000.
2021-04-15 22:22:57,728 [INFO] Finished chr09:30000000-40000000.
2021-04-15 22:22:57,885 [INFO] Rebuilding signatures of structural variants.
2021-04-15 22:23:19,186 [INFO] Check the parameter -Ivcf: OK.
2021-04-15 22:23:19,187 [INFO] Enable to perform force calling.
Traceback (most recent call last):
  File "/public/agis/huangsanwen_group/chenglin/softwares/miniconda3/bin/cuteSV", line 4, in <module>
    __import__('pkg_resources').run_script('cuteSV==1.0.10', 'cuteSV')
  File "/public/agis/huangsanwen_group/chenglin/softwares/miniconda3/lib/python3.7/site-packages/pkg_resources/__init__.py", line 651, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/public/agis/huangsanwen_group/chenglin/softwares/miniconda3/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1448, in run_script
    exec(code, namespace, namespace)
  File "/public/agis/huangsanwen_group/chenglin/softwares/miniconda3/lib/python3.7/site-packages/cuteSV-1.0.10-py3.7.egg/EGG-INFO/scripts/cuteSV", line 801, in <module>
    run(sys.argv[1:])
  File "/public/agis/huangsanwen_group/chenglin/softwares/miniconda3/lib/python3.7/site-packages/cuteSV-1.0.10-py3.7.egg/EGG-INFO/scripts/cuteSV", line 797, in run
    main_ctrl(args, argv)
  File "/public/agis/huangsanwen_group/chenglin/softwares/miniconda3/lib/python3.7/site-packages/cuteSV-1.0.10-py3.7.egg/EGG-INFO/scripts/cuteSV", line 672, in main_ctrl
    max_cluster_bias_dict, threshold_gloab_dict, args.gt_round, args.threads)
  File "/public/agis/huangsanwen_group/chenglin/softwares/miniconda3/lib/python3.7/site-packages/cuteSV-1.0.10-py3.7.egg/cuteSV/cuteSV_forcecalling.py", line 352, in force_calling
    read_id_list, max_cluster_bias = find_in_list(sv_type, search_id_list, max_cluster_bias_dict[sv_type], pos, sv_end)
KeyError: 'NA

` And the 4 steps for cuteSV population calling pipeline is not easy for the newest bioinformatics to study ,if you have time, can you give a simple example command for us to follow? Many thanks for you and your team to develop such useful tools.

tjiangHIT commented 3 years ago

Hello @Chenglin20170390,

Thanks for pointing this error. The current version of cuteSV can just detect DEL, INS, INV, DUP, and BND. When there is a call in the provided Ivcf file, which is with the type not in the detectable type list, cuteSV will crash due to the unrecognized SV type. Now I have updated a new function to skip the calls with undetectable SV type. Please git clone the latest version of cuteSV and rerun your commands to acquire the newest results. Hope this helps!

Best, Tao

Chenglin20170390 commented 3 years ago

Hi, tao Many thanks for your help, the problem solved perfectly. (^▽^)

best, Lin