Error when running Snakefile_TGS

JingChunSun commented 4 months ago

Hello Dr. Zheng, thank you for developing PanPop; it has been a great help for our SV genotyping and merging work. However, I ran into an error when using my own data. The log reads:

[M::mm_idx_gen::141.216*1.24] collected minimizers
[M::mm_idx_gen::182.390*1.42] sorted minimizers
[M::main::182.390*1.42] loaded/built the index for 613 target sequence(s)
[M::mm_mapopt_update::192.837*1.39] mid_occ = 117
[M::mm_idx_stat] kmer size: 19; skip: 19; is_hpc: 0; #seq: 613
[M::mm_idx_stat::194.500*1.39] distinct minimizers: 190247661 (93.54% are singletons); average occurrences: 1.305; average spacing: 10.075; total length: 2501912388
[E::parse_cigar] CIGAR length too long at position 1735 (269777929H)
[W::sam_read1_sam] Parse error at line 657
samtools view: error reading file "-"

I have run minimap2 on its own with the same data and it works normally, so I don't know why it fails here.

starskyzheng commented 4 months ago

Could you try switching to your own minimap2? It is defined in config/software.yaml.

Also, the command line PanPop uses at this step is: https://github.com/starskyzheng/panpop/blob/8b37efe67f91cb45062c83b5b2834636f69d14e0/subworkflows/callSV3.py#L70 You can run it yourself and see whether it reports an error.

JingChunSun commented 4 months ago

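It shouldn't be a software installation problem. Using the minimap2 from conda, I ran the command on its own: "minimap2 -a -x asm5 --cs -r2k -t 20 Ref.fa /home/sunjingchun/NX.project/2.SV_dection/nx.fa 2>>logs/2.nx.SVIM_asm.1.minimap2.log > nx.minimap2.sa" and it produced the SAM output correctly. I also ran samtools separately and it produced results normally. With the pipe, however, the file 1.minimap.bam never appears in "03_vcf/06_SVIM_asm/nx/", which is what leads to the error "samtools view: error reading file "-"". The downstream steps read this 1.minimap.bam and then genotype SVs with SVIM-asm. The failure should be at this line: https://github.com/starskyzheng/panpop/blob/8b37efe67f91cb45062c83b5b2834636f69d14e0/subworkflows/callSV3.py#L320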

starskyzheng commented 4 months ago

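It seems to be caused by chromosomes that are too long: https://github.com/samtools/samtools/issues/1611 You could consider splitting the chromosomes, or modify the script so that it does not call samtools and instead writes SAM directly for the downstream analysis. Also, are you sure samtools reports no error when run on its own?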
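To illustrate why a very long chromosome might trigger "CIGAR length too long", here is a minimal Python sketch. It assumes the limit involved is the 28-bit length field that the BAM format reserves for each CIGAR operation; this thread does not confirm that this is the exact check samtools applied. The function name `oversized_cigar_ops` is hypothetical.

```python
import re

# BAM packs each CIGAR operation into a 32-bit integer: the low 4 bits hold
# the operation code, which leaves 28 bits for the operation length.
MAX_CIGAR_OP_LEN = (1 << 28) - 1  # 268435455

# One CIGAR element: a run length followed by a single operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def oversized_cigar_ops(cigar, limit=MAX_CIGAR_OP_LEN):
    """Return the (length, op) pairs in a CIGAR string whose length exceeds limit."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if int(n) > limit]


if __name__ == "__main__":
    # The hard clip from the log above, followed by an arbitrary match run.
    print(oversized_cigar_ops("269777929H1000M"))
```

Under this assumption, the hard clip 269777929H from the log is flagged because it is larger than 268435455. A clip that long can only arise when a query sequence is itself longer than that, which is consistent with the suggestion to split the chromosomes.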

JingChunSun commented 4 months ago

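I tested it, and samtools reports no error when run on its own. I removed the samtools "-" argument from the pipe on line 320 of callSV3.py, and the pipeline now runs normally. It is still running with no errors so far, and the 3.haploid folder has already been generated. The exact location I modified is: https://github.com/starskyzheng/panpop/blob/8b37efe67f91cb45062c83b5b2834636f69d14e0/subworkflows/callSV3.py#L320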

starskyzheng commented 4 months ago

OK. Feel free to report back whenever you have results.

JingChunSun commented 4 months ago

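Hello Dr. Zheng, an update: after removing the "-" argument, the PanPop third-generation sequencing (TGS) pipeline now runs to completion for one individual. Among the result files, which one is the final SV set I need? Is it 15.thin3.sv.vcf.gz in the 05_merge_samples folder?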

starskyzheng commented 4 months ago

Yes, that file contains only SVs. If you need a more detailed analysis, you can use 15.thin3.vcf.gz, which also includes SNPs and INDELs.
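For readers wondering how the SV-only file relates to the combined one, the following Python sketch shows one common way to label a variant by allele length. It is only an illustration: the 50 bp threshold and the function `classify_variant` are assumptions, not PanPop's confirmed criteria, and symbolic alleles such as `<DEL>` are not handled.

```python
def classify_variant(ref, alt, sv_min_len=50):
    """Label one REF/ALT allele pair as 'snp', 'indel', or 'sv'.

    sv_min_len is an assumed cutoff (50 bp is a widely used convention);
    PanPop's own threshold may differ.
    """
    if len(ref) == 1 and len(alt) == 1:
        return "snp"
    # Simplification: anything that is not a single-base substitution is
    # sized by its longer allele.
    size = max(len(ref), len(alt))
    return "sv" if size >= sv_min_len else "indel"


if __name__ == "__main__":
    for ref, alt in [("A", "G"), ("A", "AT"), ("A", "A" + "T" * 60)]:
        print(len(ref), len(alt), classify_variant(ref, alt))
```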

JingChunSun commented 4 months ago

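Hello Dr. Zheng, one more question: when calling SVs on population NGS data, does the merge step support adding individuals? For example, suppose I have already produced the merged SVs for 100 NGS individuals and later want to add NGS data for 50 more. Can I merge them in directly, or do I need to build a new sample list with all 150 individuals and run everything together again?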

starskyzheng commented 4 months ago

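The merge step does not support that, but you can reuse the earlier vg SV results (2.callSV). I suggest deleting the folders from the later steps, or moving them elsewhere with mv, and then rerunning.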

JingChunSun commented 3 months ago

Hello Dr. Zheng, thank you for your reply. I have since tested the NGS pipeline: I built a gfa file from the TGS SVs that PanPop produced earlier and then ran the NGS pipeline, but the following error appeared:

IndexError in file /home/sunjingchun/software/panpop/panpop/subworkflows/callSV.py, line 19: list index out of range
File "/home/sunjingchun/software/panpop/panpop/Snakefile_NGS", line 24, in
File "/home/sunjingchun/software/panpop/panpop/subworkflows/callSV.py", line 19, in list_prepaire_files

Is this caused by a problem with the gfa file I built?

starskyzheng commented 3 months ago

The format of the list file is probably wrong.
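An IndexError of this kind typically means some line of the list file splits into fewer fields than the code indexes. The Python sketch below is a hypothetical pre-check, not part of PanPop: `short_lines` and the `min_fields` value are assumptions, and the real column layout should be taken from the example list shipped with PanPop.

```python
def short_lines(text, min_fields):
    """Return (line_number, fields) for non-blank lines with fewer than min_fields fields."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split()  # split on any whitespace (tabs or spaces)
        if len(fields) < min_fields:
            problems.append((lineno, fields))
    return problems


if __name__ == "__main__":
    # Made-up example content; the second line is missing one field.
    example = "s1 s1_R1.fq.gz s1_R2.fq.gz\ns2 s2_R1.fq.gz\n"
    for lineno, fields in short_lines(example, min_fields=3):
        print(f"line {lineno}: only {len(fields)} field(s): {fields}")
```

Running such a check before launching Snakemake points to the offending line directly, instead of leaving it to be inferred from the traceback.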

JingChunSun commented 3 months ago

Hello Dr. Zheng, I reworked my list file following the one in the sample, and running the NGS pipeline then produced a new error:

Building DAG of jobs...
MissingInputException in rule split_vcf_by_type2 in file /home/sunjingchun/software/panpop/panpop/subworkflows/panpopbase.py, line 226:
Missing input files for rule split_vcf_by_type2:
output: 5.final_result/2.final.all.snp.vcf.gz, 5.final_result/2.final.all.indel.vcf.gz, 5.final_result/2.final.all.sv.vcf.gz
wildcards: filename=2.final.all
affected files: 5.final_result/2.final.all.all.vcf.gz

What is causing this?

starskyzheng commented 3 months ago

split_chr is probably set to False. That code path is not complete yet, so please use True for now; I will fix this bug later.

starskyzheng commented 3 months ago

Fixed. Please try this branch: p0513. Feedback on the results is welcome!

JingChunSun commented 3 months ago

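Hello Dr. Zheng, thank you for your prompt replies! I have now changed the setting to True, and the pipeline runs normally with no errors so far. I have not yet tested False with the fix; once the True run finishes I will test it and report back.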

github-actions[bot] commented 2 months ago

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] commented 2 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.

starskyzheng / panpop

Error when running Snakefile_TGS #44