maiziezhoulab / VolcanoSV

VolcanoSV enables accurate and robust structural variant calling in diploid genomes from single-molecule long read sequencing
MIT License
4 stars 0 forks source link

volcanosv_asm #2

Closed pengyan19 closed 2 weeks ago

pengyan19 commented 2 months ago

你好,我使用这个软件的时候,报错是找不到[E::hts_open_format] Failed to open file "/public/home/maokaikai/data/10.OS/08.VolcanoSV/01.MDJ/volcanosv_asm_output/chr1//phasing_result/AHTL-1_phased.bam" : No such file or directory error: Error estimating alignment parameters. caused by: Error reading indexed FASTA file. caused by: Unknown sequence name. 请问该怎么解决呢?

volcano1998 commented 2 months ago

Hi, thank you for trying out our tool!

It looks like the reference file and your BAM file might not match. Can you double-check what reference your BAM file was generated on?

pengyan19 commented 2 months ago

Thank you very much for your response, but I've encountered a new issue now. This is the error report information from the volcanosv.

Separate fragments Map read_hp_og: 100%|██████████| 46475/46475 [00:00<00:00, 622965.54it/s] Roughly assign unphased reads: 100%|██████████| 250657/250657 [01:44<00:00, 2396.87it/s] 2024-08-24 15:49:08 INFO construct kmer database... Traceback (most recent call last): File "/public/home/maokaikai/soft/VolcanoSV/bin/VolcanoSV-asm//count_kmer_v1.py", line 83, in with open(name_path,'r') as f: FileNotFoundError: [Errno 2] No such file or directory: '/public/home/maokaikai/data/10.OS/08.VolcanoSV/02.OS/volcanosv_asm_output/chr1//kmer_assign//reads.name' 2024-08-24 15:49:08 INFO split hash by up block... 2024-08-24 15:49:08 INFO load dict 2024-08-24 15:49:09 INFO load name file Traceback (most recent call last): File "/public/home/maokaikai/soft/VolcanoSV/bin/VolcanoSV-asm//split_hash_by_hp.py", line 27, in with open(name_file,'r') as f: FileNotFoundError: [Errno 2] No such file or directory: '/public/home/maokaikai/data/10.OS/08.VolcanoSV/02.OS/volcanosv_asm_output/chr1//kmer_assign//reads.name' 2024-08-24 15:49:09 INFO assign unphased reads... Traceback (most recent call last): File "/public/home/maokaikai/soft/VolcanoSV/bin/VolcanoSV-asm//get_raw_kmer_overlap_count_unphased_est_pbs_v1.py", line 113, in hash_files = [ up_dir + '/'+fname for fname in os.listdir(up_dir) if '.hash' in fname] FileNotFoundError: [Errno 2] No such file or directory: '/public/home/maokaikai/data/10.OS/08.VolcanoSV/02.OS/volcanosv_asm_output/chr1//kmer_assign//up_est_hash/' 2024-08-24 15:49:27 INFO delete unphased hash files... rm: cannot remove ‘/public/home/maokaikai/data/10.OS/08.VolcanoSV/02.OS/volcanosv_asm_output/chr1//kmer_assign//up_est_hash/’: No such file or directory 2024-08-24 15:49:27 INFO delete hash files... rm: cannot remove ‘/public/home/maokaikai/data/10.OS/08.VolcanoSV/02.OS/volcanosv_asm_output/chr1//kmer_assign//reads.seq’: No such file or directory 2024-08-24 15:49:27 INFO delete kmer db...

volcano1998 commented 2 months ago

Can you provide the command you use?

pengyan19 commented 2 months ago

This's code that i run source /public/home/maokaikai/miniconda3/bin/activate volcanosv && python /public/home/maokaikai/soft/VolcanoSV/bin/VolcanoSV-asm/volcanosv-asm.py -bam /public/home/maokaikai/data/10.OS/08.VolcanoSV/02.OS/bam/ACB-3.bam -o /public/home/maokaikai/data/10.OS/08.VolcanoSV/02.OS/volcanosv_asm_output -ref /public/home/maokaikai/data/10.OS/08.VolcanoSV/02.OS/OS.fa -t 10 -chr 1 -dtype Hifi -px ACB-3

volcano1998 commented 2 months ago

I've been updating this GitHub repo recently. I'm afraid that you have an unstable version. Can you try to clone this repo again and reinstall it?

pengyan19 commented 2 months ago

ok,i try again

pengyan19 commented 2 months ago

I install this software today. So i think it's not the reason.

volcano1998 commented 2 months ago

I see. Were you able to recreate the result using the provided test data? https://zenodo.org/records/10520476.

I highly recommend trying the test data first.

volcano1998 commented 2 months ago

I will try it out myself.

pengyan19 commented 2 months ago

Hello, I apologize for the oversight; the test data works fine. I would like to know if your BAM file was aligned using minimap2. What version of minimap2 was used for alignment? Is the BAM file obtained by aligning sequencing data to the reference genome?

volcano1998 commented 2 months ago

No apology. I'm glad the test data works for you. We were using minimap 2.22-r1101. It is aligned to the reference. I wonder what aligner are you using? Can you send me the command you used to generate the BAM file?

pengyan19 commented 2 months ago

This is my version of minimap2 (2.21), and the command I allowed is as follows: /public/home/pengyan/anaconda3/bin/minimap2 -ax map-hifi -t 8 /public/home/maokaikai/data/10.OS/08.VolcanoSV/02.OS/OS.fa /public/home/pengyan/bioproject2/09.ACB_genome/06.pan-ACB-ECB/05.pan/01.SV/02.fastq/ACB-3.filt.fastq.gz --MD |/public/software/env01/bin/samtools view -@ 8 -bS |/public/software/env01/bin/samtools sort -@ 8 -o /public/home/maokaikai/data/10.OS/08.VolcanoSV/02.OS/bam/ACB-3.bam && /public/software/env01/bin/samtools index /public/home/maokaikai/data/10.OS/08.VolcanoSV/02.OS/bam/ACB-3.bam

volcano1998 commented 2 months ago

Your command looks fine to me. Is it possible that you extract a chr22 BAM file and send it to me? I can try running it.

pengyan19 commented 2 months ago

Hello, I just ran the program and found that the type was mistakenly set to ONT. However, I've encountered a new issue where the chromosome setting can only be up to 22; it doesn't work when it goes to 23. volcanosv-asm.py: error: argument --chrnum/-chr: invalid choice: 24 (choose from 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22)

pengyan19 commented 2 months ago

Can i just change this code? parser.add_argument('--chrnum','-chr', type = int, choices=[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22] )

volcano1998 commented 2 months ago

Hi, thank you for your advice! Due to the complexity of sex chromosomes (It is hard to perform phasing when there is a pair of chrX and chrY), the current version of VolcanoSV only works on autosomal chromosomes (chr1-22).

We might add this feature in the future, but for now, the usage of this tool is limited to chr1-22.

volcano1998 commented 2 months ago

Hi Yan, let's discuss here so other people can reference your problem in the future.

Can you provide the command you use and the error message?

I'm trying your data in the meantime by the way.

volcano1998 commented 2 months ago

Hi, I found the problem. Your BAM file contains some alignment records with None as sequence, which caused the error. I changed some of my code and now it worked. I was able to generate 72M contigs file. You may try git pull and run it again.

pengyan19 commented 2 months ago

Many thanks, I found another problem. The python script of volcanosv-vc-complex-sv.py and volcanosv-vc-small-indel.py have missed the parameter: "--prefix PREFIX"

volcano1998 commented 2 months ago

Hi Yan, thank you so much for pointing it out! I'm working on the test of these 2 scripts today and I will let you know when I get it done.

volcano1998 commented 2 months ago

Test done. Try git pull and run it again.

pengyan19 commented 2 months ago

OK. thanks,i will try.

pengyan19 commented 2 months ago

Hi, when i used the "volcanosv-vc-small-indel.py" script, i obtained the bug as follows:

2024-08-30 03:11:06 INFO


pair SNP, indel

2024-08-30 03:11:06 INFO
/public/home/maokaikai/soft/VolcanoSV/bin/VolcanoSV-vc/Small_INDEL//k8 /public/home/maokaikai/soft/VolcanoSV/bin/VolcanoSV-vc/Small_INDEL//dipcall//dipcall-aux.js vcfpair -p /data/maiziezhou_lab/CanLuo/Software/dipcall/hs37d5.PAR.bed /public/home/maokaikai/data/10.OS/08.VolcanoSV/01.MDJ/volcanosv_small_indel/var_raw.pair.vcf.gz | /public/home/maokaikai/soft/VolcanoSV/bin/VolcanoSV-vc/Small_INDEL//htsbox/htsbox bgzip > /public/home/maokaikai/data/10.OS/08.VolcanoSV/01.MDJ/volcanosv_small_indel/var_raw.dip.vcf.gz

/public/home/maokaikai/soft/VolcanoSV/bin/VolcanoSV-vc/Small_INDEL//dipcall//dipcall-aux.js:60: Error: [File] Fail to open the file var file = new File(fn_par); ^ Error: [File] Fail to open the file at vcfpair (/public/home/maokaikai/soft/VolcanoSV/bin/VolcanoSV-vc/Small_INDEL//dipcall//dipcall-aux.js:60:14) at main (/public/home/maokaikai/soft/VolcanoSV/bin/VolcanoSV-vc/Small_INDEL//dipcall//dipcall-aux.js:327:29) at /public/home/maokaikai/soft/VolcanoSV/bin/VolcanoSV-vc/Small_INDEL//dipcall//dipcall-aux.js:333:1 2024-08-30 03:11:06 INFO


seperate multi-alt-allele

2024-08-30 03:11:06 INFO


filter 2-49 bp indel within None region

Writing to /tmp/bcftools.rUhaBk [E::bcf_hdr_read] Input is not detected as bcf or vcf format Could not read VCF/BCF headers from /public/home/maokaikai/data/10.OS/08.VolcanoSV/01.MDJ/volcanosv_small_indel/indel_2_49_raw.vcf Cleaning Traceback (most recent call last): File "/public/home/maokaikai/soft/VolcanoSV/bin/VolcanoSV-vc/Small_INDEL/volcanosv-vc-small-indel.py", line 233, in call_var( File "/public/home/maokaikai/soft/VolcanoSV/bin/VolcanoSV-vc/Small_INDEL/volcanosv-vc-small-indel.py", line 149, in call_var filter_vcf_by_size_bed(f'{output_dir}/var_raw_rf.dip.vcf', output_dir, bedfile) File "/public/home/maokaikai/soft/VolcanoSV/bin/VolcanoSV-vc/Small_INDEL/volcanosv-vc-small-indel.py", line 51, in filter_vcf_by_size_bed subprocess.run(f"bcftools sort {output_folder}/indel_2_49_raw.vcf -Ov -o {sorted_vcf_path}", shell=True, check=True) File "/public/home/pengyan/anaconda3/envs/volcanosv/lib/python3.8/subprocess.py", line 512, in run raise CalledProcessError(retcode, process.args, subprocess.CalledProcessError: Command 'bcftools sort /public/home/maokaikai/data/10.OS/08.VolcanoSV/01.MDJ/volcanosv_small_indel/indel_2_49_raw.vcf -Ov -o /public/home/maokaikai/data/10.OS/08.VolcanoSV/01.MDJ/volcanosv_small_indel/indel_2_49_sorted.vcf' returned non-zero exit status 255.

volcano1998 commented 2 months ago

Can you send me the command you use?

pengyan19 commented 2 months ago

Ok,I used multiple scripts and found that only the last one did not produce any results. The script as follows: source /public/home/pengyan/anaconda3/bin/activate volcanosv && python /public/home/maokaikai/soft/VolcanoSV/bin/VolcanoSV-vc/Large_INDEL/volcanosv-vc-large-indel.py -i /public/home/maokaikai/data/10.OS/08.VolcanoSV/01.MDJ/volcanosv_asm_output -o /public/home/maokaikai/data/10.OS/08.VolcanoSV/01.MDJ/volcanosv_large_indel_output -dtype ONT -bam /public/home/maokaikai/data/10.OS/08.VolcanoSV/01.MDJ/bam/MDJ-4.bam -ref /public/home/maokaikai/data/10.OS/08.VolcanoSV/01.MDJ/MDJZ.sort.fa -t 10 -px MDJ-4 && python /public/home/maokaikai/soft/VolcanoSV/bin/VolcanoSV-vc/Complex_SV/volcanosv-vc-complex-sv.py -i /public/home/maokaikai/data/10.OS/08.VolcanoSV/01.MDJ/volcanosv_asm_output -vcf /public/home/maokaikai/data/10.OS/08.VolcanoSV/01.MDJ/volcanosv_large_indel_output/volcanosv_large_indel.vcf -o /public/home/maokaikai/data/10.OS/08.VolcanoSV/01.MDJ/volcanosv_complex_sv -d ONT -bam /public/home/maokaikai/data/10.OS/08.VolcanoSV/01.MDJ/bam/MDJ-4.bam -ref /public/home/maokaikai/data/10.OS/08.VolcanoSV/01.MDJ/MDJZ.sort.fa -t 10 -px MDJ-4 && python /public/home/maokaikai/soft/VolcanoSV/bin/VolcanoSV-vc/Small_INDEL/volcanosv-vc-small-indel.py -i /public/home/maokaikai/data/10.OS/08.VolcanoSV/01.MDJ/volcanosv_asm_output -o /public/home/maokaikai/data/10.OS/08.VolcanoSV/01.MDJ/volcanosv_small_indel -bam /public/home/maokaikai/data/10.OS/08.VolcanoSV/01.MDJ/bam/MDJ-4.bam -ref /public/home/maokaikai/data/10.OS/08.VolcanoSV/01.MDJ/MDJZ.sort.fa -t 10 -px MDJ-4

volcano1998 commented 2 months ago

I saw in the error there is an absolute path. I just changed it. Can you try git pull and see if the error still comes up?

pengyan19 commented 2 months ago

OK, i will try.

pengyan19 commented 2 months ago

Hi, i met a question when i git pull.

1
volcano1998 commented 2 months ago

Hi Yan, I apologize for my mistake. My "git push" wasn't successful earlier. I just pushed again and double-checked to ensure the code is updated. Can you try git pull again?

The absolute path should disappear now.

pengyan19 commented 2 months ago

I've run into the process, thank you very much, but I don't know that the SNP information is in that file。

volcano1998 commented 2 months ago

Hi Yan, you are very welcome! We adopt Longshot's SNP call result as the final result, which is volcanosv_output/<chromosome_name>/phasing_result/<prefix>_phased.vcf.

pengyan19 commented 2 months ago

Hi, ok, thanks!

pengyan19 commented 2 months ago

Hi, If we divide chromosomes for variants detection, can we use bcftools to merge the VCFs obtained later? For example, SV also uses this software to merge?

volcano1998 commented 2 months ago

It is possible. You may also use customized script to do it.

pengyan19 commented 2 months ago

Hi, i found that the script of "volcanosv_large_indel" used the example data to run. I check the code and VCF. I found that the vcf file have error information as follows: ![Uploading 1.png…]()

volcano1998 commented 2 months ago

Hi Yan, I can't see the picture you sent. It shows "uploading 1.png". Can you try sending it again?

pengyan19 commented 2 months ago

ok, i have upload this.

1
pengyan19 commented 2 months ago

This didn't my reference genome. I guess that this is the test data. So, this software didn't change the path in software.

volcano1998 commented 1 month ago

Sorry about the inconvenience. I will work on this issue in these couple of days. I will report back when I'm done.

Our paper is just out, so some part of the software is not so perfect. Thank you so much for your patience and feedback sincerely!

volcano1998 commented 1 month ago

Hi Yan, I found the problem is that I used a default VCF header instead of generating a new one based on the reference. I just fixed that issue. Could you try git pull and rerun the large INDEL call?

Thank you so much for catching it!!

volcano1998 commented 1 month ago

Hi, If we divide chromosomes for variants detection, can we use bcftools to merge the VCFs obtained later? For example, SV also uses this software to merge?

I also added a customized Merge_VCF.py script. You may check the session "Merge VCF" in the Readme to learn how to use it.

pengyan19 commented 1 month ago

Very well done, thanks

pengyan19 commented 1 month ago

Hello, I recently found a problem, that is, my autosomal number has 31, but your software can only be set to 22, I manually changed the command of the volcanosv-vc-large-indel script, but still can not output more than 22 chromosomes of the variation data, can you expand to 31 chromosomes?

pengyan19 commented 1 month ago

And I set the -chr parameter, and the header of the output SV result is still the header of the instance data, and it has not been corrected (volcanosv-vc-large-indel.py)

volcano1998 commented 1 month ago

Hi Yan, I've updated the code so you can do more than 22 chromosomes now.

For the VCF header, I double-checked and it was updated to extract every contig and its length from the given reference file. Therefore you might have seen more than one chromosome in your header. To make it more intuitive, I just changed it to only write down the chromosome you specified.

You may try git pull and rerun your experiments. Thanks for the suggestions!

pengyan19 commented 1 month ago

Hi,i meet a new question as follows. Some of samples could get VCF, some of them don't.

1
volcano1998 commented 1 month ago

It seems there is some signature of length 0 (which did not happen before). I just added handler to this exception. Please try git pull and run the failed chromosomes again.

Thank you for catching this!