llecompte / SVJedi

SV genotyping with long reads
GNU Affero General Public License v3.0

ValueError: not enough values to unpack #14

Open lxxiaoxiaLi opened 2 years ago

lxxiaoxiaLi commented 2 years ago

Hi Lolita,

After running:

python3 SVJedi//svjedi.py -v $dir/Chr12.INS.CW01.seq.vcf -r nip -i reads.pass.fastq.gz -o $dir/Chr12.INS.CW01.gy -t 4 -d ont

I got this error:

Traceback (most recent call last):
  File "/public/home/Shang-team/project/lxx/software/SVJedi//svjedi.py", line 193, in <module>
    main(sys.argv[1:])
  File "/public/home/Shang-team/project/lxx/software/SVJedi//svjedi.py", line 186, in main
    genotype.genotype(paf_file, vcf_file, output_file, min_support, d_over, d_end, ladj)
  File "/public/home/Shang-team/project/lxx/software/SVJedi/modules/genotype.py", line 84, in genotype
    readId, readLength, readStart, readEnd, _, refId, refLength, refStart, refEnd, match, blockLength, quality, *_ = line.split("\t")
ValueError: not enough values to unpack (expected at least 12, got 3)

Here is a line from one of the PAF files:

f3e2543c-0f1b-45c5-be1e-42f29d21a75a 31035 8392 8644 + ref_nip.Chr12_19715631-239 10000 45 302 235 263 0 NM:i:28 ms:i:348 AS:i:348 nn:i:0 tp:A:S cm:i:11 s1:i:119 de:f:0.0856 rl:i:15 cg:Z:14M1D37M2I10M1I47M1I22M1D29M2D35M2D10M1D20M3D6M2I11M1D5M
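For readers hitting the same traceback, here is a minimal sketch of why it occurs: star-unpacking a tab-split line fails as soon as the line has fewer columns than there are named targets. The sample records and the `parse_paf_line` helper are hypothetical illustrations, not SVJedi's own code.

```python
# A well-formed PAF record (12 mandatory columns plus one optional tag)
# and a truncated record with only 3 columns. Both are made-up examples.
good = "read1\t31035\t8392\t8644\t+\tref1\t10000\t45\t302\t235\t263\t0\tNM:i:28"
bad = "read2\t31035\t8392"


def parse_paf_line(line, min_fields=12):
    """Return the mandatory PAF columns, or None if the line is too short.

    Hypothetical helper: it checks the column count before any unpacking,
    so a truncated line is skipped instead of raising ValueError.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < min_fields:
        return None
    return fields[:min_fields]


# Unpacking the truncated record directly reproduces the kind of error
# shown in the traceback (4 named targets, only 3 values available).
message = None
try:
    read_id, read_length, read_start, read_end, *_ = bad.split("\t")
except ValueError as err:
    message = str(err)

print(parse_paf_line(good))  # the 12 mandatory columns
print(parse_paf_line(bad))   # None: too few columns
print(message)
```

Whether skipping short lines is acceptable depends on why they are short; a truncated PAF file usually points to an interrupted mapping step that is worth rerunning.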

Strangely, I only get this error on some samples from the same ONT sequencing batch.

Please help me. Thanks,
Xiaoxia Li

llecompte commented 2 years ago

Hi Xiaoxia,

Thank you for using SVJedi. I'll try to fix this issue as soon as possible.

Could you please share with me the result of this command?

awk '( NF < 12 ){print $0}' yourfile.paf
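If awk is not at hand, roughly the same check can be sketched in Python. Note one difference: awk's `NF` splits on any whitespace by default, whereas this sketch splits on tabs, as the line in the traceback does. The function name `find_short_paf_lines` is an assumption made for illustration; it is not part of SVJedi.

```python
MIN_PAF_FIELDS = 12  # PAF defines 12 mandatory tab-separated columns


def find_short_paf_lines(paf_path, min_fields=MIN_PAF_FIELDS):
    """Return (line_number, field_count, raw_line) for every line that has
    fewer than min_fields tab-separated columns."""
    short = []
    with open(paf_path, "r") as handle:
        for line_number, line in enumerate(handle, start=1):
            raw = line.rstrip("\n")
            n_fields = len(raw.split("\t"))
            if n_fields < min_fields:
                short.append((line_number, n_fields, raw))
    return short
```

Calling `find_short_paf_lines("yourfile.paf")` returns a list that is empty when every line has at least 12 columns; the line numbers make it easier to see whether the short lines sit at the end of the file, which would suggest a truncated write.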

Best, Lolita

lxxiaoxiaLi commented 2 years ago

Hi Lolita,

I'm really sorry for my very late reply. I've taken care of it. I have another question: for rice genomes, is it OK to use the default parameters below?

-dover     Breakpoint distance overlap required (default 100 bp)
-dend      Soft-clipping length allowed to consider a semi-global alignment (default 100 bp)
-ladj      Length of sequences adjacent to each end of breakpoints (default 5,000 bp)
-d/--data  Type of sequencing data, either ont or pb (default pb)

In addition, I split the input file (-v/--vcf, set of SVs in VCF) into smaller files, one per chromosome, with insertions and deletions kept separate, such as Chr1.deletion.vcf, Chr1.insetion.vcf, Chr2.deletion.vcf, Chr2.insetion.vcf, and so on. Does running SVJedi this way (python3 svjedi.py -v Chr1.deletion.vcf -a -i ) affect the accuracy of the results?

Please help me. Thanks,
Xiaoxia Li

llecompte commented 2 years ago

Hi Xiaoxia,

Can you tell me what you did to fix the problem with the PAF files, please? I was not aware that PAF files could have a variable number of fields.

Yes, I recommend using the default settings, especially for dover, dend, and ladj. But you should specify the type of sequencing data: ont or pb (--data). Let me know if you have HiFi data.

Finally, splitting the VCF files will have no impact if you use the same -a reference allele file each time.

Don't hesitate if you have any other requests.

Best, Lolita

clemaitre commented 2 years ago

Hi,

Regarding the idea of splitting the input VCF file, here are some additional considerations and recommendations that may be useful to others:

In summary, in most cases, splitting the input VCF file is not a good idea.

If you get too many "not genotyped" SVs due to close or overlapping SVs in the VCF, consider using SVJedi-graph instead of splitting the VCF. It is an improvement of SVJedi based on a graph representation of variants: https://github.com/SandraLouise/SVJedi-graph

Best, Claire