Closed wangjie07070910 closed 1 year ago
Hi,
You can specify the ploidy in a file (first column sample ID, second column ploidy). Add the option --ploidyFile ploidy_file.txt
If you get errors, please post the error here so I can help diagnose it.
Simon
Hi Simon,
Many thanks for your help, but I still encounter this error:
parseVCF.py: error: argument --ploidy: invalid int value: 'ploidy_female.txt'
I presume that the content of my ploidy.txt file is not in the right format. The contents of my ploidy.txt file are as follows:
sample ID ploidy
Sample_1 2
Sample_2 4
Sample_3 4
Thanks again, Jie
If all of your individuals are tetraploid, you can use
--ploidy 4
If some of your individuals are diploid and some are tetraploid, use:
--ploidyFile ploidy_file.txt
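For readers who want to generate such a file rather than type it by hand, here is a minimal sketch. It assumes the layout described above (sample ID in the first column, ploidy in the second), with no header row and no blank lines. The tab separator and the helper name format_ploidy_table are my own choices for illustration, not something parseVCF.py is confirmed to require.

```python
def format_ploidy_table(ploidies):
    """Return ploidy-file text: one 'sample<TAB>ploidy' row per sample.

    Assumption: no header row and no blank lines, with whitespace
    (here a tab) between the two columns.
    """
    rows = []
    for sample, ploidy in ploidies.items():
        if not isinstance(ploidy, int) or ploidy < 1:
            raise ValueError(f"ploidy for {sample!r} must be a positive integer")
        rows.append(f"{sample}\t{ploidy}")
    # End with exactly one newline so no empty line trails the table.
    return "\n".join(rows) + "\n"


def write_ploidy_file(path, ploidies):
    """Write the formatted table to 'path' (hypothetical helper)."""
    with open(path, "w") as handle:
        handle.write(format_ploidy_table(ploidies))


# Example: write_ploidy_file("ploidy_file.txt", {"Sample_1": 2, "Sample_2": 4, "Sample_3": 4})
```

The resulting file would then be passed with --ploidyFile ploidy_file.txt, as in the command above.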
Thanks again.
I used --ploidyFile ploidy_file.txt, and the contents of my ploidy.txt file are as follows:
sample_ID ploidy
Sample_1 2
Sample_2 4
Sample_3 4
Then I got the error: ValueError: invalid literal for int() with base 10: 'ploidy'
I also tried the ploidy_file.txt file without the table header:
Sample_1 2
Sample_2 4
Sample_3 4
Then I got the error: IndexError: list index out of range
Please check your ploidy file for empty lines. It sounds like the script is trying to read a line in the file that has no data in it.
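A quick way to run that check is sketched below. It assumes the file is read one row at a time, split on whitespace, with the second field converted to an integer. That assumption is consistent with the two errors reported above, but it is not taken from the parseVCF.py source. The function name check_ploidy_lines is hypothetical.

```python
def check_ploidy_lines(lines):
    """Return (line_number, problem) tuples for rows likely to break parsing."""
    problems = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            # An empty row has no columns to index (cf. the IndexError above).
            problems.append((number, "empty line"))
        elif len(fields) != 2:
            problems.append((number, f"expected 2 columns, found {len(fields)}"))
        else:
            try:
                int(fields[1])
            except ValueError:
                # A header such as 'sample_ID ploidy' lands here (cf. the ValueError above).
                problems.append((number, f"ploidy {fields[1]!r} is not an integer"))
    return problems


# Example:
# with open("ploidy_file.txt") as handle:
#     for number, problem in check_ploidy_lines(handle):
#         print(f"line {number}: {problem}")
```

If the list comes back empty, every row has a sample ID and an integer ploidy, and the file contains no blank lines.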
Thanks again. When I changed 'ploidy' in the second column of the first row to a number, so that my ploidy.txt file looked like this, it worked:
sample_ID 2
Sample_1 2
Sample_2 4
Sample_3 4
I don't know whether this change has any effect on the results. Besides, I'm having a new problem.
Error:Sample Sample_2 at Scaffold_1:1 genotype ./././. does not match explected ploidy of 2 (appears when I set Sample_2 to be 2x)
Error:Sample Sample_2 at Scaffold_2:25 genotype ./. does not match explected ploidy of 4 (appears when I set Sample_2 to be 4x)
I know this is probably a problem with my sample (it's supposed to be tetraploid), but I'm posting it here and would appreciate your opinion. How should I preprocess a sample like this?
Yes, this is a problem with your VCF, which includes incorrectly formatted genotypes at some sites. You can add the option --ploidyMismatchToMissing
to set these sites to missing data.
In general, please remember that you can type parseVCF.py -h
to see all the available options.
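To make the ploidy mismatch errors above concrete, the sketch below counts the alleles in a VCF GT string and compares that count with the expected ploidy. It is only an illustration of the idea and is not how parseVCF.py implements the check. The function names are hypothetical.

```python
import re


def genotype_ploidy(gt):
    """Count the alleles in a GT string such as '0/1', '0|1|1|1' or './././.'."""
    # Alleles are separated by '/' (unphased) or '|' (phased).
    return len(re.split(r"[/|]", gt))


def matches_ploidy(gt, expected):
    """Return True if the genotype carries exactly 'expected' alleles."""
    return genotype_ploidy(gt) == expected


# The two genotypes from the error messages:
# genotype_ploidy("./././.") gives 4, so it conflicts with an expected ploidy of 2.
# genotype_ploidy("./.") gives 2, so it conflicts with an expected ploidy of 4.
```

A sample that shows ./././. at some sites and ./. at others cannot satisfy a single ploidy value, which is the situation where setting the mismatched sites to missing data helps.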
Thank you very much for such fantastic scripts. I have tetraploids in my sample; does the script not work with them? I tried to use the script for processing VCF files. Here is my command:
python VCF_processing/parseVCF.py -i input.vcf.gz --skipIndels --minQual 30 --gtf flag=DP min=5 max=50 -o output.geno.gz
Yes, some of the samples in my VCF file are tetraploid.