llecompte / SVJedi

SV genotyping with long reads
GNU Affero General Public License v3.0

How does svjedi work with multiple read files? #10

Open hdore opened 3 years ago

hdore commented 3 years ago

Hello,

I'm using SVJedi to call genotypes based on structural variants determined de novo by Sniffles (I'm actually more interested in the allele frequency, since my organism is haploid). I have many samples that I'd like to genotype against the same input VCF file, so I was interested in using SVJedi's option to pass multiple read files. I used the option -i file1.fasta file2.fasta

By doing so, I was expecting to get one column per sample in the VCF file, with the results organized according to the FORMAT column (GT:DP:AD:PL). Instead, I get only one SAMPLE column with results. Am I misunderstanding what multiple files should represent? Are the reads all pooled together, no matter which file they come from? If this is the case, it would be a nice feature to be able to run multiple samples at the same time, but I can always run SVJedi on each sample separately and parse the files afterwards.
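For the fallback mentioned above (one SVJedi run per sample, then parsing the outputs), a minimal Python sketch could look like the following. It is only an illustration under several assumptions: each per-sample VCF is tab-delimited with a single sample column in the last position, the FORMAT field is GT:DP:AD:PL as described, and AD holds two comma-separated read counts in the order reference,alternative. The function names are hypothetical and not part of SVJedi.

```python
def parse_sample_column(vcf_lines):
    """Map (CHROM, POS, ID) -> {FORMAT key: value} for the last column."""
    records = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and all header lines
        fields = line.split("\t")
        keys = fields[8].split(":")      # FORMAT column, e.g. GT:DP:AD:PL
        values = fields[-1].split(":")   # assumed: sample data is the last column
        records[(fields[0], fields[1], fields[2])] = dict(zip(keys, values))
    return records


def alt_allele_fraction(ad):
    """Fraction of reads supporting the alternative allele.

    Assumes AD looks like 'ref,alt'; returns None when it cannot be
    parsed or when no reads support either allele.
    """
    try:
        ref, alt = (int(x) for x in ad.split(","))
    except ValueError:
        return None
    total = ref + alt
    return alt / total if total else None


def merge_samples(per_sample):
    """per_sample: {sample name: parsed records}.

    Returns {variant key: {sample name: alt allele fraction}}.
    """
    merged = {}
    for sample, records in per_sample.items():
        for key, fmt in records.items():
            merged.setdefault(key, {})[sample] = alt_allele_fraction(fmt.get("AD", ""))
    return merged
```

In practice each entry of `per_sample` would come from something like `parse_sample_column(open(path))` on the output of one SVJedi run. Keying on (CHROM, POS, ID) avoids relying on the variants appearing in the same order in every output file.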

Also, the last line of the header (the column names) does not seem to match the content of the columns in the rest of the file: it looks like a copy of the last header line of the input VCF file, with FORMAT SAMPLE appended at the end. For example, in your file Data/HG002_son/expected_genotype_results.vcf the header contains 12 fields (see below), but there are only 10 columns in the rest of the file.

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HG002 FORMAT SAMPLE
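A mismatch like this can be detected with a short check. The sketch below is only illustrative: it assumes a tab-delimited VCF in which metadata lines start with "##" and the column-name line starts with a single "#", and the function name is hypothetical.

```python
def find_column_mismatches(vcf_lines):
    """Compare the column-name line with every record line.

    Returns (number of header fields, list of (line number, field count))
    for records whose field count differs from the header.
    """
    header_cols = None
    mismatches = []
    for number, line in enumerate(vcf_lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # blank line or metadata line
        if line.startswith("#"):
            header_cols = len(line.split("\t"))  # column-name line
            continue
        n_fields = len(line.split("\t"))
        if header_cols is not None and n_fields != header_cols:
            mismatches.append((number, n_fields))
    return header_cols, mismatches
```

Applied to a file with the header quoted above and 10-column records, this would report 12 header fields and list every record line as a mismatch.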

Thank you for developing SVJedi and thank you for your help,

Hugo

llecompte commented 3 years ago

Hello Hugo,

Thank you for using SVJedi and for your feedback.

You're right: SVJedi estimates genotypes for a single sample, so the reads from all input files are pooled together. But as you suggest, handling multiple samples in one run would be an interesting feature to add to SVJedi. I'll look into it in the next few days and keep you posted.

Also, thank you for noticing the issue with the column header. I fixed it in commit 3f15333b5b53d907c2e61b99be1b1e87ab085392.

Regards, Lolita