rki-mf1 / covsonar

A database-driven system for handling genomic sequences of SARS-CoV-2 and screening genomic profiles.
GNU General Public License v3.0

VCF file restored from covsonar and Usher pre-processing #84

Closed by Achilleas-Galanopoulos 1 year ago

Achilleas-Galanopoulos commented 1 year ago

To create a mutation-annotated tree (MAT) for Usher, the pre-processing step needs a tree in Newick format and a VCF file covering the same samples. I tried to build a MAT using covsonar. I had a Newick tree and the corresponding sequences in a FASTA file, so I used covsonar to build a database from the FASTA file and then restored the database to a VCF file containing the mutations for all samples. I also used the PareTree tool to remove the branch lengths from the Newick tree.

I then ran the Usher pre-processing step with the restored VCF file and the Newick tree, which stored the resulting MAT in a protobuf file. From that file I extracted a JSON file with matUtils extract and visualized the tree in Auspice. The visualization looked wrong and the branch lengths were not calculated correctly.

I compared the VCF file restored from covsonar with the test VCF file provided in the Usher documentation. The main difference is the dots in the covsonar VCF: for each sample and position, the restored file has 1 where there is a mutation and "." where there is none. I replaced every "." with 0 and ran the Usher pre-processing step again. This fixed the problem: the JSON file extracted from the protobuf file was visualized correctly, and the branch lengths appeared to be calculated correctly.
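For anyone who wants to reproduce the workaround, here is a minimal sketch of the replacement step in Python. It is not part of covsonar or Usher; the function name `fill_missing_genotypes` and the sample data are made up for illustration. It assumes a tab-separated VCF in which the first nine columns (CHROM through FORMAT) are followed by one column per sample, and it only touches the genotype part of the sample columns, so a "." in ID, QUAL, FILTER or INFO is left alone. Note that this treats every missing call as the reference allele, which may hide positions that were truly not sequenced.

```python
import io

FIXED_COLUMNS = 9  # CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT


def fill_missing_genotypes(lines):
    """Yield VCF lines with a missing genotype '.' in sample columns set to '0'."""
    for line in lines:
        # Header lines and blank lines pass through unchanged.
        if line.startswith("#") or not line.strip():
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        for i in range(FIXED_COLUMNS, len(fields)):
            # The genotype is the first colon-separated subfield of a sample column.
            parts = fields[i].split(":")
            if parts[0] == ".":
                parts[0] = "0"
                fields[i] = ":".join(parts)
        yield "\t".join(fields) + "\n"


# Tiny made-up example: sample s1 carries the mutation, s2 has a missing call.
example = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2\n"
    "NC_045512.2\t241\t.\tC\tT\t.\t.\t.\tGT\t1\t.\n"
)
fixed = "".join(fill_missing_genotypes(io.StringIO(example)))
print(fixed)
```

For a real file, the same function can be applied to an open file handle and the result written to a new VCF, which is then passed to the Usher pre-processing step in place of the restored one.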

stephan-fuchs commented 1 year ago

Thank you for bringing up your concern. To address your problem, I suggest trying out Covsonar 2-alpha (https://github.com/rki-mf1/covsonar/tree/dev/covsonar2-alpha), which is designed to export VCF v4.2 files. This should provide a suitable solution to your issue.

However, please let me know if you have any further questions or if you feel that this suggestion does not meet your requirements. I am always here to assist you in finding the best possible solution.