rki-mf1 / covsonar

A database-driven system for handling genomic sequences of SARS-CoV-2 and screening genomic profiles.
GNU General Public License v3.0

Update to [covSonar V.1.1.2] - [merged] #44

Closed silenus092 closed 2 years ago

silenus092 commented 2 years ago

In GitLab by @kunaphas.kon on Dec 11, 2021, 22:03

Merges dev/tovcf -> master

In covSonar V.1.1.2

Improvement

1. Readme.md

I updated section 3.3 of the readme.md file and added a new section (3.7).

2. Parent-child relationship

We can now query all sublineages of a lineage by using the --with-sublineage option together with the match command.

# We want to get all sublineages of the Delta variant (B.1.617.2).
# This query expands the lineage to ['B.1.617.2', 'AY.1', 'AY.2', ..., 'AY.129', 'AY.130']
path/to/covsonar/sonar.py match -i S:N501Y --lineage B.1.617.2 --with-sublineage --db mydb > out.csv

I use the sublineage information from https://github.com/cov-lineages/pango-designation/ and a custom script that automatically converts it into a precomputed lineage file (lib/lineage.all.tsv). This file will probably need to be updated regularly, and I'm not sure whether we should keep it under the lib folder or place it somewhere in the sc2 space for easier updating.
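To illustrate the idea behind the precomputed file, here is a minimal sketch of how a parent-to-sublineage table could be derived from Pango alias information. It is not the conversion script from this merge request: the `ALIASES` map, the function names, and the two-column TSV layout are assumptions made for the example, and the real lib/lineage.all.tsv may be structured differently.

```python
# Hypothetical alias map in the style of the pango-designation alias file.
# Only two entries are listed here for illustration.
ALIASES = {"AY": "B.1.617.2", "Q": "B.1.1.7"}


def unalias(lineage, aliases):
    """Expand a leading alias, e.g. 'AY.1' -> 'B.1.617.2.1'."""
    head, _, rest = lineage.partition(".")
    full_head = aliases.get(head) or head  # unknown or empty alias: keep the name
    return f"{full_head}.{rest}" if rest else full_head


def build_sublineage_table(lineages, aliases):
    """Map every lineage to the sorted list of its descendants."""
    full = {name: unalias(name, aliases) for name in lineages}
    table = {}
    for parent in lineages:
        prefix = full[parent] + "."  # the dot avoids matching B.1.61 vs. B.1.617
        table[parent] = sorted(
            child for child in lineages if full[child].startswith(prefix)
        )
    return table


def to_tsv(table):
    """Render the table as 'lineage<TAB>comma-separated sublineages' lines (assumed layout)."""
    return "\n".join(f"{parent}\t{','.join(children)}" for parent, children in table.items())


if __name__ == "__main__":
    names = ["B.1.617.2", "AY.1", "AY.2", "B.1.1.7", "Q.1"]
    print(to_tsv(build_sublineage_table(names, ALIASES)))
```

A query with --with-sublineage would then search for the requested lineage plus every entry in its row of the table.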

New Features

1. Export DB to VCF file

covSonar can export accession records in VCF format using the var2vcf command. The output is a single VCF file that combines all accessions, and it is written in compressed (.gz) form.

# Export all accessions in the database.
path/to/covsonar/sonar.py var2vcf --db mydb -o merge.vcf
# Just as with the match command, we can use --file, --acc and --date to export specific accessions only.
path/to/covsonar/sonar.py var2vcf --db mydb -f acc.10.txt -o merge.vcf
# To speed up the query, we can use the --cpus option.
path/to/covsonar/sonar.py var2vcf --db mydb --date 2021-08-01:2021-08-10 -o merge.vcf --cpus 20

# Alternatively, we can use the --betaV2 option (3-5 times faster).
# This version is still under development, so if you find any bugs, please report them to us.
path/to/covsonar/sonar.py var2vcf --db mydb --date 2021-08-01:2021-08-10 -o merge.vcf --cpus 20 --betaV2

There is still room for improvement in this feature; I will revisit it when it becomes urgent.
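For readers unfamiliar with multi-sample VCF, the following sketch shows what "a single VCF file that combines all accessions" can look like: one row per variant site and one genotype column per accession, written through gzip. It is an illustration only, not the var2vcf implementation; the function name, the input structure, and the use of MN908947.3 as the chromosome name are assumptions.

```python
import gzip


def write_merged_vcf(variants_by_acc, path, chrom="MN908947.3"):
    """Write a gzip-compressed multi-sample VCF.

    variants_by_acc maps an accession to a set of (pos, ref, alt) tuples
    (hypothetical input structure). chrom is an assumed reference name.
    """
    accessions = sorted(variants_by_acc)
    # Union of all variant sites across accessions, ordered by position.
    sites = sorted({v for variants in variants_by_acc.values() for v in variants})

    with gzip.open(path, "wt") as out:
        out.write("##fileformat=VCFv4.2\n")
        out.write('##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n')
        header = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]
        out.write("\t".join(header + accessions) + "\n")
        for pos, ref, alt in sites:
            # Haploid genotype: 1 if the accession carries the variant, else 0.
            genotypes = [
                "1" if (pos, ref, alt) in variants_by_acc[acc] else "0"
                for acc in accessions
            ]
            row = [chrom, str(pos), ".", ref, alt, ".", ".", ".", "GT"] + genotypes
            out.write("\t".join(row) + "\n")


if __name__ == "__main__":
    # Illustrative data only; accession names are made up.
    example = {
        "acc1": {(23063, "A", "T"), (23403, "A", "G")},
        "acc2": {(23063, "A", "T")},
    }
    write_merged_vcf(example, "merge.vcf.gz")
```

Building the union of sites first keeps the file rectangular, so every accession has a genotype at every row even when it does not carry that variant.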

Fix Bugs

None


silenus092 commented 2 years ago

In GitLab by @kunaphas.kon on Dec 21, 2021, 12:29

added 1 commit


silenus092 commented 2 years ago

In GitLab by @kunaphas.kon on Dec 22, 2021, 15:58

marked this merge request as ready

silenus092 commented 2 years ago

In GitLab by @kunaphas.kon on Dec 22, 2021, 15:58

requested review from @s.fuchs

silenus092 commented 2 years ago

In GitLab by @kunaphas.kon on Dec 25, 2021, 19:42

added 1 commit


silenus092 commented 2 years ago

In GitLab by @kunaphas.kon on Dec 27, 2021, 23:05

added 1 commit


silenus092 commented 2 years ago

In GitLab by @kunaphas.kon on Jan 4, 2022, 16:05

added 1 commit


silenus092 commented 2 years ago

In GitLab by @kunaphas.kon on Jan 7, 2022, 20:20

added 1 commit


silenus092 commented 2 years ago

In GitLab by @s.fuchs on Jan 13, 2022, 08:41

mentioned in commit 91a7fc2cc2c39ff2bf696e8c705a1b604a5c7b4b

silenus092 commented 2 years ago

In GitLab by @s.fuchs on Jan 13, 2022, 08:43

Hi Note, great work! Is there an option to update (download and process) the lineage information easily, e.g. via an option such as --update-sublineages?

silenus092 commented 2 years ago

In GitLab by @kunaphas.kon on Jan 13, 2022, 10:11

OK @s.fuchs, I can implement that and include it in the next update. I plan to put --update-sublineages under the **update** command and also under the lib folder, which seems reasonable.