yyoshiaki / VIRTUS2

A bioinformatics pipeline for viral transcriptome detection and quantification considering splicing.

Changing the viral database #33

xapple closed this issue 8 months ago

xapple commented 9 months ago

I installed VIRTUS2 and tested the workflow. I was able to reproduce the results for the test sample ERR3240275 and got the same output as the example table that is shown in the README.

However, I noticed that the database includes relatively few viruses: only 762 entries, as seen here:

https://github.com/yyoshiaki/VIRTUS2/blob/master/data/200830_viruses.txt

Would it be possible to substitute this database with a larger one, for instance all of RefSeq viral?

Would this be recommended, or are there considerations against this?

I just replaced line #9 of createindex.job.yaml with this new URL and ran createindex.cwl again.

https://ftp.ncbi.nlm.nih.gov/refseq/release/viral/viral.1.1.genomic.fna.gz

It seems to be working?

xapple commented 9 months ago

First, I tried editing createindex.job.yaml to use the URL of the RefSeq viral FASTA instead of the original URL. This did not work: although the pipeline supports a gzip-compressed human genome FASTA file, it does not seem to support a gzip-compressed virus file.

Second, I edited createindex.job.yaml to replace the viral URL with the path to a local FASTA file (downloaded and decompressed manually), but this did not work either.
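For reference, the manual download-and-decompress step mentioned above could look roughly like the sketch below. It is not part of VIRTUS2: the function names and local file names are hypothetical, and the only value taken from this thread is the RefSeq URL. It only prepares a plain FASTA file and counts its records. It does not address how createindex.job.yaml expects a local file to be specified.

```python
import gzip
import shutil
import urllib.request
from pathlib import Path

# URL quoted earlier in this thread (first RefSeq viral genomic chunk).
REFSEQ_VIRAL_URL = (
    "https://ftp.ncbi.nlm.nih.gov/refseq/release/viral/viral.1.1.genomic.fna.gz"
)


def download(url: str, dest: Path) -> Path:
    """Download `url` to `dest`; skip the download if `dest` already exists."""
    if not dest.exists():
        with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
            shutil.copyfileobj(resp, out)
    return dest


def gunzip(src: Path, dest: Path) -> Path:
    """Decompress the gzip file `src` into the plain file `dest`."""
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest


def count_fasta_records(path: Path) -> int:
    """Count FASTA records by counting header lines that start with '>'."""
    with open(path, "r") as fh:
        return sum(1 for line in fh if line.startswith(">"))


def prepare_viral_fasta(workdir: Path) -> Path:
    """Hypothetical helper: fetch the archive and return the decompressed FASTA."""
    gz_path = download(REFSEQ_VIRAL_URL, workdir / "viral.1.1.genomic.fna.gz")
    return gunzip(gz_path, workdir / "viral.1.1.genomic.fna")
```

Calling `prepare_viral_fasta(Path("."))` and then `count_fasta_records(...)` on the result gives a quick sanity check that the decompressed file is a non-empty FASTA before pointing the index job at it.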

How can one change the viral database used in VIRTUS2?

Thanks.

yyoshiaki commented 8 months ago

Thank you. Please refer to #19.