sara-javadzadeh / FastViFi

Detect viral infection and integration sites on NGS input. Manuscript is in preparation.
GNU General Public License v3.0

kraken datasets unavailable #12

Closed ugh-astya closed 1 year ago

ugh-astya commented 1 year ago

Hi Sara, when I try to download the Kraken datasets from the Google Drive links, I keep encountering network errors. Around 10-12 GB downloads and then the download fails. I've tried different browsers, wget (CLI), and gdown (CLI); none work. It might be an issue with the Google Drive link. Is it possible to run FastViFi without these datasets? Or, if you have a mirror link or some other place where I can download the datasets from, that'd be great. Thanks

sara-javadzadeh commented 1 year ago

Hi Agastya,

Thanks for the feedback. Let me look into alternative ways to share the data and get back to you.

sara-javadzadeh commented 1 year ago

While I look into alternative ways to share the Kraken databases, feel free to build Kraken databases for your viral sequences based on the instructions linked here, i.e., the section Custom Databases for FastViFi. May I ask which virus you are working with? The pre-built Kraken database is for human papillomavirus (HPV). If you are working with any other virus, you need to build a custom Kraken database. Please let me know if you have any questions.

sara-javadzadeh commented 1 year ago

Hi again,

Please try downloading the two databases for HPV individually: HPV-kraken-database HPV-HG-kraken-database

Best, Sara

ugh-astya commented 1 year ago

Hi, and thanks a lot for the info. I am working with HPV viral integration sites. The links you've shared seem to be working. What's the difference between the 2 datasets you've linked to, and which one should I specify in the FastViFi command with --kraken-db-path? Thanks

sara-javadzadeh commented 1 year ago

Hi Agastya,

Good question. You need both databases. FastViFi performs a 2-level filter based on Kraken and a different database is needed for each level. Two-level filtering provided higher accuracy and sensitivity compared to a single level kraken filter. If you download both within the same directory, you can pass that directory (parent directory for both databases) to the --kraken-db-path parameter.
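The layout described above (both databases side by side under one parent directory) can be sanity-checked before running the pipeline. This is only a minimal sketch: the helper names are hypothetical, and the expected subdirectory names are an assumption that must be replaced with whatever names the downloaded archives actually unpack to.

```python
from pathlib import Path


def list_database_dirs(parent):
    """Return the sorted names of all subdirectories directly under `parent`."""
    return sorted(p.name for p in Path(parent).iterdir() if p.is_dir())


def has_both_databases(parent, expected):
    """Return True if every name in `expected` is a subdirectory of `parent`.

    `expected` is supplied by the caller; this sketch does not know the real
    directory names of the FastViFi databases.
    """
    found = set(list_database_dirs(parent))
    return all(name in found for name in expected)
```

If the check passes, the parent directory is the value to pass to --kraken-db-path.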

Let me know in case of other questions. Sara

ugh-astya commented 1 year ago

Thanks for the explanation. My issue with the database has been resolved. FYI, the docker image does not have the run_kraken_vifi_container.py file in the fastvifi folder. I used wget to add it to the directory. Upon running, it calls on the parse_input_args function from the run_kraken_vifi_pipeline.py and returns an error saying the function (parse_input_args) doesn't take an argument. I checked the python code and the code (run_kraken_vifi_pipeline.py) for the function on the github repo is different from the one inside the docker image. The function on github does take arguments, whereas the one in the docker image does not. Maybe the docker image is not up to date? I am using the latest docker image.

sara-javadzadeh commented 1 year ago

Hi Agastya,

Thanks for bringing this to my attention. There are some issues with the updated script working with the old Docker image (translated to a Singularity .sif file). I'm working on updating the Docker image and testing it. I'll update you when I upload the new Docker image.

Best, Sara

ugh-astya commented 1 year ago

Sounds good. Best, Agastya

sara-javadzadeh commented 1 year ago

Hi Agastya,

The Docker image is updated. The image tag to use is sarajava:fastvifi:v1.1. However, you don't need to download it manually: [run_kraken_vifi_container.py](https://github.com/sara-javadzadeh/FastViFi/blob/main/run_kraken_vifi_container.py) will pull the Docker image and create a Singularity .sif file. Please pull the latest updates from the GitHub repo and review the latest changes in the manual.

ugh-astya commented 1 year ago

Thanks a lot, Sara! I'll get back to you soon after checking this. I had a question: will FastViFi work with long-read sequencing (nanopore) data? From what I understand, the algorithm bins the read pairs into different categories based on where and how they map to the human and viral references. Is it possible to give FastViFi just a single .fastq file for breakpoint detection?

P.S. Do I just wget the run_kraken_vifi_container.py file and run it? Or should I git clone the entire repo and then run the python file? I'm quite new to docker.

sara-javadzadeh commented 1 year ago

Hi Agastya,

Do you have any paired end short reads from your samples? Do I understand correctly that you only have ONT reads and want to find viral-human junctions in the sequenced reads?

I haven't tested FastViFi on ONT reads, as FastViFi is optimized for short paired-end reads in the form of two fastq files (one for each mate in the paired-end reads). In FastViFi we leverage paired-end reads where one mate is mapped to the human genome and the other mate is mapped to the viral genome to detect the viral-human hybrid junction. We report fully viral reads in FastViFi as well. The idea behind the HMMs (in ViFi) is to map reads from a highly divergent virus to HMMs when mapping with BWA might fail due to high variability in the viral sequence. However, we do rely on the short reads being aligned to the HMMs being mostly viral. So passing an ONT read that might have a small viral sequence in the middle is not what FastViFi is designed for.
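To illustrate the paired-end idea described above, here is a deliberately simplified sketch, not FastViFi's actual code. It assumes each mate has already been assigned a single label for the reference it maps to; the function name and the label strings are hypothetical.

```python
def classify_pair(mate1_ref, mate2_ref):
    """Classify a read pair from the reference each mate maps to.

    Each argument is "human", "viral", or None (unmapped). These labels are
    illustrative assumptions, not FastViFi's internal representation.
    """
    refs = {mate1_ref, mate2_ref}
    if refs == {"human", "viral"}:
        # One mate on each genome: evidence for a viral-human junction.
        return "hybrid"
    if refs == {"viral"}:
        # Both mates map to the virus: a fully viral pair.
        return "viral"
    return "other"
```

A long read with a short viral segment in the middle has no such mate structure, which is why it does not fit this scheme directly.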

A simple idea to make this work is to sample paired end short reads from your ONT reads and feed that to FastViFi.
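One way to sketch that sampling idea is below. This is an assumption-laden illustration rather than a tested protocol: the function names, read length, fragment length, and step size are all hypothetical defaults, and base qualities are ignored. Each fragment is cut from the long read, the first mate is the start of the fragment, and the second mate is the reverse complement of its end, mimicking the inward-facing orientation of paired-end reads.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def sample_pairs(long_read, read_len=150, fragment_len=400, step=200):
    """Cut overlapping fragments from `long_read` and return mate pairs.

    Returns a list of (mate1, mate2) tuples. Reads shorter than
    `fragment_len` yield an empty list.
    """
    pairs = []
    for start in range(0, len(long_read) - fragment_len + 1, step):
        fragment = long_read[start:start + fragment_len]
        mate1 = fragment[:read_len]                       # forward mate
        mate2 = reverse_complement(fragment[-read_len:])  # reverse mate
        pairs.append((mate1, mate2))
    return pairs
```

The two mates of each pair would then be written to two separate fastq files (with placeholder quality strings) before being passed to FastViFi.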

Regarding your last question, I'd suggest cloning the entire repo. Although you might not use other scripts in the repo, it's easier to track changes. You do not need to download the docker image. The run_kraken_vifi_container.py script pulls the docker image automatically.

ugh-astya commented 1 year ago

Thank you for the answer. My query has been resolved.