jodyphelan / malaria-profiler


How to specify resistance db for vcf files #8

Closed dativapereus closed 7 months ago

dativapereus commented 7 months ago

Hi @juuzia @jodyphelan @klausyboi, I have been trying to get this amazing tool to work, but I am running into several issues. The biggest is that I am using VCF files. The tutorial says to use only the malaria-profiler command, but that is not quite right: it should be malaria-profiler profile.

Back to my issue. Your code suggests the following form for VCF input:

malaria-profiler -v </path/to/vcf> -p <sample_name> -t [threads] --txt

The command I ran:

malaria-profiler profile -v $Pf_WGS_TZ_VCF/wgstz_merged_all_chroms.vcf -d $Pf_WGS_TZ_VCF/../malaria-profiler-results -p wgs_tz -t 90 --ram 500 --txt --csv --pdf

The error I get:

malaria-profiler:85 Speciation can't be perfomrmed on a VCF file so a resistance database is needed.Specify with --resistance_db or --external_resistance_db

The problems start when I add the --resistance_db option. After the option, I provide the path to the malaria-db directory that was created by malaria-profiler update_db, for example tank/dpereus/github_pipeline/malaria-db/db/Plasmodium_falciparum/.

I get the following error:

ERROR    Can't find database, writing results to file and quitting!                        malaria-profiler:116
           INFO                                                                                               output.py:16
                    Writing outputs                                                                                       
           INFO     Writing json file:                                                                        output.py:21
                    /tank/dpereus/wgs_tz/raw_data/merged_wgs/../malaria-profiler-results/wgs_tz.results.json              
           INFO     Writing text file:                                                                        output.py:24
                    /tank/dpereus/wgs_tz/raw_data/merged_wgs/../malaria-profiler-results/wgs_tz.results.txt               
Traceback (most recent call last):
  File "/home/dpereus/.conda/envs/malaria-profiler/bin/malaria-profiler", line 353, in <module>
    args.func(args)
  File "/home/dpereus/.conda/envs/malaria-profiler/bin/malaria-profiler", line 117, in main_profile
    malp.write_outputs(args,results)
  File "/home/dpereus/.conda/envs/malaria-profiler/lib/python3.9/site-packages/malaria_profiler/output.py", line 25, in write_outputs
    write_text(results,args.conf,text_output,extra_columns)
  File "/home/dpereus/.conda/envs/malaria-profiler/lib/python3.9/site-packages/malaria_profiler/output.py", line 126, in write_text
    return write_species_text(json_results,outfile)
  File "/home/dpereus/.conda/envs/malaria-profiler/lib/python3.9/site-packages/malaria_profiler/output.py", line 190, in write_species_text
    text_strings["species_report"] = pp.dict_list2text(json_results["species"]["prediction"],["species","mean"],{"species":"Species","mean":"Mean kmer coverage"},sep=sep)
TypeError: 'NoneType' object is not subscriptable
Cleaning up after failed run
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/home/dpereus/.conda/envs/malaria-profiler/bin/malaria-profiler", line 43, in cleanup
    O.write("* Database version: %s\n" % args.conf["version"]["commit"]) if ("conf" in vars(args) and "commit" in args.conf["version"]) else ""
TypeError: 'NoneType' object is not subscriptable

My attempts to point the option at the P. falciparum directory inside malaria-db have not worked. Please advise on the correct way to specify the --resistance_db option.

jodyphelan commented 7 months ago

Hi @dativapereus,

Thanks for letting us know. Please reinstall following the install instructions and let us know if that fixes the issues.

dativapereus commented 7 months ago

Dear Jodyphelan/Malaria-Profiler, Thank you for your response.

dativapereus commented 7 months ago

@jodyphelan I just finished the installation and reran using the code below:

Pf_WGS_TZ_VCF=/tank/dpereus/wgs_tz/raw_data/merged_wgs
res_db=/tank/dpereus/github_pipeline/malaria-db/db/Plasmodium_falciparum
conda activate malaria-profiler 

malaria-profiler profile --txt --csv --pdf --ram 500 --resistance_db $res_db -v $Pf_WGS_TZ_VCF/wgstz_merged_all_chroms.vcf.gz \
-d $Pf_WGS_TZ_VCF/../malaria-profiler-results \
-p wgs_tz \
-t 80 --ram 500

The new error I am getting is:

Traceback (most recent call last):
  File "/home/dpereus/.conda/envs/malaria-profiler/bin/malaria-profiler", line 370, in <module>
    args.func(args)
  File "/home/dpereus/.conda/envs/malaria-profiler/bin/malaria-profiler", line 127, in main_profile
    if results["species"]["prediction"] is None:
TypeError: 'NoneType' object is not subscriptable
Cleaning up after failed run
Exception ignored in atexit callback: <function cleanup at 0x7f548fb1f760>
Traceback (most recent call last):
  File "/home/dpereus/.conda/envs/malaria-profiler/bin/malaria-profiler", line 43, in cleanup
    O.write("* Database version: %s\n" % args.conf["version"]["commit"]) if ("conf" in vars(args) and "commit" in args.conf["version"]) else ""
TypeError: 'NoneType' object is not subscriptable

jodyphelan commented 7 months ago

Can I check if you ran malaria-profiler update_db?

If so, you should specify just the name of the database, not a path (e.g. --resistance_db falciparum).
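
For anyone landing here with the same error, here is a minimal sketch of the adjusted invocation, assuming malaria-profiler update_db has already been run and that the database is registered under the name falciparum, as suggested above. The paths and the other flags are copied from the commands earlier in this thread. The RUN variable is a hypothetical dry-run switch added for illustration and is not part of malaria-profiler.

```shell
#!/bin/sh
# Paths copied from the thread; replace them with your own.
Pf_WGS_TZ_VCF=/tank/dpereus/wgs_tz/raw_data/merged_wgs

# Assumption: the database is referred to by its short name,
# not by the directory path under malaria-db/db/.
res_db=falciparum

# Collect the command as positional parameters so each argument stays quoted.
set -- malaria-profiler profile \
    -v "$Pf_WGS_TZ_VCF/wgstz_merged_all_chroms.vcf.gz" \
    -d "$Pf_WGS_TZ_VCF/../malaria-profiler-results" \
    -p wgs_tz -t 80 --ram 500 \
    --resistance_db "$res_db" \
    --txt --csv --pdf

# Print the command for review; it is executed only when RUN=1 is set
# (RUN is a hypothetical switch used in this sketch only).
echo "$@"
if [ "${RUN:-0}" = "1" ]; then
    "$@"
fi
```

Running the script as-is only prints the assembled command, so the arguments can be checked before a long run is started. Setting RUN=1 in the environment executes it.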