jodyphelan / TBProfiler

Profiling tool for Mycobacterium tuberculosis to detect resistance and strain type from WGS data
GNU General Public License v3.0

TypeError: Parallel.__init__() got an unexpected keyword argument 'return_as' #309

Closed: ponomarevsy closed this issue 8 months ago

ponomarevsy commented 8 months ago

Dear TBProfiler experts,

I am getting this error in v5 of TBProfiler (tried installing via Anaconda and via Mamba and getting the same error):

"TypeError: Parallel.__init__() got an unexpected keyword argument 'return_as'"

The OS is RHEL 7.9. Please let me know if you have any clues. Thanks!

user@node:/data/TBProfiler$ module load mamba; source activate tbprofiler-5.0.0-python3

(tbprofiler-5.0.0-python3) user@node:/data/TBProfiler$ tb-profiler profile -1 SRR1158874_1.fastq.gz -2 SRR1158874_2.fastq.gz -t 4 -p SRR1158874 --txt

[11:21:20] INFO     Using ref file: /pathto/mambaforge/2022.11/envs/tbprofiler-5.0.0-python3/share/tbprofiler//tbdb.fasta  db.py:795
           INFO     Using gff file: /pathto/mambaforge/2022.11/envs/tbprofiler-5.0.0-python3/share/tbprofiler//tbdb.gff    db.py:795
           INFO     Using bed file: /pathto/mambaforge/2022.11/envs/tbprofiler-5.0.0-python3/share/tbprofiler//tbdb.bed    db.py:795
           INFO     Using version file:                                                                                                      db.py:795
                    /pathto/mambaforge/2022.11/envs/tbprofiler-5.0.0-python3/share/tbprofiler//tbdb.version.json
           INFO     Using json_db file:                                                                                                      db.py:795
                    /pathto/mambaforge/2022.11/envs/tbprofiler-5.0.0-python3/share/tbprofiler//tbdb.dr.json
           INFO     Using variables file:                                                                                                    db.py:795
                    /pathto/mambaforge/2022.11/envs/tbprofiler-5.0.0-python3/share/tbprofiler//tbdb.variables.json
           INFO     Using spoligotype_spacers file:                                                                                          db.py:795
                    /pathto/mambaforge/2022.11/envs/tbprofiler-5.0.0-python3/share/tbprofiler//tbdb.spoligotype_spacers.txt
           INFO     Using spoligotype_annotations file:                                                                                      db.py:795
                    /pathto/mambaforge/2022.11/envs/tbprofiler-5.0.0-python3/share/tbprofiler//tbdb.spoligotype_list.csv
           INFO     Using bedmask file:                                                                                                      db.py:795
                    /pathto/mambaforge/2022.11/envs/tbprofiler-5.0.0-python3/share/tbprofiler//tbdb.mask.bed
           INFO     Using barcode file:                                                                                                      db.py:795
                    /pathto/mambaforge/2022.11/envs/tbprofiler-5.0.0-python3/share/tbprofiler//tbdb.barcode.bed
           INFO     Trimming reads                                                                                                         fastq.py:38
[11:21:31] INFO     Mapping to reference genome                                                                                            fastq.py:51
[11:23:58] WARNING  Please ensure that this BAM was made using the same reference as in the database. If you are not sure what          profiler.py:13
                    reference was used it is best to remap the reads.
           INFO     Running variant calling                                                                                                  bam.py:85
Traceback (most recent call last):
  File "/pathto/mambaforge/2022.11/envs/tbprofiler-5.0.0-python3/bin/tb-profiler", line 559, in <module>
    args.func(args)
  File "/pathto/mambaforge/2022.11/envs/tbprofiler-5.0.0-python3/bin/tb-profiler", line 110, in main_profile
    results.update(pp.run_profiler(args))
  File "/pathto/mambaforge/2022.11/envs/tbprofiler-5.0.0-python3/lib/python3.10/site-packages/pathogenprofiler/cli.py", line 52, in run_profiler
    results = bam_profiler(
  File "/pathto/mambaforge/2022.11/envs/tbprofiler-5.0.0-python3/lib/python3.10/site-packages/pathogenprofiler/profiler.py", line 31, in bam_profiler
    vcf_obj = bam.call_variants(conf["ref"], caller=caller, filters = conf['variant_filters'], bed_file=conf["bed"], threads=threads, calling_params=calling_params, samclip = samclip)
  File "/pathto/mambaforge/2022.11/envs/tbprofiler-5.0.0-python3/lib/python3.10/site-packages/pathogenprofiler/bam.py", line 86, in call_variants
    run_cmd_parallel_on_genome(self.calling_cmd,ref_file,bed_file = bed_file,threads=threads,desc="Calling variants")
  File "/pathto/mambaforge/2022.11/envs/tbprofiler-5.0.0-python3/lib/python3.10/site-packages/pathogenprofiler/utils.py", line 70, in run_cmd_parallel_on_genome
    parallel = Parallel(n_jobs=threads, return_as="generator")
TypeError: Parallel.__init__() got an unexpected keyword argument 'return_as'
Cleaning up after failed run
           ERROR                                                                                                                        tb-profiler:58

                    ################################# ERROR #######################################

                    This run has failed. Please check all arguments and make sure all input files
                    exist. If no solution is found, please open up an issue at
                    https://github.com/jodyphelan/TBProfiler/issues/new and paste or attach the
                    contents of the error log (SRR1158874.errlog.txt)

                    ###############################################################################
ponomarevsy commented 8 months ago

I used these commands to create TBProfiler environment:

[user@node]» module load mamba/2022.11                                                                                                   
[user@node]» mamba create -n tbprofiler-5.0.0-python3 -c bioconda tb-profiler
jodyphelan commented 8 months ago

Hi @ponomarevsy

This may indicate that you are using an old version of the joblib library. Can you verify that you have version 1.3.0 or later? You can check with mamba list --explicit | grep joblib. If the version is older, try upgrading with mamba install joblib=1.3.0
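
The same check can be done from inside the activated environment with Python. The sketch below is only an illustration of the version comparison described above: the helper names (parse_version, supports_return_as, installed_joblib_version) are hypothetical and are not part of TBProfiler or joblib, and the 1.3.0 threshold is taken from this comment rather than verified independently.

```python
from importlib import metadata

# Threshold taken from the maintainer's comment: joblib >= 1.3.0.
MINIMUM = (1, 3, 0)


def parse_version(text):
    """Convert a version string such as '1.2.0' into a tuple of three ints."""
    numbers = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as 'rc1'
            digits += ch
        numbers.append(int(digits) if digits else 0)
    while len(numbers) < 3:
        numbers.append(0)  # pad '1.3' to (1, 3, 0)
    return tuple(numbers)


def supports_return_as(version_text, minimum=MINIMUM):
    """True if this joblib version should accept Parallel(return_as=...)."""
    return parse_version(version_text) >= minimum


def installed_joblib_version():
    """Return the installed joblib version string, or None if it is absent."""
    try:
        return metadata.version("joblib")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_joblib_version()
    if found is None:
        print("joblib is not installed in this environment")
    elif supports_return_as(found):
        print(f"joblib {found}: new enough for return_as")
    else:
        print(f"joblib {found}: too old, Parallel(return_as=...) raises TypeError")
```

Run it with the Python interpreter of the tbprofiler environment, since a different interpreter may see a different joblib.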

ponomarevsy commented 8 months ago

Thank you, @jodyphelan! You are correct, I have an older version of joblib:

(tbprofiler-5.0.0-python3) [user@node](:|✔)» mamba list --explicit | grep joblib                                                         
https://repo.anaconda.com/pkgs/main/linux-64/joblib-1.2.0-py310h06a4308_0.conda

(tbprofiler-5.0.0-python3) [user@node](:|✔)» mamba list joblib                                                                           
# packages in environment at /pathto/mambaforge/2022.11/envs/tbprofiler-5.0.0-python3:
#
# Name                    Version                   Build  Channel
joblib                    1.2.0           py310h06a4308_0

I am going to upgrade joblib and re-run the test...

ponomarevsy commented 8 months ago

I am getting a bcftools related error now:

bcftools: error while loading shared libraries: libgsl.so.25: cannot open shared object file: No such file or directory

The bcftools version is:

bcftools     1.18  h8b25389_0  bioconda

and the openssl version is:

openssl                   3.1.3                hd590300_0    conda-forge

I found this post: https://github.com/WGLab/NanoCaller/issues/29 and will see if I can downgrade bcftools...
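
An error of the form "cannot open shared object file" means the dynamic loader could not resolve a library the binary was linked against. On Linux, ldd lists such libraries as "not found". The sketch below parses that output; the function names are hypothetical, and the sample text is illustrative only and was not taken from this environment.

```python
import subprocess


def missing_libraries(ldd_output):
    """Return names of shared libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=>" in line and "not found" in line:
            name = line.split("=>")[0].strip()
            if name not in missing:
                missing.append(name)
    return missing


def check_binary(path):
    """Run ldd on a binary and list its unresolved libraries (Linux only)."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    return missing_libraries(result.stdout)


# Illustrative ldd output, NOT copied from the thread.
SAMPLE = """\
\tlibhts.so.3 => /env/lib/libhts.so.3 (0x00007f0000000000)
\tlibgsl.so.25 => not found
\tlibc.so.6 => /lib64/libc.so.6 (0x00007f0000200000)
"""

print(missing_libraries(SAMPLE))  # ['libgsl.so.25']
```

To inspect the real binary, pass its full path, for example check_binary(shutil.which("bcftools")) after importing shutil. An empty list means every linked library was resolved.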

ponomarevsy commented 8 months ago

It looks like bcftools 1.14 comes with gsl 2.6, which is what we want (I had bcftools 1.18 and gsl 2.7.1). Now I am getting a new error, this time from Samtools:

samtools: /pathto/mambaforge/2022.11/envs/tbprofiler-5.0.0-python3/bin/../lib/libtinfow.so.6: no version information available (required by samtools)
samtools: /pathto/mambaforge/2022.11/envs/tbprofiler-5.0.0-python3/bin/../lib/libncursesw.so.6: no version information available (required by samtools)
samtools: /pathto/mambaforge/2022.11/envs/tbprofiler-5.0.0-python3/bin/../lib/libncursesw.so.6: no version information available (required by samtools)
Failed to read from standard input: unknown file type
Failed to read from standard input: unknown file type

Samtools version:

samtools                  1.18                 h50ea8bc_1    bioconda

Ncurses version:

ncurses                   6.4                  h6a678d5_0
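
Loader warnings like the ones above go to stderr and do not necessarily change the exit code, so it can help to test each external tool on its own before starting a full run. The following is a minimal sketch under that assumption: tool_status and preflight are hypothetical helpers, and the only tool invocations assumed are samtools --version and bcftools --version.

```python
import subprocess


def tool_status(command):
    """Run a command and return (started_ok, stderr_text)."""
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Binary missing, not executable, or hung.
        return False, str(exc)
    return result.returncode == 0, result.stderr.strip()


def preflight(commands):
    """Check every named command and collect the results in a dict."""
    report = {}
    for name, command in commands.items():
        report[name] = tool_status(command)
    return report


if __name__ == "__main__":
    checks = {
        "samtools": ["samtools", "--version"],
        "bcftools": ["bcftools", "--version"],
    }
    for name, (ok, stderr_text) in preflight(checks).items():
        print(f"{name}: {'ok' if ok else 'FAILED'}")
        if stderr_text:
            # Library warnings show up here even when the tool exits with 0.
            print(f"  stderr: {stderr_text}")
```

Any stderr output from a plain version query is worth investigating before running the pipeline, because the same library mismatch will affect every later call to that tool.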
ponomarevsy commented 8 months ago

I downgraded Samtools to 1.14 (build hb421002_0, bioconda) and Ncurses to 6.2 (build he6710b0_1).

It runs fine until it crashes with:

[10:59:55] WARNING  Please ensure that this BAM was made using the same reference as in the database. If you are not sure what          profiler.py:13
                    reference was used it is best to remap the reads.
           INFO     Running variant calling                                                                                                  bam.py:85
Calling variants:   0%|                                                                                                        | 0/58 [00:00<?, ?it/s]ERROR:root:[samclip] samclip 0.4.0 by Torsten Seemann (@torstenseemann)
[samclip] Loading: /pathto/mambaforge/2022.11/envs/tbprofiler-5.0.0-python3/share/tbprofiler//tbdb.fasta.fai
[samclip] Found 1 sequences in /pathto/mambaforge/2022.11/envs/tbprofiler-5.0.0-python3/share/tbprofiler//tbdb.fasta.fai
[samclip] Total SAM records 1366, removed 222, allowed 39, passed 1144
[samclip] Header contained 20 lines
[samclip] Done.
Failed to read from standard input: unknown file type
Failed to read from standard input: unknown file type
ponomarevsy commented 8 months ago

I decided to start from scratch and build everything manually and, after much suffering (installing the missing dependencies), was able to finish the TBProfiler test without errors.