songweizhi / MetaCHIP

Horizontal gene transfer (HGT) identification pipeline
GNU Affero General Public License v3.0

BLAST and identity distribution error #6

Closed by susheelbhanu 4 years ago

susheelbhanu commented 4 years ago

@songweizhi

I ran MetaCHIP successfully in the past, but when I tried it again today I ran into the following issues:

BLAST Database error: No alias or index file found for nucleotide database [cont-1_MetaCHIP_wd/cont-1_all_blastdb/cont-1_all_combined_ffn.fasta] in search path [/mnt/lscratch/users/ldenies/mice_AMR/hgt::]
[2020-06-04 07:59:51] Deleting temporary files
[2020-06-04 07:59:51] PrepIn done!
[2020-06-04 07:59:52] Found grouping file cont-1_g33_grouping.txt, input genomes were clustered into 33 groups
[2020-06-04 07:59:52] Filtering blast matches with the following criteria: Query genome != Subject genome, Alignment length >= 200bp and coverage >= 75%
[2020-06-04 07:59:53] Combining filtered blastn results
[2020-06-04 07:59:53] Get group-to-group identities with 18 cores
[2020-06-04 07:59:53] Plotting identity distribution between each pair of groups
Traceback (most recent call last):
  File "/home/users/sbusi/.local/bin/MetaCHIP", line 227, in <module>
    BM(args, config_dict)
  File "/home/users/sbusi/.local/lib/python3.6/site-packages/MetaCHIP/BP.py", line 2220, in BM
    do(plot_identity)
  File "/home/users/sbusi/.local/lib/python3.6/site-packages/MetaCHIP/BP.py", line 1934, in do
    current_group_pair_identity_cut_off = np.percentile(current_group_pair_identities_array, identity_percentile)
  File "/opt/apps/resif/data/production/v1.1-20180716/default/software/lang/Python/3.6.4-intel-2018a/lib/python3.6/site-packages/numpy-1.14.0-py3.6-linux-x86_64.egg/numpy/lib/function_base.py", line 4291, in percentile
    interpolation=interpolation)
  File "/opt/apps/resif/data/production/v1.1-20180716/default/software/lang/Python/3.6.4-intel-2018a/lib/python3.6/site-packages/numpy-1.14.0-py3.6-linux-x86_64.egg/numpy/lib/function_base.py", line 4033, in _ureduce
    r = func(a, **kwargs)
  File "/opt/apps/resif/data/production/v1.1-20180716/default/software/lang/Python/3.6.4-intel-2018a/lib/python3.6/site-packages/numpy-1.14.0-py3.6-linux-x86_64.egg/numpy/lib/function_base.py", line 4405, in _percentile
    x1 = take(ap, indices_below, axis=axis) * weights_below
  File "/opt/apps/resif/data/production/v1.1-20180716/default/software/lang/Python/3.6.4-intel-2018a/lib/python3.6/site-packages/numpy-1.14.0-py3.6-linux-x86_64.egg/numpy/core/fromnumeric.py", line 159, in take
    return _wrapfunc(a, 'take', indices, axis=axis, out=out, mode=mode)
  File "/opt/apps/resif/data/production/v1.1-20180716/default/software/lang/Python/3.6.4-intel-2018a/lib/python3.6/site-packages/numpy-1.14.0-py3.6-linux-x86_64.egg/numpy/core/fromnumeric.py", line 52, in _wrapfunc
    return getattr(obj, method)(*args, **kwds)
IndexError: cannot do a non-empty take from an empty axes.
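The last frames show np.percentile being called on an array with no elements, which is what "cannot do a non-empty take from an empty axes" refers to. Below is a minimal sketch of a guard for that case, written without numpy so it runs anywhere. `percentile_or_none` is a hypothetical helper, not part of MetaCHIP, and it assumes linear interpolation between the two closest ranks (numpy's default).

```python
from typing import Optional, Sequence


def percentile_or_none(values: Sequence[float], q: float) -> Optional[float]:
    """Return the q-th percentile of values, or None when values is empty.

    Hypothetical helper (not MetaCHIP code). Uses linear interpolation
    between the two closest ranks.
    """
    if not 0 <= q <= 100:
        raise ValueError("q must be between 0 and 100")
    data = sorted(values)
    if not data:
        # Nothing to summarise: leave the decision to the caller.
        return None
    rank = (len(data) - 1) * q / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(data) - 1)
    return data[lower] + (data[upper] - data[lower]) * (rank - lower)
```

A caller could then skip the plot for that group pair, or fall back to a fixed cut-off, whenever the result is None.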

Is the BLAST index error related to the plotting error? Is there a workaround?
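The thread does not confirm the link, but a missing database would plausibly leave blastn with no hits, and an empty list of identities is what the percentile call then trips over. Here is a quick way to check whether the nucleotide database was actually built, assuming it was created with makeblastdb, which writes .nhr/.nin/.nsq index files next to the FASTA (or a .nal alias file for multi-volume databases). `has_nucl_blast_db` is a hypothetical helper, not part of MetaCHIP.

```python
from pathlib import Path

# Core index files written by makeblastdb for a nucleotide database.
NUCL_INDEX_SUFFIXES = (".nhr", ".nin", ".nsq")


def has_nucl_blast_db(db_path: str) -> bool:
    """Return True if BLAST nucleotide index files exist for db_path.

    db_path is the database name as given to blastn, e.g. the combined
    .ffn FASTA file. Hypothetical helper, not part of MetaCHIP.
    """
    # Single-volume database: <db>.nhr, <db>.nin and <db>.nsq all present.
    if all(Path(db_path + suffix).is_file() for suffix in NUCL_INDEX_SUFFIXES):
        return True
    # Multi-volume database: an alias file <db>.nal points at the volumes.
    return Path(db_path + ".nal").is_file()
```

For the run above, the path to test would be the one quoted in the error message: cont-1_MetaCHIP_wd/cont-1_all_blastdb/cont-1_all_combined_ffn.fasta.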

Thank you!

songweizhi commented 4 years ago

Hi Susheel, I will look into this issue. Meanwhile, can you please remove "plot_iden" from your command and try again? Cheers,

susheelbhanu commented 4 years ago

Thank you. I commented out the plot_iden part of the script. I'll let you know if that helps!

susheelbhanu commented 4 years ago

I got this:

Traceback (most recent call last):
  File "/home/users/sbusi/apps/miniconda3/bin/MetaCHIP", line 244, in <module>
    BM(args, config_dict)
  File "/home/users/sbusi/apps/miniconda3/lib/python3.7/site-packages/MetaCHIP/BP.py", line 1971, in BM
    plot_identity =             args['plot_iden']
KeyError: 'plot_iden'

susheelbhanu commented 4 years ago

@songweizhi I commented out the same from the BP.py script, but it looks like it's linked to other issues.
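The KeyError in the previous comment is what a plain dictionary lookup does when the key is gone: the traceback shows BP.py reading args['plot_iden'], so removing the option where it is defined leaves nothing to look up. The sketch below shows the difference between indexing and .get() with a default. The dictionary is a made-up stand-in for the parsed arguments, and `read_flag` is a hypothetical helper, not MetaCHIP code.

```python
# Stand-in for the parsed command-line arguments (hypothetical values).
args = {"p": "tx-2", "r": "g", "t": 24}


def read_flag(parsed: dict, name: str, default: bool = False) -> bool:
    """Return parsed[name] if the key is present, otherwise the default."""
    return parsed.get(name, default)


# Direct indexing raises KeyError once the option has been removed.
try:
    plot_identity = args["plot_iden"]
except KeyError as err:
    print(f"missing option: {err}")

# .get() falls back to a default instead of raising.
plot_identity = read_flag(args, "plot_iden")
print(plot_identity)
```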

[sbusi@iris-187 hgt]$ MetaCHIP BP -p tx-2 -r g -t 24 -force
[2020-06-04 09:24:02] Found grouping file tx-2_g26_grouping.txt, input genomes were clustered into 26 groups
[2020-06-04 09:24:02] Filtered blastn results at specified taxonomic rank detected from folder tx-2_g26_blastn_results_filtered. HGT analysis will be performed based on these files.
[2020-06-04 09:24:02] Combining filtered blastn results
[2020-06-04 09:24:02] Get group-to-group identities with 24 cores
[2020-06-04 09:24:02] Plotting identity distribution between each pair of groups
[2020-06-04 09:24:02] Analyzing Blast hits to get HGT candidates with 24 cores
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/opt/apps/resif/data/production/v1.1-20180716/default/software/lang/Python/3.6.4-intel-2018a/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/opt/apps/resif/data/production/v1.1-20180716/default/software/lang/Python/3.6.4-intel-2018a/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/users/sbusi/.local/lib/python3.6/site-packages/MetaCHIP/BP.py", line 1256, in get_HGT_worker
    group_pair_iden_cutoff_dict)
  File "/home/users/sbusi/.local/lib/python3.6/site-packages/MetaCHIP/BP.py", line 387, in get_candidates
    query_gene_name = query_split[1]
IndexError: list index out of range
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/users/sbusi/.local/bin/MetaCHIP", line 227, in <module>
    BM(args, config_dict)
  File "/home/users/sbusi/.local/lib/python3.6/site-packages/MetaCHIP/BP.py", line 2253, in BM
    pool.map(get_HGT_worker, list_for_multiple_arguments_get_HGT)
  File "/opt/apps/resif/data/production/v1.1-20180716/default/software/lang/Python/3.6.4-intel-2018a/lib/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/opt/apps/resif/data/production/v1.1-20180716/default/software/lang/Python/3.6.4-intel-2018a/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
IndexError: list index out of range

songweizhi commented 4 years ago

It works fine on my side. What's the version of your installation? Can you please try again with the latest version (1.9.0)?

susheelbhanu commented 4 years ago

I'm using 1.9.0 as well. Please see below:

 ~/apps/metachip/MetaCHIP

            ...::: MetaCHIP v1.9.0 :::...

    Core modules:
       PI             ->    Prepare input files
       BP             ->    Run Best-match and Phylogenetic approaches

    Supplementary modules:
       CMLP           ->    Combine multi-level predictions (part of BP module)
       filter_HGT     ->    Get HGTs predicted at least n levels (for multi-level prediction)
       update_hmms    ->    update hmm profiles used for inferring SCG tree
       get_SCG_tree   ->    Get SCG protein tree
       SankeyTaxon    ->    Visualize taxonomic classification with Sankey plot
       circos_HGT     ->    Visualize gene flow with circos plot
       rename_seqs    ->    Rename sequences in a file

    # for command specific help
    MetaCHIP PI -h
    MetaCHIP BP -h
~/apps/metachip/MetaCHIP BP -p tx-2 -r g -t 24 -force
[2020-06-04 11:15:04] Found grouping file tx-2_g26_grouping.txt, input genomes were clustered into 26 groups
[2020-06-04 11:15:04] Filtered blastn results at specified taxonomic rank detected from folder tx-2_g26_blastn_results_filtered. HGT analysis will be performed based on these files.
[2020-06-04 11:15:04] Combining filtered blastn results
[2020-06-04 11:15:04] Get group-to-group identities with 24 cores
[2020-06-04 11:15:04] Plotting identity distribution between each pair of groups
Traceback (most recent call last):
  File "/home/users/sbusi/apps/metachip/MetaCHIP", line 244, in <module>
    BM(args, config_dict)
  File "/home/users/sbusi/apps/miniconda3/envs/hgtector/lib/python3.8/site-packages/MetaCHIP/BP.py", line 2220, in BM
    do(plot_identity)
  File "/home/users/sbusi/apps/miniconda3/envs/hgtector/lib/python3.8/site-packages/MetaCHIP/BP.py", line 1934, in do
    current_group_pair_identity_cut_off = np.percentile(current_group_pair_identities_array, identity_percentile)
  File "<__array_function__ internals>", line 5, in percentile
  File "/home/users/sbusi/apps/miniconda3/envs/hgtector/lib/python3.8/site-packages/numpy/lib/function_base.py", line 3705, in percentile
    return _quantile_unchecked(
  File "/home/users/sbusi/apps/miniconda3/envs/hgtector/lib/python3.8/site-packages/numpy/lib/function_base.py", line 3824, in _quantile_unchecked
    r, k = _ureduce(a, func=_quantile_ureduce_func, q=q, axis=axis, out=out,
  File "/home/users/sbusi/apps/miniconda3/envs/hgtector/lib/python3.8/site-packages/numpy/lib/function_base.py", line 3403, in _ureduce
    r = func(a, **kwargs)
  File "/home/users/sbusi/apps/miniconda3/envs/hgtector/lib/python3.8/site-packages/numpy/lib/function_base.py", line 3941, in _quantile_ureduce_func
    x1 = take(ap, indices_below, axis=axis) * weights_below
  File "<__array_function__ internals>", line 5, in take
  File "/home/users/sbusi/apps/miniconda3/envs/hgtector/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 194, in take
    return _wrapfunc(a, 'take', indices, axis=axis, out=out, mode=mode)
  File "/home/users/sbusi/apps/miniconda3/envs/hgtector/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 61, in _wrapfunc
    return bound(*args, **kwds)
IndexError: cannot do a non-empty take from an empty axes.

songweizhi commented 4 years ago

Could you share your commands and some of your input files with me? I hope I can reproduce the error. You can send a Google Drive or Dropbox link to songwz03@gmail.com. Thanks for reporting this error :)

susheelbhanu commented 4 years ago

Thank you. Here are my commands:

MetaCHIP PI -p tx-2 -r g -t 24 -i cont-2 -x fa -taxon tx-2/tx-2_gtdbtk.tsv
MetaCHIP BP -p tx-2 -r g -t 24 -force

I just sent you a link to the folder with the files. Let me know if you have trouble accessing them.

susheelbhanu commented 4 years ago

@songweizhi I did a clean re-install using the attached yaml file.

I got past the previous error, but now hit a different one. See the attached log file.

slurm-1841762.out.txt metachip_new.yaml.txt

susheelbhanu commented 4 years ago

@songweizhi UPDATE: I used a new YAML file but still get an error. You can find the files here: https://drive.google.com/drive/folders/1MqqorrBTjouxJ1YTWT-S-ROGcOfpOizw?usp=sharing

The error is about files that cannot be found. I checked, and they indeed don't exist.

[2020-06-05 12:48:28] PrepIn done!
[2020-06-05 12:48:29] Found grouping file tx-2_g26_grouping.txt, input genomes were clustered into 26 groups
[2020-06-05 12:48:29] Filtering blast matches with the following criteria: Query genome != Subject genome, Alignment length >= 200bp and coverage >= 75%
[2020-06-05 12:48:30] Combining filtered blastn results
[2020-06-05 12:48:30] Get group-to-group identities with 18 cores
[2020-06-05 12:48:30] Plotting identity distribution between each pair of groups
[2020-06-05 12:48:30] Analyzing Blast hits to get HGT candidates with 18 cores
[2020-06-05 12:48:31] Plotting flanking regions with 18 cores
Command line argument error: Argument "subject". File is not accessible:  `tx-2_MetaCHIP_wd/tx-2_g26_HGTs_ip90_al200bp_c75_ei80_f10kbp/tx-2_g26_Flanking_region_plots/S6_MG_Paul_S5_G23_sub.contigs_00494___S6_MG_Paul_S5_maxbin_res.011.fasta_sub.contigs_00895/S6_MG_Paul_S5_maxbin_res.011.fasta_sub.contigs_00895_10000bp.fasta'
Command line argument error: Argument "subject". File is not accessible:  `tx-2_MetaCHIP_wd/tx-2_g26_HGTs_ip90_al200bp_c75_ei80_f10kbp/tx-2_g26_Flanking_region_plots/S6_MG_Paul_S5_G23_sub.contigs_01650___S6_MG_Paul_S5_maxbin_res.011.fasta_sub.contigs_00895/S6_MG_Paul_S5_maxbin_res.011.fasta_sub.contigs_00895_10000bp.fasta'
Command line argument error: Argument "subject". File is not accessible:  `tx-2_MetaCHIP_wd/tx-2_g26_HGTs_ip90_al200bp_c75_ei80_f10kbp/tx-2_g26_Flanking_region_plots/S6_MG_Paul_S5_G20.contigs_00776___S6_MG_Paul_S5_maxbin_res.008.fasta_sub.contigs_00148/S6_MG_Paul_S5_maxbin_res.008.fasta_sub.contigs_00148_10000bp.fasta'
Command line argument error: Argument "subject". File is not accessible:  `tx-2_MetaCHIP_wd/tx-2_g26_HGTs_ip90_al200bp_c75_ei80_f10kbp/tx-2_g26_Flanking_region_plots/S6_MG_Paul_S5_G23_sub.contigs_01451___S6_MG_Paul_S5_maxbin_res.008.fasta_sub.contigs_01906/S6_MG_Paul_S5_maxbin_res.008.fasta_sub.contigs_01906_10000bp.fasta'
Command line argument error: Argument "subject". File is not accessible:  `tx-2_MetaCHIP_wd/tx-2_g26_HGTs_ip90_al200bp_c75_ei80_f10kbp/tx-2_g26_Flanking_region_plots/S6_MG_Paul_S5_G4.1.contigs_00956___S6_MG_Paul_S5_maxbin_res.008.fasta_sub.contigs_02021/S6_MG_Paul_S5_maxbin_res.008.fasta_sub.contigs_02021_10000bp.fasta'
Command line argument error: Argument "subject". File is not accessible:  `tx-2_MetaCHIP_wd/tx-2_g26_HGTs_ip90_al200bp_c75_ei80_f10kbp/tx-2_g26_Flanking_region_plots/S6_MG_Paul_S5_G23_sub.contigs_00494___S6_MG_Paul_S5_maxbin_res.011.fasta_sub.contigs_00895/S6_MG_Paul_S5_maxbin_res.011.fasta_sub.contigs_00895.fasta'
Command line argument error: Argument "subject". File is not accessible:  `tx-2_MetaCHIP_wd/tx-2_g26_HGTs_ip90_al200bp_c75_ei80_f10kbp/tx-2_g26_Flanking_region_plots/S6_MG_Paul_S5_G23_sub.contigs_00236___S6_MG_Paul_S5_maxbin_res.008.fasta_sub.contigs_00331/S6_MG_Paul_S5_maxbin_res.008.fasta_sub.contigs_00331_10000bp.fasta'
Command line argument error: Argument "subject". File is not accessible:  `tx-2_MetaCHIP_wd/tx-2_g26_HGTs_ip90_al200bp_c75_ei80_f10kbp/tx-2_g26_Flanking_region_plots/S6_MG_Paul_S5_G23_sub.contigs_01650___S6_MG_Paul_S5_maxbin_res.011.fasta_sub.contigs_00895/S6_MG_Paul_S5_maxbin_res.011.fasta_sub.contigs_00895.fasta'

susheelbhanu commented 4 years ago

@songweizhi It WORKS (for the most part). The issue was the file names. I replaced all the file names with very short ones, e.g. test_1.fa. It's not just the contig names but the actual file names that were causing the issue.
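Here is a sketch of that renaming step, assuming the genome files sit in one folder and share one extension. `shorten_genome_names`, the `test_` prefix, and the name_mapping.tsv file are illustrative choices, not MetaCHIP features. The mapping keeps the original names recoverable after the run.

```python
import csv
from pathlib import Path


def shorten_genome_names(folder: str, ext: str = "fa", prefix: str = "test_") -> dict:
    """Rename every *.<ext> file in folder to <prefix><N>.<ext>.

    Returns {new_name: old_name} and writes the same mapping to
    name_mapping.tsv inside the folder. Hypothetical helper.
    """
    root = Path(folder)
    mapping = {}
    # Sort so that the numbering is reproducible between runs.
    originals = sorted(p for p in root.glob(f"*.{ext}") if p.is_file())
    for index, old_path in enumerate(originals, start=1):
        new_path = root / f"{prefix}{index}.{ext}"
        if new_path.exists():
            raise FileExistsError(f"refusing to overwrite {new_path}")
        old_path.rename(new_path)
        mapping[new_path.name] = old_path.name
    # Keep a record so results can be mapped back to the original genomes.
    with open(root / "name_mapping.tsv", "w", newline="") as handle:
        csv.writer(handle, delimiter="\t").writerows(mapping.items())
    return mapping
```

Any file that refers to genomes by name, such as the table passed with -taxon, would presumably need the same renaming before running PI again.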

It ran past the previous error, but now there is a RANGER-DTL2 error. See below (and attached file):

[2020-06-05 18:26:11] Plotting flanking regions with 18 cores
[2020-06-05 18:29:54] Extracting nc sequences for BM predicted HGTs
[2020-06-05 18:29:55] Deleting temporary files
[2020-06-05 18:29:55] Done for Best-match approach!
[2020-06-05 18:29:55] Found grouping file tx-2_g26_grouping.txt, input genomes were clustered into 26 groups
[2020-06-05 18:29:55] Get gene/genome member in gene/species tree for each BM predicted HGT
[2020-06-05 18:29:55] Prepare subset of tx-2_all_combined_faa.fasta for building gene tree
[2020-06-05 18:29:56] Get species/gene tree for 571 BM approach identified HGTs with 18 cores
[2020-06-05 18:30:31] Running Ranger-DTL2 with dated mode
ERROR: missing ')' in input tree expression line 2 column 35
ERROR: missing ')' in input tree expression line 2 column 15
ERROR: missing ')' in input tree expression line 2 column 35
ERROR: missing ')' in input tree expression line 2 column 15
ERROR: missing ')' in input tree expression line 2 column 14
ERROR: missing ')' in input tree expression line 2 column 14
ERROR: missing ')' in input tree expression line 2 column 14
ERROR: missing ')' in input tree expression line 2 column 15
ERROR: missing ')' in input tree expression line 2 column 34
ERROR: missing ')' in input tree expression line 2 column 14
ERROR: missing ')' in input tree expression line 2 column 88
ERROR: missing ')' in input tree expression line 2 column 14
ERROR: missing ')' in input tree expression line 2 column 14
ERROR: missing ')' in input tree expression line 2 column 59
ERROR: missing ')' in input tree expression line 2 column 15
ERROR: missing ')' in input tree expression line 2 column 56
ERROR: missing ')' in input tree expression line 2 column 15
ERROR: missing ')' in input tree expression line 2 column 34
ERROR: missing ')' in input tree expression line 2 column 34
ERROR: missing ')' in input tree expression line 2 column 15
ERROR: missing ')' in input tree expression line 2 column 57
ERROR: missing ')' in input tree expression line 2 column 57
ERROR: missing ')' in input tree expression line 2 column 59
ERROR: missing ')' in input tree expression line 2 column 14
ERROR: missing ')' in input tree expression line 2 column 67
ERROR: missing ')' in input tree expression line 2 column 67
ERROR: missing ')' in input tree expression line 2 column 15
ERROR: missing ')' in input tree expression line 2 column 34
ERROR: missing ')' in input tree expression line 2 column 15
[2020-06-05 18:30:33] Parsing Ranger prediction results
[2020-06-05 18:30:33] Add Ranger-DTL predicted direction to HGT_candidates.txt
[2020-06-05 18:30:33] Deleting temporary files
[2020-06-05 18:30:38] Done for Phylogenetic approach!
== Ending run at Fri Jun  5 18:30:44 CEST 2020

slurm-1843189.out.txt
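
Ranger-DTL reads Newick trees, so "missing ')'" points at a tree string it could not parse. The thread does not say what was wrong with the trees. One thing worth ruling out is unbalanced parentheses or a missing terminating semicolon. The function below is a rough sanity check rather than a Newick parser (it ignores quoted labels, for instance), and `check_newick` is a hypothetical helper.

```python
def check_newick(tree: str) -> list:
    """Return a list of problems found in a Newick string (empty if none).

    Only checks parenthesis balance and the trailing semicolon.
    """
    problems = []
    text = tree.strip()
    depth = 0
    for position, char in enumerate(text, start=1):
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:
                problems.append(f"unmatched ')' at column {position}")
                depth = 0  # reset so later columns are still checked
    if depth > 0:
        problems.append(f"{depth} unclosed '('")
    if not text.endswith(";"):
        problems.append("missing terminating ';'")
    return problems
```

Running it over the species and gene tree files handed to Ranger-DTL would show whether the trees themselves are malformed or whether the parser is stumbling over something else, such as characters in the leaf names.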

susheelbhanu commented 4 years ago

@songweizhi With the updated MetaCHIP v1.9.1, there are no more errors. Here's the output for the BP module alone.

(METACHIP) [sbusi@iris-001 metachip]$ MetaCHIP BP -p tx-3 -r g -t 24 -force -tmp
[2020-06-07 17:12:30] Found grouping file tx-3_g27_grouping.txt, input genomes were clustered into 27 groups
[2020-06-07 17:12:33] Filtered blastn results at specified taxonomic rank detected from folder tx-3_g27_blastn_results_filtered. HGT analysis will be performed based on these files.
[2020-06-07 17:12:33] Combining filtered blastn results
[2020-06-07 17:12:35] Get group-to-group identities with 24 cores
[2020-06-07 17:12:36] Plotting identity distribution between each pair of groups
[2020-06-07 17:12:37] Analyzing Blast hits to get HGT candidates with 24 cores
[2020-06-07 17:12:39] Plotting flanking regions with 24 cores
[2020-06-07 17:20:44] Extracting nc sequences for BM predicted HGTs
[2020-06-07 17:20:46] Done for Best-match approach!
[2020-06-07 17:20:46] Found grouping file tx-3_g27_grouping.txt, input genomes were clustered into 27 groups
[2020-06-07 17:20:46] Get gene/genome member in gene/species tree for each BM predicted HGT
[2020-06-07 17:20:46] Prepare subset of tx-3_all_combined_faa.fasta for building gene tree
[2020-06-07 17:20:48] Get species/gene tree for 781 BM approach identified HGTs with 24 cores
[2020-06-07 17:24:47] Running Ranger-DTL2 with dated mode
[2020-06-07 17:24:55] Parsing Ranger prediction results
[2020-06-07 17:24:55] Add Ranger-DTL predicted direction to HGT_candidates.txt
[2020-06-07 17:24:55] Deleting temporary files
[2020-06-07 17:24:55] Done for Phylogenetic approach!
(METACHIP) [sbusi@iris-001 metachip]$

Thanks for all the help!