songweizhi / MetaCHIP

Horizontal gene transfer (HGT) identification pipeline
GNU Affero General Public License v3.0

Unexplainable behaviour of MetaCHIP PI #35

Closed: polinanvkv closed this issue 1 year ago

polinanvkv commented 2 years ago

Hello!

I am trying to run MetaCHIP for the first time and am running into some issues. I don't yet know how naive my question is, but I do need help :D

So I am starting with:

MetaCHIP PI -i genomes_msmithii_gut/ -x fasta -g taxonomy_msmithii_gut_groupping.txt -p gut

The output I get:

[2022-06-22 19:31:22] Total number of qualified genomes for HGT detection: 5.
[2022-06-22 19:31:22] Genome ids provided in taxonomy_msmithii_gut_groupping.txt do not match genome files in genomes_msmithii_gut, program exited!
[2022-06-22 19:31:22] Please note that file extension (e.g. fa, fasta) of the input genomes should NOT be included in the grouping file.

The output is odd because I have around 1000 genomes in my genomes_msmithii_gut/ directory, yet it reports only 5 qualified genomes. Also, the genome ids in taxonomy_msmithii_gut_groupping.txt match the file names in the folder; the only difference is that the files carry the .fasta extension.

When I run MetaCHIP BP -p gut afterwards, I get:

Traceback (most recent call last):
  File "/home/users/pnovikova/miniconda3/envs/metachip/bin/MetaCHIP", line 169, in <module>
    BP(args, MetaCHIP_config.config_dict)
  File "/home/users/pnovikova/miniconda3/envs/metachip/lib/python3.10/site-packages/MetaCHIP/BP.py", line 1741, in BP
    pwd_prodigal_output_folder_detected    = [os.path.basename(file_name) for file_name in glob.glob(pwd_prodigal_output_folder_re)][0]
IndexError: list index out of range
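
The IndexError above comes from taking element [0] of the list built from glob.glob, which is empty when no path matches the pattern. Presumably the PI step exited before creating the Prodigal output folder that BP looks for, although that is an inference from the log and not something confirmed in this thread. Below is a minimal Python sketch of the failure mode and a guarded alternative. The first_match helper and the folder name pattern are hypothetical illustrations, not MetaCHIP code.

```python
import glob
import os
import tempfile


def first_match(pattern):
    """Return the basename of the first path matching pattern, or None if nothing matches."""
    matches = [os.path.basename(path) for path in glob.glob(pattern)]
    return matches[0] if matches else None


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical pattern; the real one used by MetaCHIP BP is not shown in the traceback.
    pattern = os.path.join(tmp, "*_prodigal_output")

    # Nothing matches yet, so unguarded indexing fails like the traceback above.
    try:
        [os.path.basename(path) for path in glob.glob(pattern)][0]
    except IndexError as err:
        print("unguarded lookup failed:", err)

    # The guarded helper reports the missing folder instead of raising.
    print("guarded lookup:", first_match(pattern))

    # Once a matching folder exists, both approaches find it.
    os.mkdir(os.path.join(tmp, "gut_prodigal_output"))
    print("guarded lookup:", first_match(pattern))
```

If this reading is right, the BP error is a consequence of the failed PI run and not a separate problem.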

Below I put heads of my files.

taxonomy_msmithii_gut_groupping.txt

GUT_GENOME001950,s__Methanobrevibacter_A_smithii
GUT_GENOME001966,s__Methanobrevibacter_A_smithii
GUT_GENOME002944,s__Methanobrevibacter_A_smithii
GUT_GENOME003140,s__Methanobrevibacter_A_smithii
GUT_GENOME004154,s__Methanobrevibacter_A_smithii
GUT_GENOME004870,s__Methanobrevibacter_A_smithii
GUT_GENOME005651,s__Methanobrevibacter_A_smithii_A
GUT_GENOME005889,s__Methanobrevibacter_A_smithii
GUT_GENOME006755,s__Methanobrevibacter_A_smithii
GUT_GENOME008460,s__Methanobrevibacter_A_smithii

files in genomes_msmithii_gut/:

-rwxr-xr-x 1 pnovikova archaea_neurodeg  1.7M Jun 22 14:41 GUT_GENOME001950.fasta
-rwxr-xr-x 1 pnovikova archaea_neurodeg  1.7M Jun 22 14:41 GUT_GENOME001966.fasta
-rwxr-xr-x 1 pnovikova archaea_neurodeg  1.2M Jun 22 14:41 GUT_GENOME002944.fasta
-rwxr-xr-x 1 pnovikova archaea_neurodeg  1.6M Jun 22 14:41 GUT_GENOME003140.fasta
-rwxr-xr-x 1 pnovikova archaea_neurodeg  1.2M Jun 22 14:41 GUT_GENOME004154.fasta
-rwxr-xr-x 1 pnovikova archaea_neurodeg  1.4M Jun 22 14:41 GUT_GENOME004870.fasta
-rwxr-xr-x 1 pnovikova archaea_neurodeg  2.1M Jun 22 14:41 GUT_GENOME005651.fasta
-rwxr-xr-x 1 pnovikova archaea_neurodeg  1.7M Jun 22 14:41 GUT_GENOME005889.fasta
-rwxr-xr-x 1 pnovikova archaea_neurodeg  1.5M Jun 22 14:41 GUT_GENOME006755.fasta

Is there an explanation for this, and what can be done to fix it? It feels like something is wrong with the file format, but I cannot see what. In any case, I am trying to follow the examples.

The version of MetaCHIP is v1.10.9.

songweizhi commented 2 years ago

Hi polinanvkv,

Have you figured out why MetaCHIP only found 5 of your 1000 genomes? I didn't spot anything wrong with your input files.

Weizhi
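
One way to narrow down the mismatch reported in the log is to compare the ids in the first column of the grouping file against the file names in the genome folder with the extension removed. The Python sketch below does that independently of MetaCHIP. The compare_ids helper is hypothetical, and it assumes a comma-separated grouping file and a single shared file extension, as in the heads shown in the question. The demo runs on throwaway files, not on the real data.

```python
import os
import tempfile


def compare_ids(grouping_file, genome_dir, ext="fasta"):
    """Return (ids only in the grouping file, ids only in the genome folder)."""
    suffix = "." + ext
    with open(grouping_file) as handle:
        # The genome id is taken to be the first comma-separated column.
        grouping_ids = {line.split(",")[0] for line in handle if line.strip()}
    # Folder ids are file names with the extension stripped.
    folder_ids = {
        name[: -len(suffix)]
        for name in os.listdir(genome_dir)
        if name.endswith(suffix)
    }
    return sorted(grouping_ids - folder_ids), sorted(folder_ids - grouping_ids)


# Self-contained demo with made-up ids.
with tempfile.TemporaryDirectory() as tmp:
    genome_dir = os.path.join(tmp, "genomes")
    os.mkdir(genome_dir)
    for name in ("G1.fasta", "G2.fasta", "G4.fasta"):
        open(os.path.join(genome_dir, name), "w").close()

    grouping_file = os.path.join(tmp, "grouping.txt")
    with open(grouping_file, "w") as handle:
        handle.write("G1,s__A\nG2.fasta,s__A\nG3,s__B\n")

    only_in_grouping, only_in_folder = compare_ids(grouping_file, genome_dir)
    # repr() makes stray spaces or leftover extensions in an id visible.
    print("only in grouping file:", [repr(i) for i in only_in_grouping])
    print("only in genome folder:", [repr(i) for i in only_in_folder])
```

For the files in this issue the call would be compare_ids("taxonomy_msmithii_gut_groupping.txt", "genomes_msmithii_gut", "fasta"). Two empty lists would mean the ids agree and the cause lies elsewhere.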