merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0
443 stars 145 forks source link

Fix use of stray_ko_dict when including stray KOs in anvi-estimate-metabolism #2336

Closed bcoltman closed 2 months ago

bcoltman commented 2 months ago

Revert to seperating the stray_ko_dict as a separate entity. Fixes issues when using:

anvi-estimate-metabolism --include-stray-KOs

Which would lead to the following errors: Traceback (most recent call last): File github/anvio/bin/anvi-estimate-metabolism, line 148, in main(args) File github/anvio/anvio/terminal.py, line 915, in wrapper program_method(*args, **kwargs) File github/anvio/bin/anvi-estimate-metabolism, line 39, in main m.estimate_metabolism() File github/anvio/anvio/kegg.py, line 6014, in estimate_metabolism kegg_metabolism_superdict, kofam_hits_superdict = self.estimate_for_genome(kofam_hits_info) File github/anvio/anvio/kegg.py, line 5516, in estimate_for_genome self.append_kegg_metabolism_superdicts(genome_metabolism_superdict, genome_ko_superdict) File github/anvio/anvio/kegg.py, line 6557, in append_kegg_metabolism_superdicts output_dict = self.generate_output_dict_for_kofams(ko_superdict_for_list_of_splits, headers_to_include=header_list) File github/anvio/anvio/kegg.py, line 6416, in generate_output_dict_for_kofams metadata_dict = self.get_ko_metadata_dictionary(ko, dont_fail_if_not_found=True) File github/anvio/anvio/kegg.py, line 3456, in get_ko_metadata_dictionary elif self.include_stray_kos and self.stray_ko_dict and knum in self.stray_ko_dict: AttributeError: 'KeggMetabolismEstimator' object has no attribute 'stray_ko_dict'. Did you mean: 'setup_ko_dict'?

ivagljiva commented 2 months ago

Hey @bcoltman , could you please send more information about when this behavior fails? For instance, when I run anvi-estimate-metabolism -c CONTIGS.db --include-stray-KOs, it works just fine.

bcoltman commented 2 months ago

Hey @ivagljiva, sorry for not including enough information. I was executing the script like this:

anvi-estimate-metabolism -c CONTIGS.db --include-stray-KOs -p PROFILE.db --kegg-data-dir $CUSTOM_LOC/kegg/ --add-coverage --output-modes modules,hits --output-file-prefix $VARIABLE

My kegg-data-dir is setup from this snapshot: v2024-03-09: url: https://figshare.com/ndownloader/files/44953354 archive_name: KEGG_build_2024-03-09_23910d68b4f2.tar.gz hash: 23910d68b4f2 modules_db_version: 4 no_modeling_data: True no_binary_relations: True no_maps: True

Prior to the change, it would return the above mentioned error. Switching it to False, which is the default for the function, Fixed it for me.

ivagljiva commented 2 months ago

Ah, thanks for the info @bcoltman. I was able to replicate the error now. I had intended anvi-estimate-metabolism to be compatible with using either add_entries_to_regular_ko_dict=True or False, but it turns out that not initializing that attribute even when it is True brought up issues downstream, specifically for the get_ko_metadata_dictionary function that is used for --output-modes hits. Thanks for catching this.

I've added a commit to fix this behavior, as well as another commit to fix a different place that I found was dependent on having add_entries_to_regular_ko_dict=True. I think the program should work now regardless of the parameter setting for add_entries_to_regular_ko_dict. 🤞

I hope I've caught everything. I'll merge this PR now and we'll see if anything else breaks :)

Thanks for your help! It would be wonderful if you could add yourself into our list of developers. That way you can be featured on the anvi'o website (if you want), which is a nice way to get credit for your contributions. Instructions here.

bcoltman commented 2 months ago

Thanks for such a rapid response and great support @ivagljiva!