merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

[BUG] Pandas dataframe has no attribute 'append' #2313

Open philipwoods opened 3 months ago

philipwoods commented 3 months ago

Short description of the problem

anvi-analyze-synteny fails because pandas deprecated DataFrame.append() in version 1.4.0 in favor of pandas.concat(), and removed it entirely in version 2.0.

anvi'o version

Anvi'o .......................................: marie (v8)
Python .......................................: 3.10.13

Profile database .............................: 38
Contigs database .............................: 21
Pan database .................................: 16
Genome data storage ..........................: 7
Auxiliary data storage .......................: 2
Structure database ...........................: 2
Metabolic modules database ...................: 4
tRNA-seq database ............................: 2

System info

Operating system is Red Hat Enterprise Linux. Anvi'o was installed in a conda environment.

Detailed description of the issue

I ran anvi-analyze-synteny on my pangenome and got the following error:

Traceback (most recent call last):
  File "/export/data1/sw/anaconda3-2019.07/envs/anvio-8/bin/anvi-analyze-synteny", line 75, in <module>
    ngram.report_ngrams_to_user()
  File "/export/data1/sw/anaconda3-2019.07/envs/anvio-8/lib/python3.10/site-packages/anvio/synteny.py", line 421, in report_ngrams_to_user
    df = self.convert_to_df()
  File "/export/data1/sw/anaconda3-2019.07/envs/anvio-8/lib/python3.10/site-packages/anvio/synteny.py", line 384, in convert_to_df
    df = df.append({'ngram': ngram,
  File "/export/data1/sw/anaconda3-2019.07/envs/anvio-8/lib/python3.10/site-packages/pandas/core/generic.py", line 6296, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'append'. Did you mean: '_append'?

Looking into it, I found that requirements.txt forces pandas==1.4.4, while DataFrame.append() has been deprecated since pandas version 1.4.0. Therefore I expect that this will be an issue in every part of anvi'o that currently uses the pandas DataFrame.append() method.
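For reference, here is a minimal sketch of the usual migration away from DataFrame.append(). It is not the actual anvi'o code: the `records` list and its `ngram`/`count` values are hypothetical stand-ins for whatever convert_to_df() builds per ngram. The idea is to collect rows first and build the frame once, which also avoids re-copying the frame on every iteration.

```python
import pandas as pd

# Hypothetical rows standing in for the per-ngram records
# (the real columns in anvio/synteny.py may differ).
records = [
    {"ngram": "GC_1::GC_2::GC_3", "count": 4},
    {"ngram": "GC_2::GC_3::GC_4", "count": 2},
]

# Old pattern, which raises AttributeError on pandas >= 2.0:
#   df = pd.DataFrame()
#   for rec in records:
#       df = df.append(rec, ignore_index=True)

# Option 1: collect plain dicts and build the frame in one step.
df = pd.DataFrame.from_records(records)

# Option 2: build one-row frames and join them with pd.concat().
df_concat = pd.concat(
    [pd.DataFrame([rec]) for rec in records],
    ignore_index=True,
)
```

Both options should produce the same two-row frame; the first is usually simpler when the rows are already dictionaries.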

mschecht commented 3 months ago

@philipwoods, thanks for posting this bug!

Weirdly, I was not able to reproduce it on my end, but I went ahead and refactored self.convert_to_df() to use pandas.concat(), which should fix this issue. If possible, could you post a tar gzipped directory of the pangenome and the command you used? I want to reproduce it before I commit.

mschecht commented 3 months ago

Here is the branch tracking this issue: https://github.com/merenlab/anvio/compare/master...deprecate-pandas-append-synteny

philipwoods commented 3 months ago

Sorry for the delay! Here is the file and the command I used (I forget whether --annotation-source is necessary when using gene clusters as the ngram source, but if it is, you can use --annotation-source COG20_FUNCTION).

pangenome.tar.gz

anvi-analyze-synteny --analyze-unknown-functions -n gene_clusters --ngram-window-range 3:15 -g ANME3EVO-revision-GENOMES.db -p pangenome/ANME3EVO-revision-PAN.db

meren commented 3 months ago

I ran your command in @mschecht's branch, and got this error:

Functions found ..............................: EGGNOG_BEST_TAX, Pfam, COG20_CATEGORY, EGGNOG_BACT, COG20_FUNCTION, EGGNOG_PFAMs, EGGNOG_COG_CATEGORY, EGGNOG_BRITE, KEGG_BRITE, EGGNOG_KEGG_KO, KOfam, EGGNOG_GENE_FUNCTION_NAME,
                                                EGGNOG_KEGG_REACTION, EGGNOG_BiGG_REACTIONS, EGGNOG_KEGG_MODULE, EGGNOG_KEGG_PATHWAYS, EGGNOG_KEGG_TC, KEGG_Class, EGGNOG_EC_NUMBER, KEGG_Module, COG20_PATHWAY, EGGNOG_KEGG_RCLASS,
                                                EGGNOG_CAZy, EGGNOG_GO_TERMS
Genomes storage ..............................: Initialized (storage hash: hash45b805d1)
Num genomes in storage .......................: 67
Num genomes will be used .....................: 67

WARNING
===============================================
Anvi'o is now looking for Ngrams in your contigs!

* What do we say to loci that appear to have no coherent synteny patterns...? Not
  today! ⚔️

Traceback (most recent call last):
  File "/Users/meren/github/anvio/bin/anvi-analyze-synteny", line 74, in <module>
    ngram.report_ngrams_to_user()
  File "/Users/meren/github/anvio/anvio/synteny.py", line 420, in report_ngrams_to_user
    df = self.convert_to_df()
  File "/Users/meren/github/anvio/anvio/synteny.py", line 408, in convert_to_df
    ngram_count_df_final = pd.concat(ngram_count_df_list, ignore_index=True)
  File "/Users/meren/miniconda3/envs/anvio-dev/lib/python3.10/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/Users/meren/miniconda3/envs/anvio-dev/lib/python3.10/site-packages/pandas/core/reshape/concat.py", line 347, in concat
    op = _Concatenator(
  File "/Users/meren/miniconda3/envs/anvio-dev/lib/python3.10/site-packages/pandas/core/reshape/concat.py", line 404, in __init__
    raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate

So it is some improvement, but clearly there are more things to fix :)
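The ValueError in this traceback is what pd.concat() raises when it is handed an empty list, so ngram_count_df_list was presumably empty at that point. Why it ended up empty is not clear from the traceback. As one possible guard, here is a rough sketch; the helper name `concat_or_empty` and the column names are made up for illustration, and whether an empty table is the right outcome for anvi-analyze-synteny (rather than a clear error message) is an assumption.

```python
import pandas as pd


def concat_or_empty(frames, columns):
    """Concatenate a list of DataFrames, tolerating an empty list.

    Hypothetical helper: returns an empty frame with the expected
    columns instead of letting pd.concat() raise.
    """
    if not frames:
        # pd.concat([]) raises ValueError("No objects to concatenate")
        return pd.DataFrame(columns=columns)
    return pd.concat(frames, ignore_index=True)


# Column names here are placeholders, not the real anvi'o schema.
expected_columns = ["ngram", "count"]

empty_result = concat_or_empty([], expected_columns)
filled_result = concat_or_empty(
    [pd.DataFrame([{"ngram": "GC_1::GC_2::GC_3", "count": 4}])],
    expected_columns,
)
```

A guard like this only hides the symptom; if the list is empty because no ngrams were found for the given parameters, reporting that to the user is probably more helpful than writing an empty table.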