nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

On funannotate annotate: UnicodeEncodeError: 'ascii' codec can't encode character u'\u2032' in position 1: ordinal not in range(128) #384

Closed atiweb closed 4 years ago

atiweb commented 4 years ago

Are you using the latest release? Yes.

Describe the bug
On funannotate compare:
OS: linux2, 24 cores, ~ 132 GB RAM. Python: 2.7.17
[12:17 AM]: Running 1.7.3
[12:17 AM]: Now parsing 6 genomes
[12:17 AM]: working on Amauroascus_mutatus_UAMH_3576 UAMH_3576
[12:18 AM]: working on Amauroascus_niger_UAMH_3544 UAMH_3544
[12:18 AM]: working on Byssoonygena_ceratinophila_UAMH_5669 UAMH_5669
[12:19 AM]: working on Chrysosporium_queenslandicum_CBS_280_77 CBS_280_77
[12:19 AM]: working on Emergomyces_orientalis_5z489 5z489
[12:20 AM]: working on Ophidiomyces_ophiodiicola_MYCO-ARIZ_AN0400001 MYCO-ARIZ_AN0400001
[12:20 AM]: Summarizing secondary metabolism gene clusters
[12:20 AM]: Summarizing PFAM domain results
Traceback (most recent call last):
  File "/usr/local/bin/funannotate", line 660, in <module>
    main()
  File "/usr/local/bin/funannotate", line 650, in main
    mod.main(arguments)
  File "/usr/local/lib/python2.7/dist-packages/funannotate/compare.py", line 376, in main
    pfamdf2.to_csv(os.path.join(args.out, 'pfam', 'pfam.results.csv'))
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 3020, in to_csv
    formatter.save()
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/formats/csvs.py", line 172, in save
    self._save()
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/formats/csvs.py", line 288, in _save
    self._save_chunk(start_i, end_i)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/formats/csvs.py", line 315, in _save_chunk
    self.cols, self.writer)
  File "pandas/_libs/writers.pyx", line 55, in pandas._libs.writers.write_csv_rows
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2032' in position 1: ordinal not in range(128)
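For readers unfamiliar with this error, it can be reproduced outside funannotate. The following is a minimal sketch written for Python 3 (the report above is from Python 2.7, where the ASCII encode happens implicitly). The sample description string and the helper `try_encode` are hypothetical; only the character U+2032 (PRIME) comes from the traceback.

```python
# Hypothetical description text; U+2032 (PRIME) is the character named in the traceback.
description = "5\u2032 nucleotidase"


def try_encode(text, codec):
    """Return the encoded bytes, or the error message if the codec cannot represent the text."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError as err:
        return str(err)


ascii_result = try_encode(description, "ascii")   # fails: U+2032 is outside range(128)
utf8_result = try_encode(description, "utf-8")    # succeeds: UTF-8 can encode any code point

print(ascii_result)
print(utf8_result)
```

The ASCII codec only covers code points 0 to 127, so any description containing a character such as U+2032 fails unless a wider encoding like UTF-8 is requested.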

What command did you issue? funannotate compare --input Amauroascus_mutatus/UAMH_3576/fun_out/annotate_results Amauroascus_niger/UAMH_3544/fun_out/annotate_results /Byssoonygena_ceratinophila/UAMH_5669/fun_out/annotate_results Chrysosporium_queenslandicum/CBS_280_77/fun_out/annotate_results Emergomyces_orientalis/5z489/fun_out/annotate_results Ophidiomyces_ophiodiicola/MYCO-ARIZ_AN0400001/fun_out/annotate_results --out /funannotate_compare_less --cpus 20 --run_dnds estimate

OS/Install Information

Checking dependencies for 1.7.3

To print all dependencies and versions: funannotate check --show-versions

You are running Python v 2.7.17. Now checking python packages... All 11 python packages installed

You are running Perl v 5.026001. Now checking perl modules... All 27 Perl modules installed

Checking Environmental Variables... All 6 environmental variables are set

Checking external dependencies... All 36 external dependencies are installed

funannotate check --show-versions

Checking dependencies for 1.7.3

You are running Python v 2.7.17. Now checking python packages...
biopython: 1.76
goatools: 0.9.9
matplotlib: 2.2.5
natsort: 6.2.1
numpy: 1.16.6
pandas: 0.22.0
psutil: 5.6.7
requests: 2.22.0
scikit-learn: 0.20.4
scipy: 1.2.3
seaborn: 0.9.1
All 11 python packages installed

You are running Perl v 5.026001. Now checking perl modules...
Bio::Perl: 1.007002
Carp: 1.42
Clone: 0.39
DBD::SQLite: 1.62
DBD::mysql: 4.050
DBI: 1.643
DB_File: 1.852
Data::Dumper: 2.167
File::Basename: 2.85
File::Which: 1.23
Getopt::Long: 2.49
Hash::Merge: 0.300
JSON: 4.02
LWP::UserAgent: 6.31
Logger::Simple: 2.0
POSIX: 1.76
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 2.62
Text::Soundex: 3.05
Thread::Queue: 3.12
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.28
threads: 2.15
threads::shared: 1.56
All 27 Perl modules installed

Checking Environmental Variables...
$FUNANNOTATE_DB=/mnt/sdb/funannotate/DB_funannotate
$PASAHOME=/mnt/sdb/funannotate/PASApipeline
$TRINITYHOME=/mnt/sdb/funannotate/trinityrnaseq-v2.9.0
$EVM_HOME=/mnt/sdb/funannotate/EVidenceModeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/mnt/sdb/funannotate/Augustus/config
$GENEMARK_PATH=/mnt/sdb/funannotate/gm_et_linux_64/gmes_petap
All 6 environmental variables are set

Checking external dependencies...
PASA: 2.4.1
CodingQuarry: 2.0
Trinity: 2.9.1
augustus: 3.3.2
bamtools: bamtools 2.5.1
bedtools: bedtools v2.26.0
blat: BLAT v36x2
diamond: 0.9.24
emapper.py: 2.0.1
ete3: 3.1.1
exonerate: exonerate 2.4.0
fasta: no way to determine
glimmerhmm: 3.0.4
gmap: 2017-11-15
gmes_petap.pl: 4.48_3.60_lic
hisat2: 2.1.0
hmmscan: HMMER 3.2.1 (June 2018)
hmmsearch: HMMER 3.2.1 (June 2018)
java: 11.0.6
kallisto: 0.46.0
mafft: v7.453 (2019/Nov/8)
makeblastdb: makeblastdb 2.9.0+
minimap2: 2.17-r943-dirty
proteinortho: 6.0.10
pslCDnaFilter: no way to determine
salmon: salmon 0.14.0
samtools: samtools 1.9-66-gc15e884
signalp: 4.1
snap: 2006-07-28
stringtie: 1.3.6
tRNAscan-SE: 1.3.1 (January 2012)
tantan: tantan 13
tbl2asn: no way to determine, likely 25.X
tblastn: tblastn 2.9.0+
trimal: trimAl v1.4.rev22 build[2015-05-21]
trimmomatic: 0.39
All 36 external dependencies are installed

I did some research on the error and, in /usr/local/lib/python2.7/dist-packages/funannotate/compare.py at line 376, added an explicit UTF-8 encoding (the changed lines are marked "Modify here"):

`# get the PFAM descriptions
pfamdf2 = pfamdf.transpose().astype(int)
PFAM = lib.pfam2dict(os.path.join(FUNDB, 'Pfam-A.clans.tsv'))
pfam_desc = []
for i in pfamdf2.index.values:
    pfam_desc.append(PFAM.get(i))
pfamdf2['descriptions'] = pfam_desc
# write to file

pfamdf2.to_csv(os.path.join(args.out, 'pfam', 'pfam.results.csv'), encoding='utf-8') #Modify here
pfamdf2.reset_index(inplace=True)
pfamdf2.rename(columns={'index': 'PFAM'}, inplace=True)
pfamdf2['PFAM'] = '<a target="_blank" href="http://pfam.xfam.org/family/' + \
    pfamdf2['PFAM'].astype(str)+'">'+pfamdf2['PFAM']+'</a>'
# create html output
with io.open(os.path.join(args.out, 'pfam.html'), 'w', encoding="utf-8") as output: #modify here
    pd.set_option('display.max_colwidth', -1)
    output.write(lib.HEADER)
    output.write(lib.PFAM)
    output.write(pfamdf2.to_html(
        index=False, escape=False, classes='table table-hover'))
    output.write(lib.FOOTER)`
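The same idea, passing an explicit encoding when writing, can be illustrated with only the standard library. This is a hedged Python 3 sketch, not funannotate's code: it swaps pandas `to_csv` for the `csv` module, and the file location, the `PF_EXAMPLE_*` identifiers, and the descriptions are all hypothetical.

```python
import csv
import io
import os
import tempfile

# Hypothetical rows; the first description contains U+2032 (PRIME).
rows = [
    ("PF_EXAMPLE_1", "5\u2032 nucleotidase"),
    ("PF_EXAMPLE_2", "plain ASCII description"),
]

out_path = os.path.join(tempfile.mkdtemp(), "pfam.results.csv")

# Open the file with an explicit encoding so non-ASCII text can be written.
with io.open(out_path, "w", encoding="utf-8", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerow(["PFAM", "descriptions"])
    writer.writerows(rows)

# Read it back with the same encoding to confirm the character survived.
with io.open(out_path, "r", encoding="utf-8", newline="") as handle:
    read_back = list(csv.reader(handle))

print(read_back)
```

Stating the encoding on both the write and the read keeps the result independent of the locale default, which is what bit the Python 2.7 run above.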

Then, further along in the run, the following error came up:

Compiling all annotations for each genome
Traceback (most recent call last):
  File "/usr/local/bin/funannotate", line 660, in <module>
    main()
  File "/usr/local/bin/funannotate", line 650, in main
    mod.main(arguments)
  File "/usr/local/lib/python2.7/dist-packages/funannotate/compare.py", line 1095, in main
    pfamDict = lib.dictFlipLookup(pfam, PFAM)
  File "/usr/local/lib/python2.7/dist-packages/funannotate/library.py", line 7046, in dictFlipLookup
    outDict[i].append(str(result))
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2032' in position 10: ordinal not in range(128)

I went to /usr/local/lib/python2.7/dist-packages/funannotate/library.py, line 7046, and modified the function so that the result is encoded as UTF-8 instead of being passed through str():

`def dictFlipLookup(input, lookup):
    outDict = {}
    for x in input:
        for k, v in natsorted(x.iteritems()):
            # lookup description in another dictionary
            if not lookup.get(k) is None:
                result = k+': '+lookup.get(k)
            else:
                result = k+': No description'
            res = result.encode('utf-8') # Modify
            for i in v:
                if i in outDict:
                    outDict[i].append(res) # Modify
                    # outDict[i].append(str(result)) # Modify
                else:
                    outDict[i] = [res]
                    # outDict[i] = [str(result)]
    return outDict`
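For comparison, here is a minimal Python 3 sketch of the same lookup; it is not the code that was pushed to funannotate. It assumes Python 3, where text is Unicode by default, so the result is left as a `str` rather than encoded to bytes. The name `dict_flip_lookup`, the sample identifiers, and the use of the built-in `sorted()` in place of `natsorted()` are assumptions made for illustration.

```python
def dict_flip_lookup(inputs, lookup):
    """Invert {term: [gene, ...]} dicts into {gene: ["term: description", ...]}."""
    out = {}
    for mapping in inputs:
        # Built-in sorted() stands in for natsorted() to keep the sketch dependency-free.
        for term, genes in sorted(mapping.items()):
            description = lookup.get(term)
            if description is None:
                description = "No description"
            # The result stays a text string; no str() or encode() round trip is needed.
            result = term + ": " + description
            for gene in genes:
                out.setdefault(gene, []).append(result)
    return out


# Hypothetical inputs: one genome's term-to-genes mapping and a description table.
pfam_hits = [{"PF_B": ["gene1", "gene2"], "PF_A": ["gene1"]}]
descriptions = {"PF_A": "5\u2032 nucleotidase"}

flipped = dict_flip_lookup(pfam_hits, descriptions)
print(flipped)
```

In Python 2.7 the `str(result)` call forced an implicit ASCII encode of a unicode object, which is where the second traceback originated; keeping the value as text avoids that step entirely.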

That fixed the issue. This is the first time it has happened and I don't know why. I had some issues with dependencies, so maybe something changed in them when those were resolved. Greetings.

atiweb commented 4 years ago

While reviewing the other issues, I found that https://github.com/nextgenusfs/funannotate/issues/361 is the same problem as this one. It appears to be an encoding problem. The header of the HTML report declares UTF-8 as the encoding, so UTF-8 is the right encoding for all of the data. The error appears to be caused by non-ASCII characters in PFAM and other data.
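One way to locate which entries carry such characters is to scan the description table directly. The sketch below is a hedged Python 3 illustration; `find_non_ascii` and the sample dictionary are hypothetical and are not part of funannotate.

```python
def find_non_ascii(descriptions):
    """Return {key: [(position, character), ...]} for values containing non-ASCII characters."""
    offenders = {}
    for key, text in descriptions.items():
        hits = [(pos, ch) for pos, ch in enumerate(text) if ord(ch) > 127]
        if hits:
            offenders[key] = hits
    return offenders


# Hypothetical description table; only the first entry contains U+2032 (PRIME).
sample = {
    "PF_A": "5\u2032 nucleotidase",
    "PF_B": "ABC transporter",
}

offenders = find_non_ascii(sample)
print(offenders)
```

Running a check like this over the real description dictionary would show which identifiers are responsible, and the reported positions can be compared with the "position" value in the traceback.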

nextgenusfs commented 4 years ago

Thanks @atiweb -- just pushed these changes. Hopefully that fixes it. I guess it must be a particular PFAM domain causing an issue as I've never seen it before either.

atiweb commented 4 years ago

Tested the changes, here are the results:

funannotate compare --input /Amauroascus_mutatus/UAMH_3576/fun_out/annotate_results /Amauroascus_niger/UAMH_3544/fun_out/annotate_results /Byssoonygena_ceratinophila/UAMH_5669/fun_out/annotate_results /Chrysosporium_queenslandicum/CBS_280_77/fun_out/annotate_results /Emergomyces_orientalis/5z489/fun_out/annotate_results /Ophidiomyces_ophiodiicola/MYCO-ARIZ_AN0400001/fun_out/annotate_results --out /funannotate_compare_less --cpus 20 --run_dnds estimate

[08:36 AM]: OS: linux2, 24 cores, ~ 132 GB RAM. Python: 2.7.17
[08:36 AM]: Running 1.7.4
[08:36 AM]: Now parsing 6 genomes
[08:36 AM]: working on Amauroascus_mutatus_UAMH_3576 UAMH_3576
[08:37 AM]: working on Amauroascus_niger_UAMH_3544 UAMH_3544
[08:38 AM]: working on Byssoonygena_ceratinophila_UAMH_5669 UAMH_5669
[08:38 AM]: working on Chrysosporium_queenslandicum_CBS_280_77 CBS_280_77
[08:39 AM]: working on Emergomyces_orientalis_5z489 5z489
[08:40 AM]: working on Ophidiomyces_ophiodiicola_MYCO-ARIZ_AN0400001 MYCO-ARIZ_AN0400001
[08:40 AM]: Summarizing secondary metabolism gene clusters
[08:40 AM]: Summarizing PFAM domain results
[08:40 AM]: Summarizing InterProScan results
[08:40 AM]: Loading InterPro descriptions
[08:40 AM]: Summarizing MEROPS protease results
[08:40 AM]: found 41/125 MEROPS familes with stdev >= 1.000000
[08:41 AM]: Summarizing CAZyme results
[08:41 AM]: found 32/169 CAZy familes with stdev >= 1.000000
[08:41 AM]: No COG annotations found
[08:41 AM]: Summarizing secreted protein results
[08:41 AM]: Summarizing fungal transcription factors
[08:41 AM]: Running GO enrichment for each genome
[11:26 AM]: Running orthologous clustering tool, ProteinOrtho. This may take awhile...
[11:26 AM]: Calculating dN/dS ratios for each ortholog group, 12792 orthologous groups
[11:33 AM]: Compiling all annotations for each genome
[11:34 AM]: Compressing results to output file: mntsdc1bichos_marcusfunannotate_compare_less.tar.gz
[11:38 AM]: Funannotate compare completed successfully!

Full results: https://nextcloud.atiweb.site/s/ey79TsXezQxtBEP