Closed atiweb closed 4 years ago
While reviewing the other issues, I found that https://github.com/nextgenusfs/funannotate/issues/361 is the same problem as this one. It appears to be an encoding problem. The header of the HTML report declares UTF-8 as the encoding, so UTF-8 is the right encoding for all of the data. The error appears to be caused by non-ASCII characters in the PFAM and other data.
Thanks @atiweb -- just pushed these changes. Hopefully that fixes it. I guess it must be a particular PFAM domain causing an issue as I've never seen it before either.
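To track down which entry carries the offending character, a scan along these lines might help. This is only a sketch: `find_non_ascii` and `scan_file` are hypothetical helpers, not part of funannotate, and the sample rows are made up rather than taken from `Pfam-A.clans.tsv`.

```python
# -*- coding: utf-8 -*-
import io


def find_non_ascii(lines):
    """Yield (line_number, column, character) for every non-ASCII character."""
    for line_number, line in enumerate(lines, start=1):
        for column, char in enumerate(line):
            if ord(char) > 127:
                yield line_number, column, char


def scan_file(path):
    """Scan a UTF-8 text file, e.g. a local copy of Pfam-A.clans.tsv (assumed path)."""
    with io.open(path, encoding="utf-8") as handle:
        return list(find_non_ascii(handle))


# Made-up tab-separated rows; the second one contains U+2032 (prime).
sample = [
    u"PF00001\tCL0001\tExample_A\tPlain ASCII description\n",
    u"PF00002\tCL0002\tExample_B\t5\u2032 example description\n",
]
hits = list(find_non_ascii(sample))
for line_number, column, char in hits:
    print(line_number, column, repr(char))
```

Calling `scan_file` on the real database file would list every line that could trigger the same error.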
Tested the changes, here are the results:
[08:36 AM]: OS: linux2, 24 cores, ~ 132 GB RAM. Python: 2.7.17
[08:36 AM]: Running 1.7.4
[08:36 AM]: Now parsing 6 genomes
[08:36 AM]: working on Amauroascus_mutatus_UAMH_3576 UAMH_3576
[08:37 AM]: working on Amauroascus_niger_UAMH_3544 UAMH_3544
[08:38 AM]: working on Byssoonygena_ceratinophila_UAMH_5669 UAMH_5669
[08:38 AM]: working on Chrysosporium_queenslandicum_CBS_280_77 CBS_280_77
[08:39 AM]: working on Emergomyces_orientalis_5z489 5z489
[08:40 AM]: working on Ophidiomyces_ophiodiicola_MYCO-ARIZ_AN0400001 MYCO-ARIZ_AN0400001
[08:40 AM]: Summarizing secondary metabolism gene clusters
[08:40 AM]: Summarizing PFAM domain results
[08:40 AM]: Summarizing InterProScan results
[08:40 AM]: Loading InterPro descriptions
[08:40 AM]: Summarizing MEROPS protease results
[08:40 AM]: found 41/125 MEROPS familes with stdev >= 1.000000
[08:41 AM]: Summarizing CAZyme results
[08:41 AM]: found 32/169 CAZy familes with stdev >= 1.000000
[08:41 AM]: No COG annotations found
[08:41 AM]: Summarizing secreted protein results
[08:41 AM]: Summarizing fungal transcription factors
[08:41 AM]: Running GO enrichment for each genome
[11:26 AM]: Running orthologous clustering tool, ProteinOrtho. This may take awhile...
[11:26 AM]: Calculating dN/dS ratios for each ortholog group, 12792 orthologous groups
[11:33 AM]: Compiling all annotations for each genome
[11:34 AM]: Compressing results to output file: mntsdc1bichos_marcusfunannotate_compare_less.tar.gz
[11:38 AM]: Funannotate compare completed successfully!
Full results: https://nextcloud.atiweb.site/s/ey79TsXezQxtBEP
Are you using the latest release? Yes,
Describe the bug
On funannotate compare
OS: linux2, 24 cores, ~ 132 GB RAM. Python: 2.7.17
[12:17 AM]: Running 1.7.3
[12:17 AM]: Now parsing 6 genomes
[12:17 AM]: working on Amauroascus_mutatus_UAMH_3576 UAMH_3576
[12:18 AM]: working on Amauroascus_niger_UAMH_3544 UAMH_3544
[12:18 AM]: working on Byssoonygena_ceratinophila_UAMH_5669 UAMH_5669
[12:19 AM]: working on Chrysosporium_queenslandicum_CBS_280_77 CBS_280_77
[12:19 AM]: working on Emergomyces_orientalis_5z489 5z489
[12:20 AM]: working on Ophidiomyces_ophiodiicola_MYCO-ARIZ_AN0400001 MYCO-ARIZ_AN0400001
[12:20 AM]: Summarizing secondary metabolism gene clusters
[12:20 AM]: Summarizing PFAM domain results
Traceback (most recent call last):
File "/usr/local/bin/funannotate", line 660, in <module>
main()
File "/usr/local/bin/funannotate", line 650, in main
mod.main(arguments)
File "/usr/local/lib/python2.7/dist-packages/funannotate/compare.py", line 376, in main
pfamdf2.to_csv(os.path.join(args.out, 'pfam', 'pfam.results.csv'))
File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 3020, in to_csv
formatter.save()
File "/usr/local/lib/python2.7/dist-packages/pandas/io/formats/csvs.py", line 172, in save
self._save()
File "/usr/local/lib/python2.7/dist-packages/pandas/io/formats/csvs.py", line 288, in _save
self._save_chunk(start_i, end_i)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/formats/csvs.py", line 315, in _save_chunk
self.cols, self.writer)
File "pandas/_libs/writers.pyx", line 55, in pandas._libs.writers.write_csv_rows
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2032' in position 1: ordinal not in range(128)
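The error itself can be reproduced without funannotate or pandas. A minimal sketch, assuming a made-up string with U+2032 (the prime character named in the traceback) at position 1: encoding to ASCII fails, while encoding to UTF-8 succeeds.

```python
# -*- coding: utf-8 -*-
text = u"5\u2032 example"  # made-up value; U+2032 sits at position 1

try:
    text.encode("ascii")
    error = None
except UnicodeEncodeError as exc:
    # The ASCII codec only covers code points 0-127.
    error = exc

encoded = text.encode("utf-8")  # UTF-8 can represent any Unicode character

print(error)
print(repr(encoded))
```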
What command did you issue? funannotate compare --input Amauroascus_mutatus/UAMH_3576/fun_out/annotate_results Amauroascus_niger/UAMH_3544/fun_out/annotate_results /Byssoonygena_ceratinophila/UAMH_5669/fun_out/annotate_results Chrysosporium_queenslandicum/CBS_280_77/fun_out/annotate_results Emergomyces_orientalis/5z489/fun_out/annotate_results Ophidiomyces_ophiodiicola/MYCO-ARIZ_AN0400001/fun_out/annotate_results --out /funannotate_compare_less --cpus 20 --run_dnds estimate
OS/Install Information
Checking dependencies for 1.7.3
To print all dependencies and versions: funannotate check --show-versions
You are running Python v 2.7.17. Now checking python packages... All 11 python packages installed
You are running Perl v 5.026001. Now checking perl modules... All 27 Perl modules installed
Checking Environmental Variables... All 6 environmental variables are set
Checking external dependencies... All 36 external dependencies are installed
funannotate check --show-versions
Checking dependencies for 1.7.3
You are running Python v 2.7.17. Now checking python packages...
biopython: 1.76
goatools: 0.9.9
matplotlib: 2.2.5
natsort: 6.2.1
numpy: 1.16.6
pandas: 0.22.0
psutil: 5.6.7
requests: 2.22.0
scikit-learn: 0.20.4
scipy: 1.2.3
seaborn: 0.9.1
All 11 python packages installed
You are running Perl v 5.026001. Now checking perl modules...
Bio::Perl: 1.007002
Carp: 1.42
Clone: 0.39
DBD::SQLite: 1.62
DBD::mysql: 4.050
DBI: 1.643
DB_File: 1.852
Data::Dumper: 2.167
File::Basename: 2.85
File::Which: 1.23
Getopt::Long: 2.49
Hash::Merge: 0.300
JSON: 4.02
LWP::UserAgent: 6.31
Logger::Simple: 2.0
POSIX: 1.76
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 2.62
Text::Soundex: 3.05
Thread::Queue: 3.12
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.28
threads: 2.15
threads::shared: 1.56
All 27 Perl modules installed
Checking Environmental Variables...
$FUNANNOTATE_DB=/mnt/sdb/funannotate/DB_funannotate
$PASAHOME=/mnt/sdb/funannotate/PASApipeline
$TRINITYHOME=/mnt/sdb/funannotate/trinityrnaseq-v2.9.0
$EVM_HOME=/mnt/sdb/funannotate/EVidenceModeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/mnt/sdb/funannotate/Augustus/config
$GENEMARK_PATH=/mnt/sdb/funannotate/gm_et_linux_64/gmes_petap
All 6 environmental variables are set
Checking external dependencies...
PASA: 2.4.1
CodingQuarry: 2.0
Trinity: 2.9.1
augustus: 3.3.2
bamtools: bamtools 2.5.1
bedtools: bedtools v2.26.0
blat: BLAT v36x2
diamond: 0.9.24
emapper.py: 2.0.1
ete3: 3.1.1
exonerate: exonerate 2.4.0
fasta: no way to determine
glimmerhmm: 3.0.4
gmap: 2017-11-15
gmes_petap.pl: 4.48_3.60_lic
hisat2: 2.1.0
hmmscan: HMMER 3.2.1 (June 2018)
hmmsearch: HMMER 3.2.1 (June 2018)
java: 11.0.6
kallisto: 0.46.0
mafft: v7.453 (2019/Nov/8)
makeblastdb: makeblastdb 2.9.0+
minimap2: 2.17-r943-dirty
proteinortho: 6.0.10
pslCDnaFilter: no way to determine
salmon: salmon 0.14.0
samtools: samtools 1.9-66-gc15e884
signalp: 4.1
snap: 2006-07-28
stringtie: 1.3.6
tRNAscan-SE: 1.3.1 (January 2012)
tantan: tantan 13
tbl2asn: no way to determine, likely 25.X
tblastn: tblastn 2.9.0+
trimal: trimAl v1.4.rev22 build[2015-05-21]
trimmomatic: 0.39
All 36 external dependencies are installed
After some research into the error, I added the following at line 376 of /usr/local/lib/python2.7/dist-packages/funannotate/compare.py:
# get the PFAM descriptions
pfamdf2 = pfamdf.transpose().astype(int)
PFAM = lib.pfam2dict(os.path.join(FUNDB, 'Pfam-A.clans.tsv'))
pfam_desc = []
for i in pfamdf2.index.values:
    pfam_desc.append(PFAM.get(i))
pfamdf2['descriptions'] = pfam_desc
# write to file
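For anyone hitting the same traceback, the general remedy is to name the encoding explicitly when writing the CSV. With pandas, `DataFrame.to_csv` accepts an `encoding` argument for this. The sketch below shows the same idea using only the Python 3 standard library; `write_rows_utf8` is a hypothetical helper, the column names and row are made up, and this is not the change that was made in funannotate.

```python
import csv
import io
import os
import tempfile


def write_rows_utf8(path, header, rows):
    """Write rows to a CSV file, encoding the text as UTF-8 (Python 3)."""
    with io.open(path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


# Made-up table with one description containing U+2032.
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "pfam.results.csv")
write_rows_utf8(
    out_path,
    ["", "genome_A", "descriptions"],
    [["PF00002", 3, u"5\u2032 example description"]],
)

# Read the file back with the same encoding to confirm the round trip.
with io.open(out_path, encoding="utf-8", newline="") as handle:
    read_back = list(csv.reader(handle))
print(read_back)
```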
Further along, the following error came up:
Compiling all annotations for each genome
Traceback (most recent call last):
File "/usr/local/bin/funannotate", line 660, in <module>
main()
File "/usr/local/bin/funannotate", line 650, in main
mod.main(arguments)
File "/usr/local/lib/python2.7/dist-packages/funannotate/compare.py", line 1095, in main
pfamDict = lib.dictFlipLookup(pfam, PFAM)
File "/usr/local/lib/python2.7/dist-packages/funannotate/library.py", line 7046, in dictFlipLookup
outDict[i].append(str(result))
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2032' in position 10: ordinal not in range(128)
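On Python 2, calling `str()` on a unicode object implicitly encodes it as ASCII, which is what fails in the frame above. One possible way around it is to keep values as text instead of forcing them through `str()`. The sketch below assumes a hypothetical `to_text` helper and a made-up lookup dictionary; it is not funannotate's code.

```python
# -*- coding: utf-8 -*-
text_type = type(u"")  # unicode on Python 2, str on Python 3


def to_text(value):
    """Return value as text without an implicit ASCII encode."""
    if isinstance(value, text_type):
        return value
    if isinstance(value, bytes):
        # Assumes byte strings hold UTF-8 data.
        return value.decode("utf-8")
    return text_type(value)


# Made-up lookup table with a description containing U+2032.
lookup = {"PF00002": u"5\u2032 example description"}
described = [to_text(lookup.get(key)) for key in ["PF00002"]]
print(described)
```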
I went to line 7046 of /usr/local/lib/python2.7/dist-packages/funannotate/library.py and modified it:
def dictFlipLookup(input, lookup):
    outDict = {}
    for x in input:
        for k, v in natsorted(x.iteritems()):
            # lookup description in another dictionary
That fixed the issue. This is the first time it has happened and I don't know why. I had some issues with dependencies, so maybe something changed in them when those were resolved. Greetings.