metageni / SUPER-FOCUS

A tool for agile functional analysis of shotgun metagenomic data
GNU General Public License v3.0

Joining files makeblastdb error 803.7 #13

Closed: bashirhamidi closed this issue 5 years ago

bashirhamidi commented 5 years ago

Thanks for your prompt response on the conda superfocus_downloadDB.py issue. I've overcome that, but during the unpacking and joining step I'm running into the issue below. Any insight would be appreciated.

...
...
...
inflating: clusters/100_clusters/216_cluster.faa  
inflating: clusters/100_clusters/70_cluster.faa  
inflating: clusters/100_clusters/1198_cluster.faa  
inflating: clusters/100_clusters/71_cluster.faa  

Joining files

Formatting DB
blast: DB_100

Building a new DB, current time: 07/18/2018 11:07:42
New DB name:   /home/bah29/packages/SUPER-FOCUS-0.30/db/static/blast/100.db
New DB title:  100.db
Sequence type: Protein
Keep MBits: T
Maximum file size: 1000000000B
Error: (803.7) [makeblastdb] Blast-def-line-set.E.title
Bad char [0x96] in string at byte 151
fig|419947.9.peg.1104__1009__Mycobacterial_MmpL5_membrane_protein_cluster__Rv0678__MarR_family_transcriptional_regulator_associated_with_MmpL5?MmpS5_efflux_system
Error: (803.7) [makeblastdb] Blast-def-line-set.E.title
Bad char [0x96] in string at byte 151
fig|419947.9.peg.1104__1009__Mycobacterial_MmpL5_membrane_protein_cluster__Rv0678__MarR_family_transcriptional_regulator_associated_with_MmpL5?MmpS5_efflux_system

Adding sequences from FASTA; added 7143907 sequences in 333.523 seconds.
blast: DB_98

Building a new DB, current time: 07/18/2018 11:13:15
New DB name:   /home/bah29/packages/SUPER-FOCUS-0.30/db/static/blast/98.db
New DB title:  98.db
Sequence type: Protein
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 5234971 sequences in 249.484 seconds.
blast: DB_95

Building a new DB, current time: 07/18/2018 11:17:25
New DB name:   /home/bah29/packages/SUPER-FOCUS-0.30/db/static/blast/95.db
New DB title:  95.db
Sequence type: Protein
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 4520139 sequences in 214.329 seconds.
blast: DB_90

Building a new DB, current time: 07/18/2018 11:21:00
New DB name:   /home/bah29/packages/SUPER-FOCUS-0.30/db/static/blast/90.db
New DB title:  90.db
Sequence type: Protein
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 3865833 sequences in 184.21 seconds.
Done! Now you can run superfocus.py
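Editor's note: in the log above, only DB_100 reports the error, and it points at a single byte, 0x96, in a FASTA header. In Windows-1252 that byte is an en dash, which fits the "MmpL5?MmpS5" context, but that reading is an assumption. makeblastdb rejects the byte in the defline title. One possible workaround, not part of SUPER-FOCUS, is to clean the joined FASTA headers before formatting. The minimal sketch below assumes this approach. The function names and file paths are hypothetical. It maps 0x96 to "-", replaces any other non-ASCII byte in header lines with "_", and leaves sequence lines untouched.

```python
import sys
from typing import Iterable, Iterator


def sanitize_header(line: bytes) -> bytes:
    """Replace non-ASCII bytes in a FASTA header line (hypothetical helper).

    Assumption: 0x96 is a Windows-1252 en dash, so it becomes '-';
    every other byte >= 0x80 becomes '_'.
    """
    out = bytearray()
    for b in line:  # iterating over bytes yields ints
        if b == 0x96:
            out.append(ord("-"))
        elif b >= 0x80:
            out.append(ord("_"))
        else:
            out.append(b)
    return bytes(out)


def sanitize_fasta(lines: Iterable[bytes]) -> Iterator[bytes]:
    """Yield FASTA lines, cleaning only header ('>') lines."""
    for line in lines:
        if line.startswith(b">"):
            yield sanitize_header(line)
        else:
            yield line


def main(src: str, dst: str) -> None:
    # Work in binary mode so undecodable bytes never raise errors.
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fout.writelines(sanitize_fasta(fin))


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

For example, you could run `python sanitize_fasta.py joined_100.faa joined_100.clean.faa` and then point `makeblastdb -in joined_100.clean.faa -dbtype prot -out ...` at the cleaned file. Both file names are placeholders, because the actual joined file that SUPER-FOCUS produces is not shown in this thread.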

metageni commented 5 years ago

@Mark1plus1 Hey Mark, sorry. I've been super busy, so I will need some time before answering you.

The quick solution to all of this is to not use the bioconda version for now. Instead, download version 0.28 (https://github.com/metageni/SUPER-FOCUS/releases/tag/0.28), run the database download script included there, and run SUPER-FOCUS from that folder. Sorry about that.

bashirhamidi commented 5 years ago

@metageni I can confirm that I was able to get SUPER-FOCUS 0.30 from GitHub running successfully. I'll leave it up to you to resolve or close this issue as you see fit.

metageni commented 5 years ago

@Mark1plus1 I have refactored SUPER-FOCUS, and it should be working fine now. I still need to update it on bioconda, but it should be good to go by running superfocus.py directly or by installing it with setuptools.

Let me know in case you have any feedback.