medema-group / BiG-SCAPE

Similarity networks of biosynthetic gene clusters
GNU Affero General Public License v3.0

Error During Processing Input Files (Importing GenBank Files) #173

Closed Sarfraz2646 closed 2 weeks ago

Sarfraz2646 commented 2 months ago

I am encountering errors when trying to import GenBank files into BiG-SCAPE. Please help me with this error.

jorgecnavarrom commented 2 months ago

Hi. It looks as if you re-ran BiG-SCAPE on the same dataset, but it didn't find any domains in the protein sequences from your antiSMASH results. Perhaps you forgot to use --pfam_dir the first time?

I'd try running it again with --pfam_dir, pointing to a different output folder.
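To follow this suggestion without reusing anything from an earlier run, a small pre-flight check can help. The sketch below is hypothetical (none of these helper names are part of BiG-SCAPE). It assumes BiG-SCAPE 1.x expects `Pfam-A.hmm` inside the `--pfam_dir` folder together with the four index files that HMMER's `hmmpress` writes, and it builds a command whose `-o` points at a folder that does not exist yet. The path to `bigscape.py` is also an assumption.

```python
from datetime import datetime
from pathlib import Path

# HMMER3 `hmmpress` writes these four index files next to the .hmm database.
PRESSED_SUFFIXES = (".h3f", ".h3i", ".h3m", ".h3p")


def missing_pfam_files(pfam_dir, hmm_name="Pfam-A.hmm"):
    """List expected Pfam files that are absent (assumed BiG-SCAPE 1.x layout)."""
    base = Path(pfam_dir) / hmm_name
    expected = [base] + [base.with_name(base.name + s) for s in PRESSED_SUFFIXES]
    return [str(p) for p in expected if not p.exists()]


def fresh_output_dir(parent, prefix="bigscape_run"):
    """Return an output path that does not exist yet, so no earlier run is reused."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    candidate = Path(parent) / f"{prefix}_{stamp}"
    n = 1
    while candidate.exists():
        candidate = Path(parent) / f"{prefix}_{stamp}_{n}"
        n += 1
    return candidate


def build_command(pfam_dir, input_dir, output_dir, script="bigscape.py"):
    # Flags copied from the command used later in this thread;
    # the script location is a hypothetical default.
    return ["python", script, "--pfam_dir", str(pfam_dir),
            "-i", str(input_dir), "-o", str(output_dir)]
```

If `missing_pfam_files()` returns anything, fix the Pfam folder (for example by running `hmmpress` on `Pfam-A.hmm`) before launching the command, for instance with `subprocess.run(cmd, check=True)`.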

kellystyles commented 2 months ago

Are your GenBank files standard antiSMASH output files ending in '.gbk'? I had this issue today, and it turns out the GenBank files must have the suffix '.gbk', not '.gb'.
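If some files do end in `.gb`, a minimal sketch like the one below copies them with a `.gbk` suffix into a separate folder, keeping the per-sample sub-folders. The function name is hypothetical; copying rather than renaming leaves the original antiSMASH output untouched.

```python
import shutil
from pathlib import Path


def copy_gb_as_gbk(input_dir, dest_dir):
    """Copy every *.gb file under input_dir to dest_dir with a .gbk suffix.

    Files that already end in .gbk are not matched by the "*.gb" pattern.
    Returns the list of created .gbk paths.
    """
    src_root = Path(input_dir)
    dest_root = Path(dest_dir)
    created = []
    for src in sorted(src_root.rglob("*.gb")):
        # Keep the sub-folder layout (e.g. one folder per sample).
        target = dest_root / src.relative_to(src_root).with_suffix(".gbk")
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, target)  # copy2 also preserves file timestamps
        created.append(target)
    return created
```

The destination folder can then be passed to BiG-SCAPE with `-i`.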

Sarfraz2646 commented 2 months ago

I tried running it again with --pfam_dir and a different output folder, but I still get the same error.

Sarfraz2646 commented 2 months ago

Yes, I cross-checked: the files have the proper .gbk extension.

jorgecnavarrom commented 2 months ago

Is it possible for you to attach one or two files for us to check?

Sarfraz2646 commented 2 months ago

2 samples .gbk files.zip https://drive.google.com/file/d/1XlNQyvOc2cCBYI0h7JNFyaK4A6p5NXuT/view?usp=drive_web This zip file contains 2 samples in separate folders, each containing the .gbk files. I am facing the issue when running them, so please run them and let me know. Thanks!

On Sun, Aug 25, 2024 at 6:28 PM Jorge Navarro wrote:

Is it possible for you to attach one or two files for us to check?

Sarfraz2646 commented 2 months ago

I am still waiting for your reply.

adraismawur commented 1 month ago

Hi,

I have run the two samples you have provided and they seem to complete without issues.

Is it at all possible for you to run BiG-SCAPE on a Unix machine? Perhaps using Windows Subsystem for Linux (WSL)?

jorgecnavarrom commented 1 month ago

Hi. Same here, the run finished without issues:

python ~/Code/BiG-SCAPE/bigscape.py --pfam_dir ~/Databases/Pfam/37 -i 2_antismash_results -o test --mix --no_classify --clans-off

   - - Processing input files - -
 Including files with one or more of the following strings in their filename: 'cluster', 'region'
 Skipping files with one or more of the following strings in their filename: 'final'

Importing GenBank files

 Starting with 37 files
 Files that had its sequence extracted: 37

Creating output directories

Trying threading on 10 cores

Predicting domains using hmmscan
 Predicting domains for 37 fasta files
 Finished generating domtable files.

Parsing hmmscan domtable files
 Processing 37 domtable files
 New domain sequences to be added; cleaning domains folder
 Finished generating pfs and pfd files.

Processing domains sequence files
 Adding sequences to corresponding domains file
 Reading the ordered list of domains from the pfs files
 Creating arrower-like figures for each BGC
  Parsing hmm file for domain information
    Done
  Domains colors file was not found. An empty file will be created
  Reading BGC information and writing SVG
 Finished creating figures

   - - Calculating distance matrix - -
Performing multiple alignment of domain sequences

 Using hmmalign
launch_hmmalign took 9.745 seconds
 Trying to read domain alignments (*.algn files)

Generating distance network files with ALL available input files
   Writing the complete Annotations file for the complete set

 Mixing all BGC classes

  Mix (37 BGCs)
  Calculating all pairwise distances
generate_network took 0.065 seconds
  Writing output files
  Calling Gene Cluster Families
  Cutoff: 0.3

    Main function took 71.871 s

Have you tried re-running BiG-SCAPE with a different output folder? Perhaps something from a previous run is interfering with the analysis.
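The log above also prints BiG-SCAPE's default input filter: files are included only if their name contains 'cluster' or 'region', and skipped if it contains 'final'. To see which of your files would pass, a local approximation like the hypothetical helper below may help. It assumes a case-sensitive substring match on the file name plus the '.gbk' suffix discussed earlier in the thread, which may not match BiG-SCAPE's exact implementation.

```python
from pathlib import Path

# Default filter strings as printed in the BiG-SCAPE log above.
INCLUDE_STRINGS = ("cluster", "region")
EXCLUDE_STRINGS = ("final",)


def would_be_included(path, include=INCLUDE_STRINGS, exclude=EXCLUDE_STRINGS):
    """Approximate the input filter (assumed: basename, case-sensitive substrings)."""
    name = Path(path).name
    if Path(name).suffix != ".gbk":
        return False
    if any(s in name for s in exclude):
        return False
    return any(s in name for s in include)


def report(input_dir):
    """Split all files under input_dir into (kept, skipped) lists."""
    kept, skipped = [], []
    for p in sorted(Path(input_dir).rglob("*")):
        if p.is_file():
            (kept if would_be_included(p) else skipped).append(p)
    return kept, skipped
```

Under this approximation, a whole-genome antiSMASH file such as `genome.gbk` would be skipped because its name contains neither 'cluster' nor 'region'; only the per-region files would be imported.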

github-actions[bot] commented 1 month ago

This issue has not seen activity for 14 days and has been marked as stale. Please comment with additional information if this issue is still relevant.

github-actions[bot] commented 2 weeks ago

This issue has been stale for 14 days and has been closed. Please feel free to re-open this issue if necessary.