morgannprice / PaperBLAST

PaperBLAST: find papers about a protein or its homologs
http://papers.genomics.lbl.gov
GNU General Public License v3.0

Issues in running setupGaps.pl Execution #12

Closed: SnowPeak7 closed this issue 10 months ago

SnowPeak7 commented 10 months ago

Description

While executing the setupGaps.pl script, I encountered several warnings about uninitialized values, as well as multiple notices of entries being skipped due to them being fragments or having cautions. The script seems to be functioning, but these warnings and notices are concerning.

Environment

Steps to Reproduce

  1. Run the script: bin/setupGaps.pl -ind ind -set setName -data data -sprot sprot.curated_parsed

I expected the script to run without warnings about uninitialized values, and a clear understanding of why certain entries are skipped.

The script outputs warnings about uninitialized values in SWISS/CCinteraction.pm and skips several entries due to being fragments or having cautions.

Output Log 1

Use of uninitialized value $arg{"xeno"} in substitution (s///) at /data1/bioinfo_software/perl_module/lib/perl5/x86_64-linux-gnu-thread-multi/SWISS/CCinteraction.pm line 43, chunk 521084.
Q15JG4 has a caution, skipped
O68953 is a fragment, skipped
O68953 is a fragment, skipped
O68954 is a fragment, skipped
O68954 is a fragment, skipped
O68949 is a fragment, skipped
O68952 is a fragment, skipped
P9WF82 has a caution, skipped
P9WF83 has a caution, skipped
P9WF60 has a caution, skipped
P9WF61 has a caution, skipped

The above is only an excerpt of the output; the rest is similar, so it is not shown.

Output Log 2

/data1/bioinfo_software/PaperBLAST/bin/runPfamHits.pl: row 19: /tmp/pfam.hits.464571/PF00569.21.domtbl: There is no such file or directory.

This is the error at the end of the program.

The program was able to end normally, but I'm worried that these errors will affect later steps, so I'm here for help.

morgannprice commented 10 months ago

The "skipped" statements are part of the normal log. The warnings about uninitialized values from SWISS/CCinteraction.pm are also normal. (They are from the library for parsing SwissProt entries, not from my code.)

I am surprised by the error from runPfamHits.pl -- I haven't seen this before. It implies that tmp/path.setName/pfam.hits.tab is incomplete or even empty. This could be an issue later if you plan to use clustering to improve step definitions but won't affect GapMind itself.

Is Pfam-A.hmm in the hmm/ subdirectory? Are hmmfetch and hmmsearch in the bin/ subdirectory? Do you think /tmp could have run out of space? This script needs space to store the extracted hmms (1.6 GB) as well as space for all the hits (maybe 50 MB). If you don't see anything, run the steps in runPfamHits.pl (which is a misnomer, it is a shell script) individually and check the error logs in /tmp/pfam.hits.$$/*.log
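The checks suggested above could be sketched as a small shell helper. This is only an illustration: the function name check_pfam_setup and its two arguments (the PaperBLAST directory and the temporary directory) are hypothetical and not part of PaperBLAST, and the messages it prints are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper (not part of PaperBLAST): report whether the files the
# Pfam step relies on are present, and how much space the temp directory has.
# Arguments: $1 = PaperBLAST directory (default .), $2 = temp directory (default /tmp)
check_pfam_setup() {
    base="${1:-.}"
    tmpdir="${2:-/tmp}"

    # Pfam-A.hmm is expected in the hmm/ subdirectory (-s: exists and is non-empty)
    if [ -s "$base/hmm/Pfam-A.hmm" ]; then
        echo "ok: hmm/Pfam-A.hmm"
    else
        echo "missing or empty: hmm/Pfam-A.hmm"
    fi

    # hmmfetch and hmmsearch are expected in the bin/ subdirectory
    for exe in hmmfetch hmmsearch; do
        if [ -x "$base/bin/$exe" ]; then
            echo "ok: bin/$exe"
        else
            echo "missing or not executable: bin/$exe"
        fi
    done

    # Available space in the temp directory, in MB (POSIX df output, 1024-byte blocks)
    df -Pk "$tmpdir" | awk -v d="$tmpdir" 'NR == 2 { printf "free in %s: %d MB\n", d, $4 / 1024 }'
}
```

Running check_pfam_setup /path/to/PaperBLAST /tmp prints one line per check. If the free space is well below the roughly 1.6 GB mentioned above, that alone could explain the missing .domtbl file. Any logs left behind can then be listed with ls /tmp/pfam.hits.*/*.log.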

SnowPeak7 commented 10 months ago

Thank you very much for your advice. I changed line 164 of submitter.pl from the first statement below to the second, and the error disappeared.

my $cmd = "($cdCmd $envCommand time $commands[$ncmd]) >& $logstart-$ncmd.log"; # original line 164
my $cmd = "($cdCmd $envCommand time $commands[$ncmd]) > $logstart-$ncmd.log 2>&1"; # replacement; this works
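A likely explanation, which is not confirmed in this thread: ">& file" is a csh/bash shorthand for sending both stdout and stderr to a file. Perl passes a command string containing shell metacharacters to /bin/sh, and on systems where /bin/sh is a strict POSIX shell such as dash, that shorthand may be rejected. The form "> file 2>&1" is the portable equivalent. A minimal sketch of the portable form, using a temporary log file invented for this demo:

```shell
#!/bin/sh
# Demo only: the log file and the echoed messages are made up for illustration.
log=$(mktemp)

# POSIX form: send stdout to the log, then point stderr at the same place.
# The command runs under plain sh, as a command string from Perl would.
sh -c '(echo "to stdout"; echo "to stderr" >&2) > "$1" 2>&1' sh "$log"

# Both lines should now be in the log
cat "$log"
```

Running ls -l /bin/sh shows which shell /bin/sh points to on a given machine.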