mlibbrecht / submodular_sequence_repset

can't work with DNA sequences #5

Open jun3234 opened 4 years ago

jun3234 commented 4 years ago

I can run the script with the default protein sequences, but not with DNA sequences. I ran:

python2 repset.py --outdir test --seqs test.fa

After the database was built, an error occurred:

2020-08-17 18:46:36,841 INFO:makeblastdb -in test.fa -input_type fasta -out test/db -dbtype prot

Building a new DB, current time: 08/17/2020 18:46:36
New DB name:   /home/smrtanalysis/tools2/submodular_sequence_repset/test/db
New DB title:  test.fa
Sequence type: Protein
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 88170 sequences in 3.99215 seconds.

2020-08-17 18:46:40,911 INFO:psiblast -query test.fa -db test/db -num_iterations 6 -outfmt 6 qseqid sseqid pident length mismatch evalue bitscore -seg yes -out test/psiblast_result.tab
BLAST engine error: Could not calculate ungapped Karlin-Altschul parameters due to an invalid query sequence or its translation. Please verify the query sequence(s) and/or filtering options
Traceback (most recent call last):
  File "repset.py", line 1585, in <module>
    db = run_psiblast(workdir, args.seqs)
  File "repset.py", line 98, in run_psiblast
    subprocess.check_call(cmd)
  File "/home/smrtanalysis/python2/lib/python2.7/subprocess.py", line 186, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['psiblast', '-query', path('test.fa'), '-db', path('test/db'), '-num_iterations', '6', '-outfmt', '6 qseqid sseqid pident length mismatch evalue bitscore', '-seg', 'yes', '-out', path('test/psiblast_result.tab')]' returned non-zero exit status 3
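
One plausible reading of this first failure, not confirmed anywhere in this thread, is that the DNA file was loaded as protein: the log shows `-dbtype prot` and "Sequence type: Protein". Below is a minimal sketch of checking which alphabet a FASTA file appears to use before choosing `-dbtype`. The function name `guess_dbtype`, the 90% threshold, and the sample size are hypothetical choices for illustration and are not part of repset.py.

```python
import io


def guess_dbtype(fasta_handle, max_residues=10000, threshold=0.9):
    """Guess 'nucl' or 'prot' from the first residues of a FASTA stream.

    Heuristic only (an assumption, not repset behaviour): if at least
    `threshold` of the sampled characters are A, C, G, T, U or N, the
    input is treated as nucleotide.
    """
    total = 0
    nucleotide_like = 0
    for line in fasta_handle:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA header lines
        for ch in line.upper():
            total += 1
            if ch in "ACGTUN":
                nucleotide_like += 1
        if total >= max_residues:
            break  # a sample is enough; avoid reading a large file fully
    if total == 0:
        raise ValueError("no sequence data found")
    # float() keeps the division correct under Python 2 as well
    if float(nucleotide_like) / total >= threshold:
        return "nucl"
    return "prot"
```

Calling `guess_dbtype(open("test.fa"))` on a DNA file should return `"nucl"`. Short or unusual protein sequences could fool the heuristic, so the result is better treated as a sanity check than as a decision.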

When I modified the script, changing prot to nucl on line 19, it still failed:

2020-08-17 19:00:09,751 INFO:makeblastdb -in test.fa -input_type fasta -out test/db -dbtype nucl

Building a new DB, current time: 08/17/2020 19:00:09
New DB name:   /home/smrtanalysis/tools2/submodular_sequence_repset/test/db
New DB title:  test.fa
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 88170 sequences in 3.5302 seconds.

2020-08-17 19:00:13,328 INFO:psiblast -query test.fa -db test/db -num_iterations 6 -outfmt 6 qseqid sseqid pident length mismatch evalue bitscore -seg yes -out test/psiblast_result.tab
BLAST Database error: No alias or index file found for protein database [test/db] in search path [/home/smrtanalysis/tools2/submodular_sequence_repset::]
Traceback (most recent call last):
  File "repset.py", line 1585, in <module>
    db = run_psiblast(workdir, args.seqs)
  File "repset.py", line 98, in run_psiblast
    subprocess.check_call(cmd)
  File "/home/smrtanalysis/python2/lib/python2.7/subprocess.py", line 186, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['psiblast', '-query', path('test.fa'), '-db', path('test/db'), '-num_iterations', '6', '-outfmt', '6 qseqid sseqid pident length mismatch evalue bitscore', '-seg', 'yes', '-out', path('test/psiblast_result.tab')]' returned non-zero exit status 2

How can I run this script with DNA sequences?

Thanks, Jun

mlibbrecht commented 4 years ago

That's strange, I haven't encountered that problem before. Does it work if you run the makeblastdb and psiblast commands on the command line yourself? Otherwise, you might try looking on the psiblast troubleshooting pages.

jun3234 commented 4 years ago

These are the files in the current directory:

[smrtanalysis@localhost submodular_sequence_repset]$ ll -htr
total 154M
-rw-r--r--. 1 smrtanalysis xialab 1.0K Aug  7 10:36 Readme.md
-rw-r--r--. 1 smrtanalysis xialab  78M Aug 17 18:31 final.unaligned.fa
drwxr-xr-x. 2 smrtanalysis xialab  189 Aug 17 18:31 out
-rw-r--r--. 1 smrtanalysis xialab  76M Aug 17 18:40 test.fa
-rw-r--r--. 1 smrtanalysis xialab  63K Aug 17 18:59 repset.py
drwxr-xr-x. 2 smrtanalysis xialab  189 Aug 17 19:00 test

I checked the commands logged in stdout.txt:

[smrtanalysis@localhost submodular_sequence_repset]$ cat ./test/stdout.txt
2020-08-17 19:00:09,751 INFO - makeblastdb -in test.fa -input_type fasta -out test/db -dbtype nucl
2020-08-17 19:00:13,328 INFO - psiblast -query test.fa -db test/db -num_iterations 6 -outfmt 6 qseqid sseqid pident length mismatch evalue bitscore -seg yes -out test/psiblast_result.tab

Then I ran the commands from stdout.txt manually:

[smrtanalysis@localhost submodular_sequence_repset]$ makeblastdb -in test.fa -input_type fasta -out test_man/db -dbtype nucl

Building a new DB, current time: 08/17/2020 23:10:50
New DB name:   /home/smrtanalysis/tools2/submodular_sequence_repset/test_man/db
New DB title:  test.fa
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 88170 sequences in 4.24149 seconds.

[smrtanalysis@localhost submodular_sequence_repset]$ psiblast -query test.fa -db test_man/db -num_iterations 6 -outfmt '6 qseqid sseqid pident length mismatch evalue bitscore' -seg yes -out test_man/psiblast_result.tab
BLAST Database error: No alias or index file found for protein database [test_man/db] in search path [/home/smrtanalysis/tools2/submodular_sequence_repset::]

But the error occurs again. Do I need to build a protein database? I only have DNA sequences.
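
For context, psiblast is a protein-to-protein search program, which fits the message "No alias or index file found for protein database": it looks for a protein database and a `-dbtype nucl` build does not provide one. The sketch below pairs each database type with a matching search program. Using blastn for nucleotide input is a swapped-in alternative, not something this thread confirms repset supports, and the rest of repset.py may expect psiblast output. The helper name `blast_commands` and the blastn output file name are hypothetical.

```python
import os


def blast_commands(seqs, workdir, dbtype):
    """Build makeblastdb and search commands whose molecule types agree."""
    db = os.path.join(workdir, "db")
    outfmt = "6 qseqid sseqid pident length mismatch evalue bitscore"
    make_db = ["makeblastdb", "-in", seqs, "-input_type", "fasta",
               "-out", db, "-dbtype", dbtype]
    if dbtype == "prot":
        # Same psiblast invocation that appears in the log above.
        search = ["psiblast", "-query", seqs, "-db", db,
                  "-num_iterations", "6", "-outfmt", outfmt,
                  "-seg", "yes",
                  "-out", os.path.join(workdir, "psiblast_result.tab")]
    elif dbtype == "nucl":
        # Assumption: blastn replaces psiblast for DNA input. The
        # iteration and SEG options are psiblast settings and are left out.
        search = ["blastn", "-query", seqs, "-db", db,
                  "-outfmt", outfmt,
                  "-out", os.path.join(workdir, "blastn_result.tab")]
    else:
        raise ValueError("dbtype must be 'prot' or 'nucl'")
    return make_db, search
```

Each returned list can be passed to `subprocess.check_call`, as `run_psiblast` does in the traceback. Whether the downstream steps in repset.py accept blastn results is untested here.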

mlibbrecht commented 4 years ago

Everything in repset should work with nucleotide sequences. I'm not sure why psiblast isn't working, and I don't have any experience with that bug, sorry -- you should try asking on the psiblast troubleshooting pages.