lpantano / seqcluster

small RNA analysis from NGS data
http://seqcluster.readthedocs.io
MIT License

seqcluster executable not found when testing #35

Closed by smoe 5 years ago

smoe commented 6 years ago

Hello again, I had a closer look at the tests that fail because the executable "seqcluster" script is missing. As I understand it, that script is generated by setup.py from this entry:

      entry_points={
          'console_scripts': ['seqcluster=seqcluster.command_line:main', 'seqcluster_install=seqcluster.install:main'],
      },

The default build system of Debian then places this directly at debian/python3-seqcluster-bin/usr/bin/seqcluster, with no other copy or footprint in your build or test tree. Consequently, the tests fail with an error like

======================================================================
ERROR: Run miraligner analysis
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/moeller/git/debian-med/seqcluster-smoe/seqcluster/test/test_automated_analysis.py", line 107, in test_srnaseq_miraligner
    subprocess.check_call(cl)
  File "/usr/lib/python3.6/subprocess.py", line 286, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/usr/lib/python3.6/subprocess.py", line 267, in call
    with Popen(*popenargs, **kwargs) as p:
  File "/usr/lib/python3.6/subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.6/subprocess.py", line 1344, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'seqcluster': 'seqcluster'
-------------------- >> begin captured stdout << ---------------------
seqcluster seqbuster --sps hsa --hairpin ../../data/examples/miraligner/hairpin.fa --mirna ../../data/examples/miraligner/miRNA.str --gtf ../../data/examples/miraligner/hsa.gff3 -o test_out_mirs_fasta --miraligner ../../data/examples/miraligner/sim_isomir.fa with workdir '/home/moeller/git/debian-med/seqcluster-smoe/seqcluster/test/test_automated_output' executed from '/home/moeller/git/debian-med/seqcluster-smoe/seqcluster/test/test_automated_output'.

--------------------- >> end captured stdout << ----------------------

I added the extra information on the cwd to the output myself. I admit I am also a bit confused by the expectation that the seqcluster executable is on the PATH. For Travis that may be just fine, but normally one would want to test in the tree. Manually, I have done the following as a bit of a workaround:

moeller@steffen-laptop-debian:~/git/debian-med/seqcluster-smoe/seqcluster/test/test_automated_output$ PYTHONPATH=../.. ./seqcluster seqbuster --sps hsa --hairpin ../../data/examples/miraligner/hairpin.fa --mirna ../../data/examples/miraligner/miRNA.str --gtf ../../data/examples/miraligner/hsa.gff3 -o test_out_mirs_fasta --miraligner ../../data/examples/miraligner/sim_isomir.fa with workdir '/home/moeller/git/debian-med/seqcluster-smoe/seqcluster/test/test_automated_output' executed from '/home/moeller/git/debian-med/seqcluster-smoe/seqcluster/test/test_automated_output'.
Probably this will fail, you need bcbio-nextgen for many installation functions.
['seqbuster', '--sps', 'hsa', '--hairpin', '../../data/examples/miraligner/hairpin.fa', '--mirna', '../../data/examples/miraligner/miRNA.str', '--gtf', '../../data/examples/miraligner/hsa.gff3', '-o', 'test_out_mirs_fasta', '--miraligner', '../../data/examples/miraligner/sim_isomir.fa', 'with', 'workdir', '/home/moeller/git/debian-med/seqcluster-smoe/seqcluster/test/test_automated_output', 'executed', 'from', '/home/moeller/git/debian-med/seqcluster-smoe/seqcluster/test/test_automated_output.']
INFO Run seqbuster
INFO Reading ../../data/examples/miraligner/sim_isomir.fa
sh: 1: miraligner: not found
INFO Running miraligner with ../../data/examples/miraligner/sim_isomir_unique.fa
INFO Hits: 21
INFO Valid hits (+/-3 reference miRNA): 21
Traceback (most recent call last):
  File "./seqcluster", line 11, in <module>
    load_entry_point('seqcluster==1.2.4a8', 'console_scripts', 'seqcluster')()
  File "/home/moeller/git/debian-med/seqcluster-smoe/seqcluster/seqcluster/command_line.py", line 40, in main
    miraligner(kwargs["args"])
  File "/home/moeller/git/debian-med/seqcluster-smoe/seqcluster/seqcluster/seqbuster/__init__.py", line 515, in miraligner
    _mirtop(out_files, args.hairpin, args.gtf, args.sps, args.out)
  File "/home/moeller/git/debian-med/seqcluster-smoe/seqcluster/seqcluster/seqbuster/__init__.py", line 393, in _mirtop
    reader(args)
  File "/usr/lib/python3/dist-packages/mirtop/gff/__init__.py", line 50, in reader
    out_dts[fn] = body.create(ann, database, sample, args)
  File "/usr/lib/python3/dist-packages/mirtop/gff/body.py", line 49, in create
    for r, read in reads.iteritems():
AttributeError: 'collections.defaultdict' object has no attribute 'iteritems'

There is another Python 3 issue with iteritems (https://stackoverflow.com/questions/13998492/iteritems-in-python), but that aside, without the explicit path to the seqcluster library and the PYTHONPATH setting, this would not have executed in the first place.
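To illustrate the iteritems error from the traceback above, here is a minimal sketch of the Python 3 behaviour. The `reads` mapping and its contents are made up for the example and are not mirtop's actual data; the point is only that `iteritems()` no longer exists on Python 3 dictionaries, while `items()` is available on both Python 2 and Python 3.

```python
from collections import defaultdict

# Hypothetical stand-in for the `reads` mapping seen in the traceback.
reads = defaultdict(list)
reads["seq1"].append("hsa-let-7a")

# Python 2 spelling: on Python 3 this raises AttributeError.
try:
    reads.iteritems()
    has_iteritems = True
except AttributeError:
    has_iteritems = False

# Portable spelling: items() exists on both Python 2 and Python 3.
pairs = [(r, read) for r, read in reads.items()]
```

So replacing `reads.iteritems()` with `reads.items()` in the failing loop would presumably be enough for that particular line, though I have not checked the rest of mirtop.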

Is there a way to avoid running those scripts, maybe by invoking the respective internal functions directly? Also, from my point of view, bcbio is a reverse dependency of seqcluster. Can a build (i.e. test) dependency on it be avoided?
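As a rough sketch of what "invoking the internal functions directly" could look like in a test: the console script from setup.py points at `seqcluster.command_line:main`, so a test could call that function in-process instead of going through `subprocess`. This assumes `main()` reads its arguments from `sys.argv`, which I have not verified. `call_entry_point`, `patched_argv` and `fake_main` are hypothetical names for illustration only; `fake_main` stands in for the real entry point so the example runs without seqcluster installed.

```python
import sys
from contextlib import contextmanager


@contextmanager
def patched_argv(argv):
    """Temporarily replace sys.argv and restore it afterwards."""
    saved = sys.argv
    sys.argv = list(argv)
    try:
        yield
    finally:
        sys.argv = saved


def call_entry_point(main, argv):
    """Call a console-script target in-process, as if it was run with argv."""
    with patched_argv(argv):
        return main()


def fake_main():
    # Hypothetical stand-in for seqcluster.command_line.main
    # (assumption: the real function parses sys.argv itself).
    return sys.argv[1:]


result = call_entry_point(fake_main, ["seqcluster", "seqbuster", "--sps", "hsa"])
```

In a real test one would pass the imported `main` instead of `fake_main`. The package would still need to be importable (e.g. PYTHONPATH pointing at the source tree), but no `seqcluster` script would have to be on the PATH.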

Cheers,

Steffen

smoe commented 5 years ago

Just successfully worked with a freshly cloned dev branch and did not encounter the above issues.