lpantano / seqcluster

small RNA analysis from NGS data
http://seqcluster.readthedocs.io
MIT License

More detail in log for make_profile #46

Closed: smoe closed this issue 5 years ago

smoe commented 5 years ago

Somehow I cannot create that profile. A directory "1" is created, but it remains empty. Tracing the process (ptrace) shows what looks like polling: 100% CPU activity with perpetual reads. Anyway, something like this helps to get a bit more insight into where things happen. Any idea where to look?

lpantano commented 5 years ago

Maybe it is a messy sequence; that part maps the sequences to the precursor. If the sequence has low complexity, I can see how things could go crazy. I guess you can turn on -d and then send me the log file. It would be huge, but it might give more information.

smoe commented 5 years ago

I did as you suggested, but there was nothing. First, the output copied from STDOUT; I had to cancel it after a long wait:

07/11/2019 07:10:01 PM INFO: create profile (test_out_report/profiles)
^CTraceback (most recent call last):
  File "/home/moeller/git/med-team/python-seqcluster/debian/python3-seqcluster/usr/bin/seqcluster", line 11, in <module>
    load_entry_point('seqcluster==1.2.5', 'console_scripts', 'seqcluster')()
  File "/home/moeller/git/med-team/python-seqcluster/seqcluster/command_line.py", line 31, in main
    report(kwargs["args"])
  File "/home/moeller/git/med-team/python-seqcluster/seqcluster/create_report.py", line 31, in report
    data = make_profile(data, profilesDir, args)
  File "/home/moeller/git/med-team/python-seqcluster/seqcluster/libs/report.py", line 61, in make_profile
    data[0][c]['precursor'].update(run_rnafold(data[0][c]['precursor']['seq']))
  File "/home/moeller/git/med-team/python-seqcluster/seqcluster/function/rnafold.py", line 14, in run_rnafold
    for line in iter(process.stdout.readline, ''):
KeyboardInterrupt
make[1]: *** [debian/rules:28: override_dh_auto_test] Interrupt
make: *** [debian/rules:9: binary] Error 1
moeller@steffen-laptop-debian:~/git/med-team/python-seqcluster$
semop(1): encountered an error: Invalid argument
----------------------------------------------------------------------
Ran 1 test in 144.696s

OK
Traceback (most recent call last):
  File "/usr/bin/pybuild", line 547, in <module>
semop(1): encountered an error: Invalid argument

This "semop" error seems to indicate something.

And here are the log files:

moeller@steffen-laptop-debian:~/.../python-seqcluster/.pybuild/cpython3_3.7_seqcluster/build/test/test_automated_output/test_out_report/log$ cat *
INFO-seqcluster.libs.logger(30): Run report
INFO-report(26): reading sequeces
INFO-report(30): create profile (test_out_report/profiles)
INFO - Run report
INFO - reading sequeces
INFO - create profile (test_out_report/profiles)

So, somehow this is all hanging. What comes to mind is that the whole build script is executed within "fakeroot". This may confuse file permissions:

$ whoami
moeller
$ fakeroot whoami
root

Is there something for me to run that helps with a diagnosis?

lpantano commented 5 years ago

I am drawing a blank… This is the code that is running:

https://github.com/lpantano/seqcluster/blob/master/seqcluster/function/rnafold.py

The only thing that could be happening is that RNAfold behaves differently there and does not return anything.
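The last frame in the traceback above is the loop "for line in iter(process.stdout.readline, '')" in rnafold.py. One plausible explanation for the 100% CPU spin, offered only as an assumption because the thread does not show how rnafold.py starts RNAfold: under Python 3, a pipe opened in binary mode returns b'' at end of file, and b'' never equals the '' sentinel. Once the child exits (for example because a shell could not find RNAfold), readline() keeps returning b'' immediately and the loop never ends. A minimal sketch of a reader that avoids both problems; the name read_lines is hypothetical and not part of seqcluster:

```python
import subprocess


def read_lines(cmd, stdin_text):
    """Run cmd (a list, no shell), send stdin_text, return stdout lines as str.

    text=True (Python 3.7+; universal_newlines=True before that) makes the
    pipes carry str, so readline() returns "" at end of file and the iter()
    sentinel below matches. With a binary pipe it would return b"" instead,
    which never equals "", and the loop would spin forever.
    """
    # A list without shell=True means a missing executable raises
    # FileNotFoundError here instead of leaving behind an empty pipe.
    process = subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    # Writing all input before reading is fine for short inputs such as a
    # single precursor sequence; large inputs should use communicate().
    process.stdin.write(stdin_text)
    process.stdin.close()
    lines = [line.rstrip("\n") for line in iter(process.stdout.readline, "")]
    process.stdout.close()
    returncode = process.wait()
    if returncode != 0:
        raise RuntimeError("%s exited with status %d" % (cmd[0], returncode))
    return lines
```

For RNAfold this would be called as something like read_lines(["RNAfold"], seq + "\n"), with seq being the precursor sequence; check any extra RNAfold options against your ViennaRNA version.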

Normally I insert a pdb.set_trace() to drop into a temporary interactive Python prompt, then go line by line to find exactly where that function hangs.
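pdb needs someone at the keyboard. As a non-interactive alternative (a swapped-in technique, not what was used in this thread), the standard-library faulthandler module can dump the Python stack of every thread after a timeout, which shows where a build-time test is stuck without interrupting it by hand. A minimal sketch; watch_for_hang, stop_watching and fake_hang are hypothetical names, and fake_hang only stands in for a call that never returns:

```python
import faulthandler
import tempfile
import time


def watch_for_hang(seconds, log_file):
    """Dump every thread's stack to log_file if still running after `seconds`.

    faulthandler writes from its own watchdog thread directly to the file
    descriptor, so this works even while the main thread is stuck in a loop.
    """
    faulthandler.dump_traceback_later(seconds, repeat=False, file=log_file)


def stop_watching():
    faulthandler.cancel_dump_traceback_later()


def fake_hang(duration):
    # Hypothetical stand-in for a call that does not return, such as the
    # readline loop in run_rnafold; here it just busy-waits for `duration` s.
    end = time.monotonic() + duration
    while time.monotonic() < end:
        pass


def demo():
    with tempfile.TemporaryFile(mode="w+") as log:
        watch_for_hang(0.2, log)
        try:
            fake_hang(0.6)
        finally:
            stop_watching()
        log.seek(0)
        return log.read()


if __name__ == "__main__":
    print(demo())
```

In a package build you might call watch_for_hang(600, sys.stderr) near the start of the test, so the stack ends up in the build log even if nobody interrupts it.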

I don’t know how the user would affect that.

Sorry not to have more ideas.

smoe commented 5 years ago

Ha! That was it! RNAfold was not installed. Thank you tons! I admit I find it a bit surprising that there was no error message pointing me to that omission. And I am not entirely sure how to test whether RNAfold is in the $PATH before executing it. Stack Overflow pointed me to https://stackoverflow.com/questions/377017/test-if-executable-exists-in-python. Would you feel like adding such a test to avoid this kind of hang? Maybe also for other binaries the report generation depends on?

I digress, but with bcbio or other larger workflows wrapping your tools in mind, I think it would be an interesting feature to have a "--sanity-check" option for all tools, to run before any execution starts. Next time you run into Brad (I am sadly not at ISMB this year), please pitch this idea to him.
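One standard-library way to do this is shutil.which (Python 3.3+), which returns the full path of an executable found on $PATH, or None if there is none. A minimal sketch of the kind of pre-flight check proposed here; the function names and the choice to raise RuntimeError are assumptions, not existing seqcluster options:

```python
import shutil


def missing_executables(names):
    """Return the names from `names` that shutil.which() cannot find on $PATH."""
    return [name for name in names if shutil.which(name) is None]


def sanity_check(names):
    """Raise RuntimeError listing every required executable that is missing.

    Hypothetical helper: call it before any work starts, so a missing tool
    such as RNAfold produces a clear message instead of a silent hang.
    """
    missing = missing_executables(names)
    if missing:
        raise RuntimeError(
            "required executable(s) not found on $PATH: " + ", ".join(missing)
        )
```

For example, calling sanity_check(["RNAfold"]) at the start of the report command would fail immediately with a readable message. The full list of other binaries the report depends on is not given in this thread.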

lpantano commented 5 years ago

Thank you, it is a very good idea. I’ll try to add it, or I can accept a PR as well :)