Closed jyesselm closed 3 years ago
Hi @jyesselm
I believe the main reason we have not setup sub commands is because that would require additional overhead to run each operation and many of these operations are executed in high throughput pipelines. That is also the reason why we don't use argparse
.
We can do sub commanding without using argparse
and just using built-ins
. It would require the time to execute an if
statement, which is in the nanosecond range. We can make a vote among developers and create a new branch to explore this idea.
What do you think? @JoaoRodrigues @amjjbonvin @mtrellet @brianjimenez
Wouldn’t this introduce an overhead when we are building pipelines? (and also longer commands at the end)
Have you looked into doing this with click. It doesnt seem to bad. If you wouldnt mind at some point I would like to demo it with the most used commands. I can create a fork and show you. I feel the overhead will be minimal but happy to test this.
Hi Joseph
PDB-tools are also used in our web portal. I don’t see making changes that would break a working machinery, to solve conflicts on your side :-) But you can of course try to convince us.
Those scripts have a long history (were initially good old fortran code) and have been doing the job for decades.
Cheers Alexandre
Sure completely understood! I thought I would throw it out there. Is there any issue of me making a fork for my own purposes? With credit to you fine folks.
Best, Joe
Hi Joe
Our license allows for that. So go for it! And may-be we will adopt your changes in some future. But the real developers / coders should give their opinion here :-)
Cheers Alexandre
PS: And a demo at some point would of course be nice.
Hi Joe,
Thank you so much for raising this issue and for proposing the subcommands.
When I first wrote pdbtools, I had that design in mind but abandoned it because of performance/dependencies. One of the main features of pdbtools is that they have zero dependencies outside the standard library, so no click allowed :) We did look at argparse but we were losing a lot of execution time. Since in-house we used the scripts to process hundreds/thousands of structures, a 2-5x performance drop was very noticeable!
If you find a way to reproduce subcommands using only sys.argv, we'd be happy to consider the proposal! My guess is that you'd have an entry script and then dispatch the input/options to the functions we need from each script. Not very trivial ..
Cheers
João
A quarta, 5/05/2021, 13:33, Alexandre Bonvin @.***> escreveu:
PS: And a demo at some point would of course be nice.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/haddocking/pdb-tools/issues/102#issuecomment-832988047, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABZ6EDSPOEGSEMU4RGXOGLTMGTQVANCNFSM44DZLXNQ .
I see this makes a lot of sense. I do not have this use case. But this I can understand that if you need to run the script thousands of times that would add up. I feel this could be solved by having a batch mode but that breaks your cmd style. Well if you can leave this request open I will try and play around with it and link to my solution if I find one. Thanks again for answering my questions.
Sure, happy to review your implmentation!
Batch mode is hard with pipes :)
A quarta, 5/05/2021, 17:33, Joseph Yesselman @.***> escreveu:
I see this makes a lot of sense. I do not have this use case. But this I can understand that if you need to run the script thousands of times that would add up. I feel this could be solved by having a batch mode but that breaks your cmd style. Well if you can leave this request open I will try and play around with it and link to my solution if I find one. Thanks again for answering my questions.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/haddocking/pdb-tools/issues/102#issuecomment-833140717, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABZ6ECZ62EPDNOTJISOQCDTMHPT7ANCNFSM44DZLXNQ .
So here is the demo.
https://github.com/jyesselm/pdb-tools
Seems to work as intended and no large speed hit. ( do not think the speeds are really worth anything at this amount time I reran it a few times and got the same speed as the original commands )
pdb_selchain -A,D 1brs.pdb | pdb_delhetatm | pdb_tidy > 1brs_AD_noHET.pdb
can be replaced with
pdbtools chainsel -A,D 1brs.pdb | pdbtools delhetatm | pdbtools tidy > 1brs_AD_noHET.pdb
Let me know what you think. Obviously it could be refactored and could share some common functions since there is quite a bit of redundancy.
Best, Joe
Looks nice
But again - on my side I am afraid of breaking a working server machinery and having to refactor all our tutorials. Next to having to type more characters at the end… I don’t see the benefit of it
Hi @jyesselm
The first part of your implementation looks nice but you don't need to add the functions manually to your pdbtools.py
. You would need to import the functions from the specific scripts and used them based on the sys.argv
. Or write a kind of an engine to import them on request. A dictionary based engine would also be great :smiling_imp:
Overall, I understand the concerns of @jyesselm but I agree with @amjjbonvin. I believe implementing sub commands right now would create more problems than the ones it would solve. Also when piping you don't save much, only change a _
for a <space>
.
my 5 cents
I understand the idea of what you are trying to do but I have other PDB related tools and this package pollutes the env a lot
I've heard other people say that pdb-tools causes this env polution and I too suffer from this, not really an issue but can be annoying at times.
I've circumvented this by not running the installation, simply cloning the repository and doing a full call (also on my pipelines).
$ export PDBTOOLS=/home/user/pdb-tools/pdbtools/
$ python $PDBTOOLS/pdb_chain.py
Thanks for all your feedback.
I, personally, never thought that of a problem. But if this is a problem for most users we may really think about doing two different interfaces, one for high performance pipelines and other for users. The approach proposed by @jyesselm was the one I though initially; we can simulate argparse
without using it. But, as @amjjbonvin said, this breaks all tutorials and pipelines so far.
Suggestion
We can make an option during installation, for the user to select if to install independent CLI
s or subcommands. That should be straightforward. What do you all think?
@rvhonorato This is a neat suggestion. I totally get that my proposed suggestion will break a lot of things so I am fine with just closing the issue. I will probably finish this implementation but I will use click and thus break your setup. Thanks a ton for answering my questions and listening to my feedback.
Hi @jyesselm , please don't surrender yet :smile: I opened a new PR #103 . Please all, have a look and let me know your opinion.
As I mentioned in the #103, while I agree that using sub-commands could be an interesting idea for a future major version of the tools, it's too complex to include now and creates far more problems than it solves. Forking user interfaces will also lead to confusion: a script written by someone that uses pdb_selchain
will not work if someone installs with subcommands for instance.
I really appreciate the idea, the discussion, and the feedback but it's simply too big of a change for now. It's definitely something we will keep in mind when we move to a new major version.
I'll close this for now and add a tag to make sure we remember it in the future!
This is a great set of tools but have you considered doing sub commands i.e. pdb_tools seg instead of pdb_seg. I understand the idea of what you are trying to do but I have other PDB related tools and this package pollutes the env a lot