haddocking / pdb-tools

A dependency-free cross-platform swiss army knife for PDB files.
https://haddocking.github.io/pdb-tools/
Apache License 2.0
372 stars 113 forks source link

sub commands? #102

Closed jyesselm closed 3 years ago

jyesselm commented 3 years ago

This is a great set of tools but have you considered doing sub commands i.e. pdb_tools seg instead of pdb_seg. I understand the idea of what you are trying to do but I have other PDB related tools and this package pollutes the env a lot

image

joaomcteixeira commented 3 years ago

Hi @jyesselm

I believe the main reason we have not setup sub commands is because that would require additional overhead to run each operation and many of these operations are executed in high throughput pipelines. That is also the reason why we don't use argparse.

We can do sub commanding without using argparse and just using built-ins. It would require the time to execute an if statement, which is in the nanosecond range. We can make a vote among developers and create a new branch to explore this idea.

What do you think? @JoaoRodrigues @amjjbonvin @mtrellet @brianjimenez

amjjbonvin commented 3 years ago

Wouldn’t this introduce an overhead when we are building pipelines? (and also longer commands at the end)

jyesselm commented 3 years ago

Have you looked into doing this with click. It doesnt seem to bad. If you wouldnt mind at some point I would like to demo it with the most used commands. I can create a fork and show you. I feel the overhead will be minimal but happy to test this.

amjjbonvin commented 3 years ago

Hi Joseph

PDB-tools are also used in our web portal. I don’t see making changes that would break a working machinery, to solve conflicts on your side :-) But you can of course try to convince us.

Those scripts have a long history (were initially good old fortran code) and have been doing the job for decades.

Cheers Alexandre

jyesselm commented 3 years ago

Sure completely understood! I thought I would throw it out there. Is there any issue of me making a fork for my own purposes? With credit to you fine folks.

Best, Joe

amjjbonvin commented 3 years ago

Hi Joe

Our license allows for that. So go for it! And may-be we will adopt your changes in some future. But the real developers / coders should give their opinion here :-)

Cheers Alexandre

amjjbonvin commented 3 years ago

PS: And a demo at some point would of course be nice.

JoaoRodrigues commented 3 years ago

Hi Joe,

Thank you so much for raising this issue and for proposing the subcommands.

When I first wrote pdbtools, I had that design in mind but abandoned it because of performance/dependencies. One of the main features of pdbtools is that they have zero dependencies outside the standard library, so no click allowed :) We did look at argparse but we were losing a lot of execution time. Since in-house we used the scripts to process hundreds/thousands of structures, a 2-5x performance drop was very noticeable!

If you find a way to reproduce subcommands using only sys.argv, we'd be happy to consider the proposal! My guess is that you'd have an entry script and then dispatch the input/options to the functions we need from each script. Not very trivial ..

Cheers

João

A quarta, 5/05/2021, 13:33, Alexandre Bonvin @.***> escreveu:

PS: And a demo at some point would of course be nice.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/haddocking/pdb-tools/issues/102#issuecomment-832988047, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABZ6EDSPOEGSEMU4RGXOGLTMGTQVANCNFSM44DZLXNQ .

jyesselm commented 3 years ago

I see this makes a lot of sense. I do not have this use case. But this I can understand that if you need to run the script thousands of times that would add up. I feel this could be solved by having a batch mode but that breaks your cmd style. Well if you can leave this request open I will try and play around with it and link to my solution if I find one. Thanks again for answering my questions.

JoaoRodrigues commented 3 years ago

Sure, happy to review your implmentation!

Batch mode is hard with pipes :)

A quarta, 5/05/2021, 17:33, Joseph Yesselman @.***> escreveu:

I see this makes a lot of sense. I do not have this use case. But this I can understand that if you need to run the script thousands of times that would add up. I feel this could be solved by having a batch mode but that breaks your cmd style. Well if you can leave this request open I will try and play around with it and link to my solution if I find one. Thanks again for answering my questions.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/haddocking/pdb-tools/issues/102#issuecomment-833140717, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABZ6ECZ62EPDNOTJISOQCDTMHPT7ANCNFSM44DZLXNQ .

jyesselm commented 3 years ago

So here is the demo.

https://github.com/jyesselm/pdb-tools

Seems to work as intended and no large speed hit. ( do not think the speeds are really worth anything at this amount time I reran it a few times and got the same speed as the original commands )

pdb_selchain -A,D 1brs.pdb | pdb_delhetatm | pdb_tidy > 1brs_AD_noHET.pdb

can be replaced with

pdbtools chainsel -A,D 1brs.pdb | pdbtools delhetatm | pdbtools tidy > 1brs_AD_noHET.pdb

image

image

Let me know what you think. Obviously it could be refactored and could share some common functions since there is quite a bit of redundancy.

Best, Joe

amjjbonvin commented 3 years ago

Looks nice

But again - on my side I am afraid of breaking a working server machinery and having to refactor all our tutorials. Next to having to type more characters at the end… I don’t see the benefit of it

joaomcteixeira commented 3 years ago

Hi @jyesselm

The first part of your implementation looks nice but you don't need to add the functions manually to your pdbtools.py. You would need to import the functions from the specific scripts and used them based on the sys.argv. Or write a kind of an engine to import them on request. A dictionary based engine would also be great :smiling_imp:

Overall, I understand the concerns of @jyesselm but I agree with @amjjbonvin. I believe implementing sub commands right now would create more problems than the ones it would solve. Also when piping you don't save much, only change a _ for a <space>.

my 5 cents

rvhonorato commented 3 years ago

I understand the idea of what you are trying to do but I have other PDB related tools and this package pollutes the env a lot

I've heard other people say that pdb-tools causes this env polution and I too suffer from this, not really an issue but can be annoying at times.

I've circumvented this by not running the installation, simply cloning the repository and doing a full call (also on my pipelines).

$ export PDBTOOLS=/home/user/pdb-tools/pdbtools/
$ python $PDBTOOLS/pdb_chain.py
joaomcteixeira commented 3 years ago

Thanks for all your feedback. I, personally, never thought that of a problem. But if this is a problem for most users we may really think about doing two different interfaces, one for high performance pipelines and other for users. The approach proposed by @jyesselm was the one I though initially; we can simulate argparse without using it. But, as @amjjbonvin said, this breaks all tutorials and pipelines so far.

Suggestion We can make an option during installation, for the user to select if to install independent CLIs or subcommands. That should be straightforward. What do you all think?

jyesselm commented 3 years ago

@rvhonorato This is a neat suggestion. I totally get that my proposed suggestion will break a lot of things so I am fine with just closing the issue. I will probably finish this implementation but I will use click and thus break your setup. Thanks a ton for answering my questions and listening to my feedback.

joaomcteixeira commented 3 years ago

Hi @jyesselm , please don't surrender yet :smile: I opened a new PR #103 . Please all, have a look and let me know your opinion.

JoaoRodrigues commented 3 years ago

As I mentioned in the #103, while I agree that using sub-commands could be an interesting idea for a future major version of the tools, it's too complex to include now and creates far more problems than it solves. Forking user interfaces will also lead to confusion: a script written by someone that uses pdb_selchain will not work if someone installs with subcommands for instance.

I really appreciate the idea, the discussion, and the feedback but it's simply too big of a change for now. It's definitely something we will keep in mind when we move to a new major version.

I'll close this for now and add a tag to make sure we remember it in the future!