marrink-lab / bentopy

Packs stuff in boxes

Add grocat subcommand #21

Closed ma3ke closed 3 months ago

ma3ke commented 3 months ago

The current implementation is somewhat limited: we only allow writing to files that are seekable (so stdout, for instance, is rejected as an output file).
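One plausible reason for the seekable requirement (an assumption on my part, it is not stated in this thread) is that a gro file stores the total atom count on its second line, and that count is only known once every input has been read. A minimal Python sketch of that back-patching pattern follows. The names `write_concatenated` and `COUNT_WIDTH` are hypothetical and are not taken from the bentopy source.

```python
import io

COUNT_WIDTH = 32  # hypothetical width reserved for the atom count line


def write_concatenated(out, title, atom_line_groups, box_line):
    """Write a gro-like file, back-patching the atom count on line 2."""
    if not out.seekable():
        raise ValueError("output must be seekable (stdout, for instance, is not)")
    out.write(f"{title}\n")
    count_pos = out.tell()  # remember where the count line starts
    out.write(" " * COUNT_WIDTH + "\n")  # fixed-width placeholder
    natoms = 0
    for lines in atom_line_groups:  # one iterable of atom lines per input file
        for line in lines:
            out.write(line.rstrip("\n") + "\n")
            natoms += 1
    out.write(f"{box_line}\n")
    end = out.tell()
    out.seek(count_pos)
    out.write(f"{natoms:<{COUNT_WIDTH}}")  # overwrite exactly the placeholder
    out.seek(end)  # leave the cursor at the end of the file
    return natoms


# Usage with an in-memory, seekable buffer.
buf = io.StringIO()
total = write_concatenated(
    buf, "demo", [["atom a"], ["atom b", "atom c"]], "1.0 1.0 1.0"
)
```

A non-seekable stream would need a different strategy, such as counting the atoms in a first pass over the inputs before writing anything.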

It is reasonably fast, I think. Catting two 592 MB gro files takes around three seconds within the actual grocat command on my setup. I'm going to try to make it work with the files opened in byte mode in an upcoming commit here.

However, I think the overall performance of invoking the command is extremely poor at the moment. The reason for this appears to be the overhead of loading the bentopy modules in src/bentopy/__init__.py. That is a good target for a future issue and pull request. For now I'll focus on getting this to work.
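For that future issue, one common way to reduce this kind of startup cost is to import a subcommand's module only when that subcommand is actually requested. The sketch below is only an illustration of the idea and not how bentopy is structured: the `SUBCOMMANDS` table and `load_subcommand` are hypothetical, and standard-library modules stand in for the real submodules.

```python
import importlib

# Hypothetical table: subcommand name -> module that implements it.
# Standard-library modules stand in for real bentopy submodules here.
SUBCOMMANDS = {
    "grocat": "fileinput",
    "render": "statistics",
}


def load_subcommand(name):
    """Import only the module backing the requested subcommand."""
    try:
        module_name = SUBCOMMANDS[name]
    except KeyError:
        raise SystemExit(f"unknown subcommand: {name!r}")
    # The import cost is paid here, and only for the chosen subcommand.
    return importlib.import_module(module_name)


module = load_subcommand("grocat")
```

Running `python -X importtime -c "import bentopy"` should show which imports dominate the startup time before anything is restructured.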

@jan-stevens, what do you think about the command and its interface? Is it helpful to you, and does it feature everything that you were looking for? (Probably forgot about something ;)

ma3ke commented 3 months ago

It would be very nice to have access to some ways of modifying the sequence of the segments and manipulating the residue numbers and names as we tear through the files. A naive implementation of that could work well enough, but ideally we'd do that with a different toolset.

Worth considering, but for now I think this command is mostly ready to use. Shall we merge already?

jan-stevens commented 3 months ago

A nice feature would be residue renaming, for example chromosome.gro:CHROM. Having a more mesoscale annotation of the files is convenient for visualization.

ma3ke commented 3 months ago

The following is now possible:

$ bentopy grocat 3lyz_cube.gro:A 3lyz_none.gro:B -o catted.gro
Reading from 3lyz_cube.gro... Replacing resnames with 'A'.
Reading from 3lyz_none.gro... Replacing resnames with 'B'.
$ head catted.gro
render-placements
26309148                        
    0    A    N    1   3.144  89.731  35.311
    0    A   CA    2   3.063  89.745  35.189
    0    A    C    3   3.069  89.897  35.173
    0    A    O    4   3.082  89.972  35.272
    0    A   CB    5   2.923  89.689  35.209
    0    A   CG    6   2.817  89.749  35.118
    0    A   CD    7   2.678  89.700  35.155
    0    A   CE    8   2.584  89.701  35.036
$ tail catted.gro
    0    B    O 1095   8.797  37.676  19.969
    0    B    O 1096   9.233  37.313  21.581
    0    B    O 1097   9.675  37.804  19.301
    0    B    O 1098  10.252  37.482  20.745
    0    B    O 1099  10.432  37.651  20.830
    0    B    O 1100  10.542  37.832  20.844
    0    B    O 1101  10.283  37.578  21.170
    0    B    O 1102  10.024  37.224  21.183
    0    B    O 1103   9.610  37.105  21.115
100.0 100.0 100.0

Note that it is still possible to pass files without the :<resname> notation. In that case the resname is left as it was.
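To illustrate the optional `:<resname>` notation, here is a minimal Python sketch of how such an argument could be split. The function name `parse_input_spec` is hypothetical, and the five-character limit is an assumption based on the width of the residue name field in the gro format. The actual grocat parsing may differ.

```python
def parse_input_spec(spec):
    """Split 'path.gro:RESNAME' into (path, resname).

    The resname is None when no ':<resname>' suffix is given, in which
    case the residue names in the file would be left untouched.
    """
    path, sep, resname = spec.rpartition(":")
    if not sep:
        # No colon at all: the whole argument is the path.
        return spec, None
    if not path:
        raise ValueError(f"missing path in {spec!r}")
    if not resname or len(resname) > 5:
        # Assumption: the gro resname field is five characters wide.
        raise ValueError(f"resname must be 1 to 5 characters in {spec!r}")
    return path, resname


with_name = parse_input_spec("3lyz_cube.gro:A")
without_name = parse_input_spec("3lyz_none.gro")
```

Splitting on the last colon keeps any earlier colons as part of the path, at the cost of misreading a path that contains a colon but has no resname suffix.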

ma3ke commented 3 months ago

Finally, I made the resname left-aligned, rather than right-aligned as shown in the above example.

-     0    A    N    1   3.144  89.731  35.311
+     0A        N    1   3.144  89.731  35.311
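For reference, the left-aligned layout in the `+` line corresponds to the fixed-width atom line of the gro format as I understand it: a 5-wide residue number, a 5-wide left-aligned residue name, a 5-wide atom name, a 5-wide atom number, and three 8.3 coordinates. A small Python sketch that produces such a line follows. The function `format_gro_atom` is a hypothetical helper and not part of bentopy.

```python
def format_gro_atom(resnum, resname, atomname, atomnum, x, y, z):
    """Format one atom line with a left-aligned residue name field."""
    return (
        f"{resnum:>5d}"    # residue number, right-aligned
        f"{resname:<5}"    # residue name, left-aligned
        f"{atomname:>5}"   # atom name, right-aligned
        f"{atomnum:>5d}"   # atom number, right-aligned
        f"{x:8.3f}{y:8.3f}{z:8.3f}"  # position in nm
    )


line = format_gro_atom(0, "A", "N", 1, 3.144, 89.731, 35.311)
```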