steineggerlab / foldcomp

Compressing protein structures effectively with torsion angles
GNU General Public License v3.0

foldcomp compress breaks input pdb file into multiple output files #47

Open vagkaratzas opened 11 months ago

vagkaratzas commented 11 months ago

Hi!

When I run foldcomp compress on this file: https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/proteomics/pdb/1tim.pdb the output is split into 4 files: 1tim.pdbA_0.fcz, 1tim.pdbA_1.fcz, 1tim.pdbB_0.fcz and 1tim.pdbB_1.fcz

Is this behavior intended?

How can I decompress these back into a single PDB file afterwards with foldcomp decompress?

Thanks for the help :)

khb7840 commented 11 months ago

The current version splits PDB files with multiple chains into multiple FCZ files. I think your point makes sense, and currently we don't have a built-in function for merging the output into one file. For now, you can concatenate the decompressed output yourself. I'll work on implementing that functionality in the near future. Thank you for your feedback!
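For anyone who needs a workaround in the meantime, here is a minimal sketch of the manual concatenation, not a foldcomp feature. It assumes each .fcz part has already been decompressed to its own PDB file. The function name `merge_pdb_parts` and the file names in the usage line are hypothetical. It keeps only ATOM, HETATM and TER records and writes a single END at the bottom. Atom serial numbers are not renumbered.

```python
from pathlib import Path

# Record types carried over from each part; headers and per-file END lines are dropped.
KEEP = ("ATOM", "HETATM", "TER")


def merge_pdb_parts(part_paths, out_path):
    """Concatenate coordinate records from several PDB files into one file.

    Parts are written in the order given, so pass them sorted by chain and index.
    Returns the number of coordinate records written.
    """
    merged = []
    for part in part_paths:
        for line in Path(part).read_text().splitlines():
            if line.startswith(KEEP):
                merged.append(line)
    merged.append("END")  # single terminator for the merged file
    Path(out_path).write_text("\n".join(merged) + "\n")
    return len(merged) - 1
```

Usage would look like `merge_pdb_parts(["partA_0.pdb", "partA_1.pdb"], "merged.pdb")`, with the list holding whatever paths the decompressed parts actually have on disk.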

vagkaratzas commented 11 months ago

Thanks! I think it would make sense to rename the outputs to something like 1tim.pdb-A_0.fcz or 1tim.pdb_A_0.fcz, etc., so that the name can be split on a separator character and matched back to the original file name. That would make it possible to join the chains of the same structure while decompressing.
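To illustrate the idea, here is a rough sketch of how names in the proposed `<original>_<chain>_<index>.fcz` form could be split and grouped. This naming scheme is only the suggestion above, not what foldcomp currently writes, and the helper names `parse_part_name` and `group_by_original` are hypothetical. Splitting from the right means underscores in the original file name are harmless, but a chain ID that itself contains an underscore would break it.

```python
from collections import defaultdict


def parse_part_name(filename):
    """Split '<original>_<chain>_<index>.fcz' into (original, chain, index)."""
    stem = filename[: -len(".fcz")] if filename.endswith(".fcz") else filename
    # rsplit from the right so underscores in the original name are preserved
    original, chain, index = stem.rsplit("_", 2)
    return original, chain, int(index)


def group_by_original(filenames):
    """Map each original file name to its parts, ordered by chain then index."""
    groups = defaultdict(list)
    for name in filenames:
        original, chain, index = parse_part_name(name)
        groups[original].append((chain, index, name))
    return {
        original: [name for _, _, name in sorted(parts)]
        for original, parts in groups.items()
    }
```

With the four files from this issue renamed to the underscore form, `group_by_original` would return one entry keyed by `1tim.pdb` whose parts are listed in the order A_0, A_1, B_0, B_1.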