steineggerlab / foldcomp

Compressing protein structures effectively with torsion angles
GNU General Public License v3.0

Dealing with nested subdirectories #40

Open jomimc opened 8 months ago

jomimc commented 8 months ago

I have a lot of data to compress, all stored in nested subdirectories (e.g. /Data/Protein/Mutation/...pdb).

The default behavior of "foldcomp compress -r" seems to be to create a single output folder and put everything in it, so I run into the "Output file already exists" error.
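Assuming the error comes from files in different subdirectories sharing a name (so their .fcz outputs land on the same path in the single output folder), the sketch below lists such names before compressing. `find_name_collisions` is a hypothetical helper, not part of Foldcomp. It compares file names with the extension removed, so a.pdb and a.cif in different folders also count as a collision; that is the conservative choice if the .fcz name is derived from the stem.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Foldcomp): print every .pdb/.cif file stem
# that appears more than once anywhere under the given root directory.
find_name_collisions() {
    local root=$1
    # List structure files, strip the directory part, then strip the extension,
    # sort, and keep only stems that occur more than once.
    find "$root" -type f \( -name '*.pdb' -o -name '*.cif' \) |
        sed 's|.*/||; s|\.[^.]*$||' |
        sort |
        uniq -d
}
```

For example, `find_name_collisions /Data` prints one line per colliding stem and nothing if every name is unique.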

Is there a way to either create a new output directory that mirrors the input subdirectory structure, or to write the ".fcz" files into the same directories as the uncompressed pdb files?

khb7840 commented 8 months ago

I think you can write a script that iterates through the nested subdirectories.

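#!/bin/bash
# Usage: ./foldcomp_recursive.sh <path> [threads]
# Compresses the pdb/cif files of every directory under <path>,
# writing the .fcz files into the same directory as the originals.
if [ $# -lt 1 ]; then
    echo "Usage: $0 <path> [threads]" >&2
    exit 1
fi
threads=${2:-1}  # default to 1 thread so "-t" never receives an empty value

function run_command_in_dir {
    local dir
    # Recurse into every subdirectory first
    for dir in "$1"/*; do
        if [ -d "$dir" ]; then
            run_command_in_dir "$dir"
        fi
    done

    # Compress only if pdb or cif files exist in this directory
    if ls "$1"/*.pdb 1> /dev/null 2>&1 || ls "$1"/*.cif 1> /dev/null 2>&1; then
        foldcomp compress -t "$threads" "$1" "$1"
    fi
}

run_command_in_dir "$1"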

This is an example bash script that walks the input directory recursively and, for every directory containing pdb or cif files, runs foldcomp compress with that directory as both input and output, so the .fcz files end up next to the original files.
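The other option from the question, writing into a separate output directory that mirrors the input tree, could be sketched along the same lines. This is an assumption-based sketch, not a Foldcomp feature: `compress_mirrored` and `has_structures` are hypothetical names, and the code relies only on the `foldcomp compress -t <threads> <input_dir> <output_dir>` form used in the script above. Keep the output root outside the input root, otherwise `find` may walk into freshly created output folders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the directory directly contains .pdb or .cif files.
has_structures() {
    local f
    for f in "$1"/*.pdb "$1"/*.cif; do
        # An unmatched glob stays literal, so check that the file really exists
        [[ -e $f ]] && return 0
    done
    return 1
}

# Hypothetical helper: compress each directory under <in_root> that holds
# structure files into the matching directory under <out_root>.
# Usage: compress_mirrored <in_root> <out_root> [threads]
compress_mirrored() {
    local in_root=${1%/} out_root=${2%/} threads=${3:-1}
    local dir rel out_dir
    # Visit every directory once (including the root); -print0 copes with odd names
    while IFS= read -r -d '' dir; do
        has_structures "$dir" || continue
        rel=${dir#"$in_root"}   # "" for the root itself, "/sub/dir" otherwise
        out_dir="$out_root$rel"
        mkdir -p "$out_dir"
        # Stop at the first failing foldcomp call and return its exit status
        foldcomp compress -t "$threads" "$dir" "$out_dir" || return
    done < <(find "$in_root" -type d -print0)
}
```

After sourcing these functions, `compress_mirrored /Data /Data_fcz 8` would, for example, write the .fcz files for /Data/Protein/Mutation into /Data_fcz/Protein/Mutation. Directories without pdb or cif files are skipped and not created in the output tree, unless they are parents of a directory that is.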

jomimc commented 8 months ago

That's what I did, thanks. I managed to compress all ~60,000 pdb files within an hour.