Incorporate train dataset from individual tc files into the overall tc file

PR #150 aggregated the train results from individual CV folds and incorporated them into the *_all_{metric}_CV_ranks_structure.csv files, but missed the *_all_{metric}_CV_tc.csv files. We failed to notice this because we weren't using the tc.csv files until now.
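
For illustration only, here is a minimal sketch of the kind of aggregation intended: rows from each fold's tc file are concatenated without dropping any `source` value, and a per-source count makes a missing `train` group easy to spot. This is not the pipeline's actual code. The function names, the `fold` column, the `smiles` and `tc` column names, the `model` source value, and the sample rows are all assumptions; only `source=train` comes from this PR.

```python
import csv
import io
from collections import Counter


def aggregate_folds(fold_csv_texts):
    """Concatenate rows from per-fold tc CSVs, keeping every `source` value."""
    rows = []
    for fold, text in enumerate(fold_csv_texts):
        for row in csv.DictReader(io.StringIO(text)):
            row["fold"] = str(fold)  # hypothetical column recording the fold index
            rows.append(row)
    return rows


def count_by_source(rows):
    """Count rows per `source` value; no 'train' key means train rows were lost."""
    return Counter(row["source"] for row in rows)


# Hypothetical per-fold contents; the real tc files have different columns.
fold_0 = "smiles,source,tc\nCCO,model,0.41\nCCN,train,0.37\n"
fold_1 = "smiles,source,tc\nCCC,model,0.52\nCCCl,train,0.29\n"

all_rows = aggregate_folds([fold_0, fold_1])
counts = count_by_source(all_rows)
print(dict(counts))
```

Running the same per-source count on an existing *_all_{metric}_CV_tc.csv file would show whether train rows made it into the aggregate.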

The test files also don't reflect the missing SMILES for source=train because we haven't updated those in a while.

One particular run of the pipeline on della seems to have train SMILES incorporated in *min{min_freq}_all_{metric}_CV_tc.csv.gz.

The run in question is /scratch/gpfs/vineetb/clm/out/ped_backup_04062024/.

This baffles me, because min_freq (PR #172) was merged after the aggregation of CV folds (PR #150). Also, checking out the earlier branch that introduced min_freq doesn't produce train SMILES in the all_tc_file.

incorporating train dataset from individual tc_files to overall tc file #209