skinniderlab / CLM

Consider refactoring function #174

Closed: vineetbansal closed this issue 2 weeks ago

vineetbansal commented 1 month ago

The function prep_outcomes_freq behaves differently depending on the type of samples: a string is treated as a CSV file path, and anything else is used directly as a dataframe. This could be refactored (two functions instead of one, or an additional helper function) to avoid the isinstance check, which is confusing IMO.

    # samples can be a path to a csv file (str) or an already-loaded dataframe
    if isinstance(samples, str):
        data = pd.concat(
            [
                read_csv_file(sample, usecols=["smiles", "size"])
                for sample in [samples, known_smiles, invalid_smiles]
            ]
        )
    else:
        data = samples
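
One way the helper-function option could look, as a rough sketch only. The name load_outcomes and its parameter names are hypothetical, and pd.read_csv stands in for the repo's read_csv_file, whose exact behavior is not shown here. The idea is that file reading moves into its own function, so prep_outcomes_freq could accept a dataframe only and drop the isinstance check.

```python
import pandas as pd


def load_outcomes(samples_path, known_smiles_path, invalid_smiles_path):
    """Hypothetical helper: read the three CSV files and stack them.

    Only the "smiles" and "size" columns are kept, mirroring the
    usecols argument in the excerpt above. pd.read_csv is used here
    as a stand-in for the project's own read_csv_file.
    """
    paths = [samples_path, known_smiles_path, invalid_smiles_path]
    frames = [pd.read_csv(path, usecols=["smiles", "size"]) for path in paths]
    # Same concatenation as the original branch (row index is not reset).
    return pd.concat(frames)
```

Callers that start from CSV paths would call load_outcomes first and pass the result on; callers that already hold a dataframe would pass it in directly, so the type of the argument no longer changes what the function does.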