pgmikhael / clipzyme

Reaction-Conditioned Virtual Screening of Enzymes
Apache License 2.0

Question about some file path #2

Closed zw-SIMM closed 1 month ago

zw-SIMM commented 1 month ago

Nice work and beautiful code! I have some questions about file paths. What are the files alphafold_enzymes.p and uniprot2sequence_standard_set_structs.p referenced in clipzyme/datasets/enzyme_screening.py? How can these files be generated?


    self.alphafold_files = pickle.load(open("/home/datasets/alphafold_enzymes.p", "rb"))

    ...

    @staticmethod
    def set_args(args) -> None:
        args.dataset_file_path = (
            "/home/datasets/uniprot2sequence_standard_set_structs.p"
        )

Additionally, args.load_wln_cache_in_dataset seems to be missing.

pgmikhael commented 1 month ago

Hi,

Thank you for your interest!

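Here,

  • alphafold_enzymes.p would be a set of protein (UniProt) IDs that have an AlphaFold structure saved locally (this is only used in the skip_sample method, to save time compared with checking whether the path exists using something like os.path.exists)
  • uniprot2sequence_standard_set_structs.p is a dict mapping a protein ID to its sequence, similar to the uniprot2sequence.p file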

Thanks for pointing out the --load_wln_cache_in_dataset flag. This should be False / not used. We will fix it in the next code update as well!

zw-SIMM commented 1 month ago

> Hi,
>
> Thank you for your interest!
>
> Here,
>
>   • alphafold_enzymes.p would be a set of protein (UniProt) IDs that have an AlphaFold structure saved locally (this is only used in the skip_sample method, to save time compared with checking whether the path exists using something like os.path.exists)
>   • uniprot2sequence_standard_set_structs.p is a dict mapping a protein ID to its sequence, similar to the uniprot2sequence.p file
>
> Thanks for pointing out the --load_wln_cache_in_dataset flag. This should be False / not used. We will fix it in the next code update as well!
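
Given the descriptions above (a set of UniProt IDs and an ID-to-sequence dict), the two pickles could be regenerated roughly as follows. This is only a sketch under assumptions that the thread does not confirm: that the local structures follow the AlphaFold DB file naming (AF-<UniProtID>-F1-model_v4.cif or .pdb), and that sequences come from a UniProt-style FASTA file. The function names are hypothetical and are not part of clipzyme.

```python
import pickle
import re
from pathlib import Path

# Assumption: local files use AlphaFold DB naming, e.g. AF-P12345-F1-model_v4.cif
AF_NAME = re.compile(r"^AF-(?P<uid>[A-Z0-9]+)-F\d+-model_v\d+\.(cif|pdb)(\.gz)?$")


def collect_alphafold_ids(structure_dir):
    """Return the set of UniProt IDs that have a structure file in structure_dir."""
    ids = set()
    for path in Path(structure_dir).iterdir():
        match = AF_NAME.match(path.name)
        if match:
            ids.add(match.group("uid"))
    return ids


def read_uniprot_fasta(fasta_path):
    """Parse a FASTA file into {protein_id: sequence}.

    Headers like '>sp|P12345|NAME_HUMAN ...' yield 'P12345';
    any other header uses its first whitespace-delimited token as the ID.
    """
    sequences = {}
    current_id, chunks = None, []
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Store the previous record before starting a new one
                if current_id is not None:
                    sequences[current_id] = "".join(chunks)
                fields = line[1:].split()
                header = fields[0] if fields else ""
                parts = header.split("|")
                current_id = parts[1] if len(parts) >= 3 else header
                chunks = []
            else:
                chunks.append(line)
    if current_id is not None:
        sequences[current_id] = "".join(chunks)
    return sequences


def write_pickle(obj, out_path):
    """Serialize obj to out_path with pickle."""
    with open(out_path, "wb") as handle:
        pickle.dump(obj, handle)
```

With these helpers, something like write_pickle(collect_alphafold_ids(structure_dir), "alphafold_enzymes.p") and write_pickle(read_uniprot_fasta(fasta_path), "uniprot2sequence_standard_set_structs.p") would produce files of the described shapes; the hard-coded /home/datasets/ paths in enzyme_screening.py would then need to point at them. Whether the original dict was restricted to proteins that have structures is not stated in the thread.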

Many thanks for your reply! The simplest approach seems to be commenting them out in the load_dataset method and the skip_sample method?

pgmikhael commented 1 month ago

Yes, I think that should work. I'll also clean up the code, since some of it is unused in the final version.
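
For readers who would rather keep the structure check than comment it out, the precomputed set can be made optional, with a fallback to os.path.exists as mentioned earlier in the thread. The sketch below is illustrative only: has_structure, its parameters, and the filename template are hypothetical and do not reflect the actual skip_sample implementation in clipzyme.

```python
import os


def has_structure(uniprot_id, structure_dir, known_ids=None,
                  filename_template="AF-{uid}-F1-model_v4.cif"):
    """Return True if a local structure appears to exist for uniprot_id.

    known_ids plays the role of the set stored in alphafold_enzymes.p.
    When it is None, fall back to a (slower) filesystem check.
    The filename template is an assumption about the local layout.
    """
    if known_ids is not None:
        return uniprot_id in known_ids
    path = os.path.join(structure_dir, filename_template.format(uid=uniprot_id))
    return os.path.exists(path)
```

The set lookup avoids one filesystem call per sample, which is the time saving the maintainer describes; the fallback trades that speed for not needing the pickle at all.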