FebsN0 closed this issue 2 years ago
OK, sorry, I have solved it. I hadn't realized that the file is actually this one: https://files.docking.org/zinc20-ML/fingerprints_count.txt. The file name was just different.
Mol_ct_file_x.csv, where x is the name of your project, is generated in molecular_file_count_updated.py, line 70. The "updated" file, which holds the number of molecules to sample per SMILES file, is then generated from it (see the lines after 70 in the same script). In the first iteration this is done in your fingerprint directory; from iteration 2 onward it is done in the morgan_1024_predictions folder of the previous iteration.
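For anyone else landing here, the idea of a per-file molecule count file can be sketched roughly as below. This is only an illustration, not the code in molecular_file_count_updated.py: the function names, the one-molecule-per-line assumption, the `.txt` extension, and the `count,file_name` column layout are all hypothetical. The last helper shows one plausible reading of a division by 1 million, namely that the total is expressed in millions of molecules; the thread does not confirm that this is what the script intends.

```python
import csv
import os


def count_molecules(directory):
    """Return (file name, molecule count) pairs for each .txt file.

    Assumption: every non-empty line in a file is one molecule.
    """
    counts = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".txt"):
            continue
        with open(os.path.join(directory, name)) as handle:
            n = sum(1 for line in handle if line.strip())
        counts.append((name, n))
    return counts


def write_count_file(counts, out_path):
    """Write one 'count,file_name' row per file (hypothetical layout)."""
    with open(out_path, "w", newline="") as handle:
        csv.writer(handle).writerows((n, name) for name, n in counts)


def total_in_millions(count_path):
    """Sum the first column of the count file and express it in millions."""
    with open(count_path, newline="") as handle:
        total = sum(int(row[0]) for row in csv.reader(handle) if row)
    return total / 1_000_000
```

Under this reading, a database whose per-file counts add up to 1,000,000,000 molecules would give a value of 1000.0.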
Oh, perfect! I don't know why running that code did not generate Mol_ct_file_x.csv. Thank you very much!
https://github.com/jamesgleave/DD_protocol/blob/9c842f6d946a97a5018c899993c70712cbc095fe/scripts_2/simple_job_models.py#L80
Hello, I don't understand where "Mol_ctfile%s.csv" is generated. I have checked all your scripts and didn't find any command that creates such a file. Should the variable t_mol just be the total number of compounds in the general database (i.e. ZINC20)? Supposing that the database has 1 billion compounds, is t_mol supposed to be 1000? If not, why divide by 1 million?