skearnes / color-features

ROCS-Derived Features for Virtual Screening
http://arxiv.org/abs/1606.01822
BSD 3-Clause "New" or "Revised" License

Data package #3

Open stochasticsdrj opened 6 years ago

stochasticsdrj commented 6 years ago

Good day,

Could you please provide the data.tar.gz file, which presumably contains the pdb/sdf/mol2? files for each of the ids listed in '-datasets.txt'. We are getting the following error message when running your code for the Data Analysis - just Tanimoto data tables with muv were listed in our running code.

Traceback (most recent call last):
  File "/app/ROCS_Stanford/color-features-master/paper/code/analysis.py", line 339, in <module>
    main()
  File "/app/ROCS_Stanford/color-features-master/paper/code/analysis.py", line 284, in main
    assert FLAGS.cycle
AssertionError

Thank you! stochasticsdrj

skearnes commented 6 years ago

This is not a data error; the AssertionError is being raised because you have not set the --cycle flag when running analysis.py.
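For readers hitting the same AssertionError, here is a minimal sketch of why a bare assert on an unset boolean flag fails without any message. It uses argparse as a stand-in for the script's own flag handling, and `parse_flags` and `check_cycle` are hypothetical names for illustration; only the `--cycle` flag and the `assert FLAGS.cycle` line come from the traceback above.

```python
import argparse


def parse_flags(argv):
    """Parse a boolean --cycle flag (assumed stand-in for FLAGS.cycle)."""
    parser = argparse.ArgumentParser()
    # store_true: the flag is False unless --cycle is passed on the command line.
    parser.add_argument("--cycle", action="store_true")
    return parser.parse_args(argv)


def check_cycle(argv):
    """Return 'ok' if the flag is set, otherwise the name of the raised error."""
    flags = parse_flags(argv)
    try:
        # A bare assert carries no message, so the traceback ends in a
        # plain "AssertionError" line, as in the report above.
        assert flags.cycle
    except AssertionError:
        return "AssertionError"
    return "ok"


without_flag = check_cycle([])
with_flag = check_cycle(["--cycle"])
print(without_flag, with_flag)
```

Passing `--cycle` on the command line is what makes the assertion succeed; nothing about the input data is involved at that point.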

stochasticsdrj commented 6 years ago

Thanks Steven. We are still missing the data.tar.gz?

skearnes commented 6 years ago

That's a typo in the README. I think the paper/data directory is an extracted version of that tarball; you shouldn't need anything except the .txt and .pkl files that are already there.

stochasticsdrj commented 6 years ago

Hi Steven,

We first attempted in python3.5 after modifying your analysis.py to make it compatible...that didn't work. We then attempted running it in python2.7 and got the following error (similar to what we had previously):

$ python paper/code/analysis.py --cycle --root data-tversky --dataset_file paper/data/dude-datasets.txt --prefix dude

-----error
...
INFO:root:xiap 0
INFO:root:Saving processed data to dude-processed.pkl.gz
Traceback (most recent call last):
  File "paper/code/analysis.py", line 339, in <module>
    main()
  File "paper/code/analysis.py", line 319, in main
    mask = data['fold'] == 'all'
  File "/app/anaconda3/envs/Chemistry27/lib/python2.7/site-packages/pandas/core/frame.py", line 2139, in __getitem__
    return self._getitem_column(key)
  File "/app/anaconda3/envs/Chemistry27/lib/python2.7/site-packages/pandas/core/frame.py", line 2146, in _getitem_column
    return self._get_item_cache(key)
  File "/app/anaconda3/envs/Chemistry27/lib/python2.7/site-packages/pandas/core/generic.py", line 1842, in _get_item_cache
    values = self._data.get(item)
  File "/app/anaconda3/envs/Chemistry27/lib/python2.7/site-packages/pandas/core/internals.py", line 3843, in get
    loc = self.items.get_loc(item)
  File "/app/anaconda3/envs/Chemistry27/lib/python2.7/site-packages/pandas/core/indexes/base.py", line 2527, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 117, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1265, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1273, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'fold'
----end of error

Any suggestions? Thanks for your help, drj

stochasticsdrj commented 6 years ago

sorry, forgot to highlight the line, mask = data['fold'] == 'all'

skearnes commented 6 years ago

ok, i'll take a closer look and get back to you this week

skearnes commented 6 years ago

So it looks like I only uploaded the processed data (e.g. per-fold ROC AUC and ROC enrichment scores) and not the raw model output. You can generate the data tables by passing the --reload flag to the analysis.py script. For example:

python paper/code/analysis.py \
  --root data-tversky \
  --dataset_file paper/data/dude-datasets.txt \
  --prefix dude \
  --reload paper/data/dude-processed.pkl.gz

Note that I made a small change to the analysis.py script to account for backward compatibility in https://github.com/skearnes/color-features/commit/a6af3686c82a5d1d6b68341fe5e5b16e8e4ed356.
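
Before rerunning with --reload, it can help to check what a processed file actually contains. The sketch below assumes the .pkl.gz files are gzip-compressed pickles (inferred only from the file extension; analysis.py may load them differently). `describe_pickle` is a hypothetical helper, and the demo runs on a throwaway file rather than on the repository's data.

```python
import gzip
import os
import pickle
import tempfile


def describe_pickle(path):
    """Load a gzip-compressed pickle and return (type name, column/key names)."""
    with gzip.open(path, "rb") as f:
        obj = pickle.load(f)
    if hasattr(obj, "columns"):
        # DataFrame-like objects expose their column labels here.
        names = list(obj.columns)
    elif isinstance(obj, dict):
        names = sorted(obj.keys())
    else:
        names = []
    return type(obj).__name__, names


# Demo on a temporary file with made-up contents (not the real processed data).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo-processed.pkl.gz")
    with gzip.open(demo_path, "wb") as f:
        pickle.dump({"dataset": ["xiap"], "fold": [0]}, f)
    kind, names = describe_pickle(demo_path)

print(kind, names)
```

If the real file holds a pandas object, pandas must be installed for unpickling to work, and a pickle written under Python 2 may need `pickle.load(f, encoding="latin1")` under Python 3. As with any pickle, only load files from a source you trust.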