mbernico / snape

Snape is a convenient artificial dataset generator that wraps sklearn's make_classification and make_regression and then adds 'realism' features such as complex formatting, varying scales, categorical variables, and missing values.
Apache License 2.0
165 stars 21 forks

Issue/3/image dataset generation #12

Closed: slizb closed this pull request 7 years ago

slizb commented 7 years ago

This PR is for issue #3. The main contribution is make_image_dataset.py, which offers the same JSON / Python dict user interface as make_dataset.py. Lemme know what you think. :)
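
The thread doesn't show make_image_dataset.py itself. As a rough sketch of what a "JSON / Python dict" interface usually means, assuming the parameter names that appear in the test configuration in the CI log (n_classes, n_samples, weights, image_source, out_path, random_seed), a loader might accept either a dict or a path to a JSON file with the same keys. The name load_image_dataset_params is hypothetical and is not snape's API.

```python
import json


def load_image_dataset_params(spec):
    """Return image-dataset parameters from a dict or from a JSON file path.

    Hypothetical helper for illustration; it is not part of snape.
    """
    if isinstance(spec, dict):
        params = dict(spec)  # copy so the caller's dict is not mutated
    else:
        with open(spec) as fh:  # treat anything else as a path to a JSON file
            params = json.load(fh)
    # Sanity check: one weight per class, when weights are given.
    weights = params.get("weights")
    if weights is not None and len(weights) != params.get("n_classes"):
        raise ValueError("'weights' must have exactly n_classes entries")
    return params


# Parameter names taken from the test configuration shown in the CI log.
params = load_image_dataset_params({
    "n_classes": 2,
    "n_samples": 11,
    "weights": [0.8, 0.2],
    "image_source": "imagenet",
    "out_path": "./test_images/",
    "random_seed": 42,
})
```

Copying the dict keeps the caller's configuration unchanged if the loader later fills in defaults.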

mbernico commented 7 years ago

Hey @slizb, it looks like your Travis build failed. Here are the errors/test cases. Can you fix them, please?

======================================================================
ERROR: snape.test.test_make_image_dataset.TestImageNet.test_retrieve_class_counts
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/travis/build/mbernico/snape/snape/test/test_make_image_dataset.py", line 102, in test_retrieve_class_counts
    class_counts = self.image_net.retrieve_class_counts()
  File "/home/travis/build/mbernico/snape/snape/make_image_dataset.py", line 118, in retrieve_class_counts
    soup = BeautifulSoup(request.text, "xml")
  File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/beautifulsoup4-4.5.3-py2.7.egg/bs4/__init__.py", line 165, in __init__
    % ",".join(features))
FeatureNotFound: Couldn't find a tree builder with the features you requested: xml. Do you need to install a parser library?
-------------------- >> begin captured logging << --------------------
requests.packages.urllib3.connectionpool: DEBUG: Starting new HTTP connection (1): www.image-net.org
requests.packages.urllib3.connectionpool: DEBUG: http://www.image-net.org:80 "GET /api/xml/ReleaseStatus.xml HTTP/1.1" 200 1739158
--------------------- >> end captured logging << ---------------------
======================================================================
FAIL: snape.test.test_make_image_dataset.TestImageNet.test_get_images
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/travis/build/mbernico/snape/snape/test/test_make_image_dataset.py", line 72, in test_get_images
    assert class1_size == n_images, "Did not download n images"
AssertionError: Did not download n images
    './test_images/n03379051' = {'random_seed': 42, 'image_source': 'imagenet', 'out_path': './test_images/', 'weights': [0.8, 0.2], 'n_classes': 2, 'n_samples': 11}["out_path"] + <module 'os' from '/home/travis/miniconda/envs/testenv/lib/python2.7/os.pyc'>.listdir({'random_seed': 42, 'image_source': 'imagenet', 'out_path': './test_images/', 'weights': [0.8, 0.2], 'n_classes': 2, 'n_samples': 11}["out_path"])[0]
    2 = len(<module 'os' from '/home/travis/miniconda/envs/testenv/lib/python2.7/os.pyc'>.listdir('./test_images/n03379051'))
    8 = int({'random_seed': 42, 'image_source': 'imagenet', 'out_path': './test_images/', 'weights': [0.8, 0.2], 'n_classes': 2, 'n_samples': 11}["n_samples"] * {'random_seed': 42, 'image_source': 'imagenet', 'out_path': './test_images/', 'weights': [0.8, 0.2], 'n_classes': 2, 'n_samples': 11}["weights"][0])
>>  assert 8 == 2, "Did not download n images"
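
The failed assertion is plain arithmetic: the test expects int(n_samples * weights[0]) = int(11 * 0.8) = 8 images in the first directory returned by os.listdir(), but found 2 in ./test_images/n03379051. Since int(11 * 0.2) is also 2, one possible explanation (the thread does not confirm it) is that os.listdir(), which returns entries in arbitrary order, listed the minority class first. Note too that truncation yields 8 + 2 = 10 images, not 11. The sketch below, with hypothetical helper names, compares per-class counts without relying on directory order.

```python
import os


def expected_class_counts(n_samples, weights):
    """Images expected per class, truncated with int() as the test does."""
    return [int(n_samples * w) for w in weights]


def observed_class_counts(out_path):
    """Count the files in each class sub-directory of out_path."""
    counts = []
    for name in sorted(os.listdir(out_path)):
        class_dir = os.path.join(out_path, name)  # safer than string concatenation
        if os.path.isdir(class_dir):
            counts.append(len(os.listdir(class_dir)))
    return counts


def counts_match(observed, n_samples, weights):
    """Order-independent comparison, since os.listdir() guarantees no order."""
    return sorted(observed) == sorted(expected_class_counts(n_samples, weights))
```

In a test, counts_match(observed_class_counts(out_path), 11, [0.8, 0.2]) would pass whichever class directory happens to be listed first.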

slizb commented 7 years ago

It appears Travis can't find the lxml package... I'll try adding it to the install_requires list.

slizb commented 7 years ago

The lxml dependency is resolved, but another bug persists. I'll check it out tomorrow.

tgsmith61591 commented 7 years ago

It's not requirements.txt that governs that. You'll have to add the pip install directive in build_tools/travis/install.sh.

On Mar 22, 2017 7:59 PM, "slizb" notifications@github.com wrote:

It appears travis can't find the lxml package... I'll try adding it to the install_requires list.

slizb commented 7 years ago

@tgsmith61591 I actually just updated the install_requires list in setup.py, and that seemed to do the trick for lxml. Do I need to update build_tools/travis/install.sh too, or would that just be redundant?
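
On the setup.py versus install.sh question: packages listed in install_requires are typically installed whenever the package itself is installed with pip or setup.py, so on Travis they arrive only if install.sh installs snape that way. slizb's result suggests it does, though the thread doesn't show the script. Either way, a fail-fast check after the install step catches a missing parser before the tests run. BeautifulSoup's "xml" feature is provided by lxml, which is why the FeatureNotFound error appeared without it. The helper name and the distribution-to-module mapping below are illustrative assumptions.

```python
import importlib.util

# Distribution name (as written in install_requires or `pip install`) -> import name.
# Only packages visible in this thread are listed; this mapping is illustrative.
REQUIRED = {
    "requests": "requests",
    "beautifulsoup4": "bs4",
    "lxml": "lxml",  # tree builder behind BeautifulSoup(text, "xml")
}


def missing_dependencies(required=REQUIRED):
    """Return the distributions whose top-level module cannot be found.

    find_spec() locates a module without importing it, so the check is cheap.
    """
    return [dist for dist, module in required.items()
            if importlib.util.find_spec(module) is None]
```

A CI script could call missing_dependencies() right after the pip step and exit non-zero if the returned list is non-empty.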

coveralls commented 7 years ago

Coverage decreased (-1.7%) to 94.201% when pulling 048eecbd19a3bea14f8ce85d352f6f1ec90988e4 on slizb:issue/3/image_dataset_generation into 66864b50dabff2fbc950b5fced9eec6de4420ef7 on mbernico:master.
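
Coveralls reports the change in percentage points, so the approximate coverage on master before this PR can be backed out from the two numbers (the 1.7 is rounded, so the result is only approximate):

```latex
C_{\text{master}} \approx C_{\text{PR}} + 1.7 = 94.201 + 1.7 \approx 95.9\%
```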

slizb commented 7 years ago

Passing Travis CI now. Coverage ticked down by 1.7%; let me know if that's unacceptable. Otherwise, I think this is ready for review, @mbernico.