usc-isi-i2 / dsbox-profiling

The data profiling TA1 component of DSBox
MIT License

Code snippet in README throws an error #14

Open yamsgithub opened 6 years ago

yamsgithub commented 6 years ago

I am running the code from the README but it throws an error:


from dsbox.datapreprocessing.profiler import Profiler
import pandas as pd

profiler = Profiler()
data = pd.read_csv('test.csv', dtype=object)
jsonResult = profiler.produce(inputs=data)

ImportError Traceback (most recent call last)
in ()
----> 1 from dsbox.datapreprocessing.profiler import Profiler
      2 import pandas as pd
      3
      4 profiler = Profiler()
      5 data = pd.read_csv('/Users/yamuna/D3M/data/185_baseball/185_baseball_dataset/tables/learningData.csv', dtype=object)

ImportError: cannot import name Profiler

kyao commented 6 years ago

It looks like Python cannot find the Profiler.

We need to update our readme file.

To install our profiler, please clone the master branch. Then, cd to your local repo directory, and do:

pip install -e .

Also, for sample usage you can take a look at:

https://github.com/usc-isi-i2/dsbox-profiling/blob/master/ta1-pipeline.py
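
After the editable install, a quick check like the one below may help tell apart the two usual causes of an ImportError: the module is not installed at all, or a module with that path is installed but does not define the expected name. This is only a sketch; `diagnose_import` is a hypothetical helper written for this thread and is not part of dsbox.

```python
import importlib


def diagnose_import(module_name, attr_name):
    """Return (ok, message) describing whether module_name.attr_name can be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # The module itself (or one of its parent packages) is not importable.
        return False, f"cannot import module {module_name!r}: {exc}"
    location = getattr(module, "__file__", "<built-in>")
    if not hasattr(module, attr_name):
        # The module was found, but it does not expose the requested name.
        return False, f"{module_name!r} loaded from {location} has no attribute {attr_name!r}"
    return True, f"{module_name}.{attr_name} found in {location}"


if __name__ == "__main__":
    ok, message = diagnose_import("dsbox.datapreprocessing.profiler", "Profiler")
    print(message)
```

The printed location shows which copy of the package Python picked up, which is useful when an older install shadows the cloned repo.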

yamsgithub commented 6 years ago

I am running ta1-pipeline.py but getting the following error. Where are the config files?

FileNotFoundError Traceback (most recent call last)
in ()
     19
     20 # Load the json configuration file
---> 21 with open("ta1-pipeline-config.json", 'r') as inputFile:
     22     jsonCall = json.load(inputFile)
     23 inputFile.close()

FileNotFoundError: [Errno 2] No such file or directory: 'ta1-pipeline-config.json'

kyao commented 6 years ago

NIST provides that file when we submit our pipelines for testing. The file looks like this:

{
  "train_data": "/path-to-data/seed_datasets_current/38_sick/TRAIN",
  "test_data": "/path-to-data/seed_datasets_current/38_sick/TEST",
  "output_folder": "."
}
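
For a local run, a file with the same three keys can be generated by hand or with a small script. The sketch below assumes the script only needs `train_data`, `test_data`, and `output_folder`, as in the example above; the helper names `write_pipeline_config` and `load_pipeline_config` are hypothetical, and the paths are placeholders to replace with your own dataset location.

```python
import json
from pathlib import Path


def write_pipeline_config(train_dir, test_dir, output_dir=".",
                          config_path="ta1-pipeline-config.json"):
    """Write a config file with the three keys shown in this thread."""
    config = {
        "train_data": str(train_dir),
        "test_data": str(test_dir),
        "output_folder": str(output_dir),
    }
    path = Path(config_path)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return path


def load_pipeline_config(config_path="ta1-pipeline-config.json"):
    """Load the config; the with-block closes the file, so no explicit close() is needed."""
    with open(config_path, "r", encoding="utf-8") as input_file:
        return json.load(input_file)


if __name__ == "__main__":
    # Placeholder paths copied from the example above; adjust before running the pipeline.
    write_pipeline_config(
        "/path-to-data/seed_datasets_current/38_sick/TRAIN",
        "/path-to-data/seed_datasets_current/38_sick/TEST",
    )
    print(load_pipeline_config())
```

Run it from the directory where ta1-pipeline.py is started, since the traceback shows the script opening the config by its bare file name.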

yamsgithub commented 6 years ago

Thanks. I was able to run it. I am using the following config.json:

{
  "train_data": "/Users/yamuna/D3M/data/196_autoMpg/TRAIN",
  "test_data": "/Users/yamuna/D3M/data/196_autoMpg/TEST",
  "output_folder": "."
}

But now get this error:

python ta1-pipeline.py > profiling_output.log
Traceback (most recent call last):
  File "ta1-pipeline.py", line 57, in <module>
    ds2.metadata.pretty_print()
  File "/Users/yamuna/D3M/ta2/src/d3m-metadata/d3m_metadata/metadata.py", line 646, in pretty_print
    self.pretty_print(selector + [element], handle=handle, _level=_level + 1)
  File "/Users/yamuna/D3M/ta2/src/d3m-metadata/d3m_metadata/metadata.py", line 635, in pretty_print
    self.pretty_print(selector + [ALL_ELEMENTS], handle=handle, _level=_level + 1)
  File "/Users/yamuna/D3M/ta2/src/d3m-metadata/d3m_metadata/metadata.py", line 646, in pretty_print
    self.pretty_print(selector + [element], handle=handle, _level=_level + 1)
  File "/Users/yamuna/D3M/ta2/src/d3m-metadata/d3m_metadata/metadata.py", line 625, in pretty_print
    for line in json.dumps(query(selector=selector), indent=1, cls=MetadataJsonEncoder).splitlines():
  File "/Users/yamuna/anaconda3/lib/python3.6/json/__init__.py", line 238, in dumps
    **kw).encode(obj)
  File "/Users/yamuna/anaconda3/lib/python3.6/json/encoder.py", line 201, in encode
    chunks = list(chunks)
  File "/Users/yamuna/anaconda3/lib/python3.6/json/encoder.py", line 438, in _iterencode
    yield from _iterencode(o, _current_indent_level)
  File "/Users/yamuna/anaconda3/lib/python3.6/json/encoder.py", line 430, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/Users/yamuna/anaconda3/lib/python3.6/json/encoder.py", line 404, in _iterencode_dict
    yield from chunks
  File "/Users/yamuna/anaconda3/lib/python3.6/json/encoder.py", line 437, in _iterencode
    o = _default(o)
  File "/Users/yamuna/D3M/ta2/src/d3m-metadata/d3m_metadata/metadata.py", line 182, in default
    return super().default(o)
  File "/Users/yamuna/anaconda3/lib/python3.6/json/encoder.py", line 180, in default
    o.__class__.__name__)
TypeError: Object of type 'int64' is not JSON serializable

kyao commented 6 years ago

We submitted a fix for this bug. Are you using an older version of d3m-metadata? Try the version tagged v2018.1.26.
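
For background, the TypeError comes from the standard `json` module, which does not know how to encode NumPy scalar types such as `int64`. The sketch below shows one generic way to handle that in your own code with a custom encoder. It is not the fix that went into d3m-metadata; `NativeTypeEncoder` and `dumps_native` are hypothetical names, and the approach relies on NumPy scalars and arrays providing a `tolist()` method that returns built-in Python values.

```python
import json


class NativeTypeEncoder(json.JSONEncoder):
    """JSON encoder that converts NumPy-style values to built-in Python types.

    The encoder checks for a tolist() method instead of importing NumPy,
    so it also runs where NumPy is not installed.
    """

    def default(self, o):
        to_list = getattr(o, "tolist", None)
        if callable(to_list):
            # numpy.int64(3).tolist() gives a plain int; arrays give nested lists.
            return to_list()
        # Anything else falls back to the standard behaviour, which raises TypeError.
        return super().default(o)


def dumps_native(obj, **kwargs):
    """Serialize obj to JSON, converting NumPy-style values on the way."""
    return json.dumps(obj, cls=NativeTypeEncoder, **kwargs)
```

With NumPy installed, a call such as `dumps_native({"count": numpy.int64(3)})` should go through this encoder instead of raising the error shown above.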