uncharted-distil / distil

An analytic workbench for user-guided development of model pipelines
Apache License 2.0

NK Dataset summary integration #293

Closed cdbethune closed 6 years ago

cdbethune commented 6 years ago

Look into the NK summary analytic. Info from Paul:

To get back a one-sentence string description of the data set, just post an opened file to http://10.108.4.42:5001 (requires being on the d3m VPN, as this is the OpenStack server).

Alternatively, building your own image is slightly more complicated, as it requires a word2vec model that is approximately 10GB in size.

Our registry was not big enough to host the complete image (~12GB), but one can simply torrent the word2vec model and build locally using the following repository: https://github.com/NewKnowledge/duke

It is not public yet, but I think Uncharted and New Knowledge have shared access?

See test.py in that repository for exact info on how to interact with the docker image (i.e., post the file to it...)

First step is to see what it produces with some of our existing datasets.

phorne-uncharted commented 6 years ago

The endpoint is http://10.108.4.42:5001/fileUpload
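
For scripted runs over several datasets, the same upload can be done from Python. This is only a minimal sketch using the standard library, assuming the service accepts a multipart/form-data POST with a single form field named `file` (the field name used in the curl sample in this comment). The helper names `build_multipart` and `summarize` are hypothetical, and the exact response format is not documented here, so the raw response text is returned unparsed.

```python
import os
import uuid
import urllib.request

# Endpoint quoted in this thread; only reachable from the d3m VPN.
SUMMARY_URL = "http://10.108.4.42:5001/fileUpload"


def build_multipart(field, filename, payload, boundary=None):
    """Encode one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header). Hypothetical helper,
    not part of the duke repository.
    """
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    content_type = f"multipart/form-data; boundary={boundary}"
    return head + payload + tail, content_type


def summarize(csv_path, url=SUMMARY_URL, timeout=60):
    """POST a CSV file to the summary service and return the response text.

    Assumes the form field is named "file"; the response is returned as-is.
    """
    with open(csv_path, "rb") as fh:
        payload = fh.read()
    body, content_type = build_multipart(
        "file", os.path.basename(csv_path), payload
    )
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()
```

From a machine on the VPN, calling `summarize()` with the path to a dataset's `learningData.csv` should return whatever text the service sends back for that table. Looping over the existing datasets this way would be one way to do the "first step" mentioned above.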

A sample request & response:

curl -F file=@/home/phorne/data/d3m_new/185_baseball/185_baseball_dataset/tables/learningData.csv http://10.108.4.42:5001/fileUpload
"This dataset is about BaseballPlayer"

cdbethune commented 6 years ago

Proceeding with integration for Jan. eval.

cdbethune commented 6 years ago

This should be available in ES now. @kbirk, sync up with @phorne-uncharted to figure out how to retrieve it and add it to the summary text.