mtreinish / ciml

a machine learning pipeline for analyzing CI results.
Apache License 2.0

collecting data #60

Closed jlousada315 closed 4 years ago

jlousada315 commented 4 years ago

Can you send me the .zip file of the tempest-full data you used to obtain your results? I just can't get past that step and I don't know why :)

My email is joao.lousada1_at_gmail.com

Thank you. I will keep trying to work out why I can't collect the data independently, but right now I really need to look at the dataset, see which features you use, and test my own algorithms.

afrittoli commented 4 years ago

Hello @johnnylousas, the dataset is public, so I think it would be more useful for you to download it and keep it in sync as you need than to rely on a snapshot.

What seems to be the problem with syncing the data?

jlousada315 commented 4 years ago

Hello @afrittoli.

After running the example in the README file, inside the data folder I have a folder called _splits and another one called tempest-full. Inside the latter I have 3 folders: dev.json.gz, testing.json.gz and training.json.gz. That looks fine, but the data I get is a list of elements like this one: "d44a7f8e-0d56-404a-ad55-fd8bd99ae789"

I don't know what it means, so it is not very useful. Moreover, there aren't many elements; the training set is small.
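For reference, a minimal sketch of how such a file could be inspected, assuming it is a gzip-compressed JSON list (which is what the output above suggests, but is not confirmed here). The helper name peek_json_gz is hypothetical and not part of ciml.

```python
import gzip
import json


def peek_json_gz(path, limit=5):
    """Return (total element count, first `limit` elements).

    Assumes the file holds a gzip-compressed JSON list; this is a guess
    based on the file name, not something confirmed by the ciml code.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        items = json.load(handle)
    return len(items), items[:limit]


# Example call (the path follows the folder layout described above):
# count, sample = peek_json_gz("data/tempest-full/training.json.gz")
```

Comparing the count across dev, testing and training would show how small each split really is.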

jlousada315 commented 4 years ago

When I try to run:

dataset = 'tempest-full'
labels = gather_results.load_dataset(dataset, 'labels')['labels']
training_data = gather_results.load_dataset(dataset, 'training')
test_data = gather_results.load_dataset(dataset, 'test')

I get the error that there are no .npz files (and there aren't)
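A quick way to confirm this before calling load_dataset is to list the .npz files in the dataset folder. This is only a sketch: find_npz_files is a hypothetical helper, and the assumption that the .npz files would sit directly inside the dataset folder is mine, not taken from the ciml code.

```python
from pathlib import Path


def find_npz_files(dataset_dir):
    """Return the sorted names of .npz files directly inside dataset_dir.

    Hypothetical helper; where ciml actually writes its .npz files is an
    assumption here.
    """
    return sorted(p.name for p in Path(dataset_dir).glob("*.npz"))


# Example call (the path follows the folder layout described earlier):
# find_npz_files("data/tempest-full")
```

An empty list would mean the step that produces the .npz files has not completed.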

jlousada315 commented 4 years ago

OK, it makes sense. The problem is at the ciml-build-dataset --dataset tempest-full --build-name tempest-full step. I get this exception:

Obtained 558 runs for build tempest-full
Traceback (most recent call last):
  File "/usr/local/bin/ciml-build-dataset", line 10, in <module>
    sys.exit(build_dataset())
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/Users/joaolousada/Documents/5ºAno/Master-Thesis/ciml-master/ciml/trainer.py", line 562, in build_dataset
    data_path=data_path, s3=s3)
  File "/Users/joaolousada/Documents/5ºAno/Master-Thesis/ciml-master/ciml/trainer.py", line 257, in data_sizes_and_labels
    filtered_sample_result = filter_example(sample_result, features_regex)
  File "/Users/joaolousada/Documents/5ºAno/Master-Thesis/ciml-master/ciml/trainer.py", line 164, in filter_example
    col_regex = re.compile(features_regex)
  File "/usr/local/Cellar/python/3.7.4_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/re.py", line 234, in compile
    return _compile(pattern, flags)
  File "/usr/local/Cellar/python/3.7.4_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/re.py", line 285, in _compile
    raise TypeError("first argument must be string or compiled pattern")
TypeError: first argument must be string or compiled pattern
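The last frames show re.compile rejecting the value of features_regex. One value that produces exactly this kind of failure is None, for example when no regex was supplied. Whether that is what happens in this run is an assumption. The sketch below reproduces the error outside ciml and shows one possible guard. compile_features_regex is a hypothetical helper, not the ciml implementation.

```python
import re


def compile_features_regex(features_regex):
    """Compile the regex, or return None when no regex was given.

    Hypothetical guard for illustration only; treating None as
    "no filtering" is an assumption, not documented ciml behaviour.
    """
    if features_regex is None:
        return None
    return re.compile(features_regex)


# Reproduce the failure from the traceback with plain `re`.
try:
    re.compile(None)
except TypeError as exc:
    error_message = str(exc)

# With the guard, a missing regex no longer raises.
no_filter = compile_features_regex(None)
usage_filter = compile_features_regex(r"usage_.*")
```

A guard like this only hides the symptom. The question of why no string reached filter_example would still need an answer.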

kwulffert commented 4 years ago

@johnnylousas, did you run the cache step before the build step? Was it successful?

jlousada315 commented 4 years ago

@kwulffert yes, the build is successful. Thank you.

kwulffert commented 4 years ago

@johnnylousas can we close this one?

jlousada315 commented 4 years ago

yes sure!