Closed jlousada315 closed 4 years ago
Hello @johnnylousas - the dataset is public - so I think it would be more useful if you were able to download it and keep it in sync with your needs than to rely on a snapshot.
What seems to be the problem with syncing the data?
Hello @afrittoli.
After running the example you have in the README file: inside the data folder I have a folder called _splits and another one called tempest-full. Inside the latter there are three files: dev.json.gz, testing.json.gz and training.json.gz. Which is fine, but the data I get is a list of elements like this one: "d44a7f8e-0d56-404a-ad55-fd8bd99ae789", and I don't know what it means, so it is not very useful. Moreover, there aren't many elements; the training set is small.
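If it helps to see what those .json.gz files contain, here is a quick way to inspect one. This is only a sketch: it writes a tiny sample file mimicking the layout described above (a list of run IDs), since the real files live under data/tempest-full/ and their exact path depends on your setup.

```python
import gzip
import json

# Write a tiny sample file mimicking the layout described above
# (a list of run IDs); substitute the real path for the real data.
sample = ["d44a7f8e-0d56-404a-ad55-fd8bd99ae789"]
with gzip.open("training.json.gz", "wt") as f:
    json.dump(sample, f)

# Inspect the contents the same way you would the real file.
with gzip.open("training.json.gz", "rt") as f:
    records = json.load(f)

print(len(records))
print(records[0])
```

If the real file prints only a short list of UUID-like strings, those are likely run identifiers rather than the feature data itself.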
When I try to run:
```python
dataset = 'tempest-full'
labels = gather_results.load_dataset(dataset, 'labels')['labels']
training_data = gather_results.load_dataset(dataset, 'training')
test_data = gather_results.load_dataset(dataset, 'test')
```
I get an error saying that there are no .npz files (and indeed there aren't any).
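For context on what the loader is looking for: a .npz file is just a zip archive of named NumPy arrays, saved with `numpy.savez` and read back with `numpy.load`. A minimal sketch, with illustrative array names that are not necessarily the ones ciml uses:

```python
import numpy as np

# Save two named arrays into one .npz archive.
# The names "examples" and "classes" are stand-ins for illustration.
np.savez("training.npz",
         examples=np.zeros((4, 3)),
         classes=np.array([0, 1, 0, 1]))

# Load it back; the archive behaves like a dict of arrays.
data = np.load("training.npz")
print(data.files)
print(data["examples"].shape)
```

So if no .npz files exist under the dataset folder, the build step that serializes the arrays never completed.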
OK, that makes sense. The problem is with `ciml-build-dataset --dataset tempest-full --build-name tempest-full`; I get this exception:
```
Obtained 558 runs for build tempest-full
Traceback (most recent call last):
  File "/usr/local/bin/ciml-build-dataset", line 10, in
```
@johnnylousas, did you run the cache step prior to the build one? Was it successful?
@kwulffert yes, the build is successful. Thank you.
@johnnylousas can we close this one?
yes sure!
Can you send me the .zip file of the tempest-full data you use to obtain your results? I just can't get past that step and I don't know why :)
the email is joao.lousada1_at_gmail.com
Thank you, I will still try to work out why I can't collect the data independently. But right now I really need to take a look at the dataset, see what features you use, and test my own algorithms.