Open kjossul opened 5 years ago
I don't know the .data format, but a .csv file is just lines of comma-separated values:
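Array<T> out;
FILE *fp = fopen(filename, "r");
if (fp) {  // fopen returns NULL if the file cannot be opened
    char buffer[256];
    T outvar;
    // Read one line at a time and parse it with the given format string.
    while (fgets(buffer, sizeof buffer, fp))
        if (sscanf(buffer, format, &outvar) == 1)  // keep only lines that parsed
            out.push(outvar);
    fclose(fp);
}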
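The snippet above reads one value per line. For rows with several comma-separated attributes, something like the following could work. It is only a sketch: `parseCsvLine` is a hypothetical helper, not something in the project, and it uses `std::vector` rather than the project's `Array<T>`. Note that `std::stod` accepts a numeric prefix, so a field like `5.1abc` would be read as 5.1.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of the project): split one CSV line into
// numeric fields. Fields that do not start with a number are skipped.
std::vector<double> parseCsvLine(const std::string &line) {
    std::vector<double> values;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, ',')) {
        try {
            values.push_back(std::stod(field));
        } catch (const std::exception &) {
            // Non-numeric or empty field (e.g. a class label): ignore it here.
        }
    }
    return values;
}
```

For a row such as `5.1,3.5,1.4,0.2,Iris-setosa` this should keep the four numbers and drop the trailing label.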
.data files are similar. From what I've seen, most datasets include a class among the attributes (see the iris dataset), so we can compute the clustering from the other attributes and use the class attribute to check the algorithm's accuracy and performance.
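One possible way to do that check is cluster purity: for each cluster, count how many points belong to its most common class, then divide the total by the number of points. This is just one metric I'm assuming here, not something the project already has, and `purity` is a hypothetical name. It takes the cluster id assigned to each point and the class label read from the file.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: fraction of points that carry the majority class
// label of the cluster they were assigned to (a value between 0 and 1).
double purity(const std::vector<int> &clusterIds,
              const std::vector<std::string> &labels) {
    assert(clusterIds.size() == labels.size());
    if (clusterIds.empty()) return 0.0;
    // counts[cluster][label] = number of points with that label in the cluster
    std::map<int, std::map<std::string, std::size_t>> counts;
    for (std::size_t i = 0; i < labels.size(); ++i)
        ++counts[clusterIds[i]][labels[i]];
    std::size_t majorityTotal = 0;
    for (const auto &cluster : counts) {
        std::size_t best = 0;
        for (const auto &entry : cluster.second)
            best = std::max(best, entry.second);
        majorityTotal += best;
    }
    return static_cast<double>(majorityTotal) / labels.size();
}
```

A purity of 1.0 means every cluster contains a single class. It can be inflated by using many small clusters, so it is best compared at a fixed number of clusters.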
It would be nice to test against a real dataset instead of randomly generated points. Most of them have .data or .csv extensions. A nice list can be found here. EDIT: a list of datasets for clustering is here!