openml / server-api

Python-based server
https://openml.github.io/server-api/
BSD 3-Clause "New" or "Revised" License

GET /data/qualities/list #11

Closed by PGijsbers 10 months ago

PGijsbers commented 10 months ago

Selects the name of every quality in the quality table whose type is DataQuality and that is used at least once in data_quality, together with the number of times it is used. In PHP:

  function allUsed() {
    // this query selects only the data qualities that are actually used at least once
    $sql = '
      SELECT `q`.`name`, count(*) AS `number` 
      FROM `quality` `q`, `data_quality` `dq` 
      WHERE `q`.`type`= "DataQuality" 
      AND `q`.`name` = `dq`.`quality` 
      GROUP BY `q`.`name`';
    return $this->query( $sql );
  }
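To make the query's behaviour concrete, here is a minimal sketch that runs the same SQL against an in-memory SQLite database with Python's standard sqlite3 module. Only the table and column names (quality.name, quality.type, data_quality.quality) come from the PHP above. The extra columns (data, value), the rows and the 'OtherType' type are invented for illustration. I also swapped the MySQL backticks and double quotes for plain identifiers and single-quoted strings, and added an ORDER BY so that the output is deterministic.

```python
import sqlite3

# Hypothetical toy data: table/column names follow the PHP query,
# but the rows and the extra `data`/`value` columns are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE quality (name TEXT PRIMARY KEY, type TEXT);
    CREATE TABLE data_quality (data INTEGER, quality TEXT, value REAL);
    INSERT INTO quality VALUES
        ('NumberOfInstances', 'DataQuality'),
        ('NumberOfFeatures', 'DataQuality'),
        ('NumberOfClasses', 'DataQuality'),  -- defined but never used
        ('SomeOtherQuality', 'OtherType');   -- not a DataQuality
    INSERT INTO data_quality VALUES
        (1, 'NumberOfInstances', 150),
        (1, 'NumberOfFeatures', 5),
        (2, 'NumberOfInstances', 768);
""")

# Same logic as allUsed(): implicit join restricted to DataQuality,
# grouped by name with a usage count (ORDER BY added for stable output).
ALL_USED_SQL = """
    SELECT q.name, count(*) AS number
    FROM quality q, data_quality dq
    WHERE q.type = 'DataQuality'
      AND q.name = dq.quality
    GROUP BY q.name
    ORDER BY q.name
"""
rows = conn.execute(ALL_USED_SQL).fetchall()
print(rows)  # [('NumberOfFeatures', 1), ('NumberOfInstances', 2)]
```

NumberOfClasses is defined in quality but never appears in data_quality, so the join drops it. The count column is the only information beyond the names.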

This raises the question: why not simply query

  SELECT DISTINCT(quality) FROM data_quality;

instead? As far as I can tell, the returned number is never used.
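One caveat when swapping the queries: the DISTINCT query no longer joins against quality, so it returns the same names as allUsed() only as long as every quality referenced by data_quality is defined in quality with type DataQuality. The sketch below checks this on an in-memory SQLite database using Python's sqlite3 module. The rows, the extra data/value columns, the 'OtherType' type and the helper used_names are all hypothetical. Quoting is adapted from MySQL to SQLite.

```python
import sqlite3

# The query from allUsed() (quoting adapted for SQLite) and the proposed replacement.
ALL_USED_SQL = """
    SELECT q.name, count(*) AS number
    FROM quality q, data_quality dq
    WHERE q.type = 'DataQuality' AND q.name = dq.quality
    GROUP BY q.name
"""
DISTINCT_SQL = "SELECT DISTINCT(quality) FROM data_quality"


def used_names(conn: sqlite3.Connection, sql: str) -> list:
    """Hypothetical helper: first column of every result row, sorted for comparison."""
    return sorted(row[0] for row in conn.execute(sql))


# Hypothetical toy schema and rows; only the table/column names come from the issue.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE quality (name TEXT PRIMARY KEY, type TEXT);
    CREATE TABLE data_quality (data INTEGER, quality TEXT, value REAL);
    INSERT INTO quality VALUES
        ('NumberOfInstances', 'DataQuality'),
        ('NumberOfFeatures', 'DataQuality'),
        ('SomeOtherQuality', 'OtherType');
    INSERT INTO data_quality VALUES
        (1, 'NumberOfInstances', 150),
        (1, 'NumberOfFeatures', 5);
""")

# While data_quality only references DataQuality rows, both queries agree.
before = (used_names(conn, ALL_USED_SQL), used_names(conn, DISTINCT_SQL))
print(before[0] == before[1])  # True

# Once a non-DataQuality quality is stored, DISTINCT returns it as well.
conn.execute("INSERT INTO data_quality VALUES (2, 'SomeOtherQuality', 0.5)")
print(used_names(conn, ALL_USED_SQL))  # ['NumberOfFeatures', 'NumberOfInstances']
print(used_names(conn, DISTINCT_SQL))  # ['NumberOfFeatures', 'NumberOfInstances', 'SomeOtherQuality']
```

If data_quality can only ever hold data qualities, the simpler query is equivalent. Otherwise, a SELECT DISTINCT over the join with the type filter keeps the original semantics without the unused count.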

PGijsbers commented 10 months ago

As far as Jan is aware, the count is never used, so using the DISTINCT query is fine. He does not know why the endpoint distinguishes between used qualities and defined qualities.