openml / OpenML

Open Machine Learning
https://openml.org
BSD 3-Clause "New" or "Revised" License

Inconsistent config filename, param names and default values #895

Open nok opened 5 years ago

nok commented 5 years ago

The loaded config files and named parameters are inconsistent across all projects. Shouldn't they be standardised 🤔? If so, what are the right filename and parameters?

What do you think?


R
Path (saveOMLConfig.R#L18): ~/.openml/config
Params (config_helpers.R#L20-L25):

server = "http://www.openml.org/api/v1",
cachedir = file.path(tempdir(), "cache"),
verbosity = 1L,
arff.reader = "farff",
apikey = "",
confirm.upload = TRUE

Python
Path (config.py#L26): ~/.openml/config
Params (config.py#L19-L23):

'apikey': None,
'server': "https://www.openml.org/api/v1/xml",
'verbosity': 0,
'cachedir': os.path.expanduser('~/.openml/cache'),
'avoid_duplicate_runs': 'True',

Java
Path: ~/.openml/openml.conf
Params:


The following projects don't support a config file:

mfeurer commented 5 years ago

Hey, the configuration file is defined here. Apparently, the page does not define where the configuration file should reside or what its name should be. The cache directory can be specified within the configuration file and should be ~/.openml/cache by default.

@janvanrijn @giuseppec we should define the file name and standard location as well. How about we first assume that the file is called ~/.openml/config and if that doesn't exist, we try ~/.openml/openml.conf?
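
A minimal sketch of that lookup order in Python (the two paths are just the ones proposed above, and find_config_file is a made-up helper name):

import os

# Candidate locations, in the proposed order of preference:
# ~/.openml/config first, ~/.openml/openml.conf as a fallback.
CONFIG_CANDIDATES = [
    os.path.expanduser("~/.openml/config"),
    os.path.expanduser("~/.openml/openml.conf"),
]

def find_config_file():
    """Return the first config file that exists, or None."""
    for path in CONFIG_CANDIDATES:
        if os.path.isfile(path):
            return path
    return None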

nok commented 5 years ago

What about a simple JSON file (~/.openml/config.json)? And what do you think of the following suggestion?

{
    "apikey": "",                        <-- str, default: ""
    "server": "www.openml.org",          <-- str, without http scheme
    "scheme": "https",                   <-- str, option ["https", "http], default: "https"
    "basepath": "/api/v1",               <-- str, with leading slash
    "cachedir": "cache",                 <-- relative path to the system tmp dir
    "verbosity": 0,                      <-- int, option [0, 1, 2], default: 0
    "extensions": {                      <-- arr, custom settings which will be extended to the first layer
        "openml/openml-weka": {          <-- str, unique name based on <orga_or_user>/<repository_name>
            "arff.reader": "RWeka",
            "confirm.upload": true
        },
        "openml/openml-r": {
            "arff.reader": "farff"
        },
        "openml/openml-python": {
            "avoid_duplicate_runs": true
        },
        "openml/openml-java": {
            "cache_allowed": true,
            "tags": ["a", "b"]
        }
    }
}
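
To make the intent of the extensions block concrete, here is a rough sketch (assuming the file name and merge rule proposed above; load_client_config and CLIENT_KEY are made-up names) of how a client such as openml-python could resolve its effective settings:

import json
import os

CONFIG_PATH = os.path.expanduser("~/.openml/config.json")
CLIENT_KEY = "openml/openml-python"  # <orga_or_user>/<repository_name>

def load_client_config(path=CONFIG_PATH, client_key=CLIENT_KEY):
    """Load the shared config and merge the client-specific
    extension entry into the top-level settings."""
    with open(path) as f:
        config = json.load(f)
    extensions = config.pop("extensions", {})
    config.update(extensions.get(client_key, {}))
    return config

# e.g. load_client_config().get("avoid_duplicate_runs")  -> True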

Advantages:

mfeurer commented 5 years ago

Thanks for the suggestion. Personally, I'd prefer something that is easier to create by hand, for example a YAML file.

nok commented 5 years ago

You can dump a dictionary to JSON or YAML:

import json
import yaml

default_config = {
    "apikey": "",
    "server": "www.openml.org",
    "scheme": "https",
    "basepath": "/api/v1",
    "cachedir": "cache",
    "verbosity": 0,
    "extensions": {
        "openml/openml-python": {
            "avoid_duplicate_runs": True
        }
    }
}

# JSON:
with open('config.json', 'w') as f:
    json.dump(default_config, f, indent=4)
with open('config.json', 'r') as f:
    config = json.load(f)  # json.load() takes no encoding argument in Python 3
    print(config.get('server'))

# YAML:
with open('config.yaml', 'w') as f:
    yaml.safe_dump(default_config, f, allow_unicode=True)
with open('config.yaml', 'r') as f:
    config = yaml.safe_load(f)  # safe_load avoids constructing arbitrary Python objects
    print(config.get('server'))

In Python that's really easy, but in other programming languages you have to pull in extra libraries, which can be a disadvantage for a client library.
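
For comparison, a flat key = value file like the current ~/.openml/config can already be read in Python with the standard library alone; a small sketch, assuming a sectionless file and a dummy header to satisfy configparser:

import configparser
import os

def read_flat_config(path=os.path.expanduser("~/.openml/config")):
    """Parse a sectionless 'key = value' file by prepending a
    dummy section header so configparser accepts it."""
    parser = configparser.ConfigParser()
    with open(path) as f:
        parser.read_string("[DEFAULT]\n" + f.read())
    return dict(parser["DEFAULT"])

# Note: all values come back as strings (e.g. "0", "True"), which is
# part of why a typed format such as JSON or YAML is attractive.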