openml / OpenML

Support for survival tasks #1215

Open jemus42 opened 5 months ago

Description

Support for creating / uploading survival tasks is either (partially?) broken or at least insufficiently documented.

Steps/Code to Reproduce

The survival task I would like to create has the following XML description:

<?xml version="1.0" encoding="UTF-8"?>
<oml:task_inputs xmlns:oml="http://openml.org/openml">
  <oml:task_type_id>7</oml:task_type_id>
  <oml:input name="source_data">46131</oml:input>
  <oml:input name="target_feature_event">status</oml:input>
  <oml:input name="target_feature_left"/>
  <oml:input name="target_feature_right">time</oml:input>
  <oml:input name="estimation_procedure">30</oml:input>
</oml:task_inputs>

It is based on dataset 46131 (gbsg), with the target variables time (right-censored survival time) and status (binary censoring indicator, the "event" variable).
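For anyone who wants to reproduce this without hand-editing files, here is a minimal Python sketch that generates the same task description using only the standard library. The function name `build_survival_task_xml` is hypothetical; the task type id, input names and values are copied from the XML above, and nothing here is taken from an official OpenML client.

```python
import xml.etree.ElementTree as ET

OML_NS = "http://openml.org/openml"


def build_survival_task_xml(dataset_id, event, right, estimation_procedure, left=None):
    """Return the task description as UTF-8 bytes (hypothetical helper)."""
    ET.register_namespace("oml", OML_NS)
    root = ET.Element(f"{{{OML_NS}}}task_inputs")
    # Task type 7 is the survival task type used in this issue.
    ET.SubElement(root, f"{{{OML_NS}}}task_type_id").text = "7"
    inputs = [
        ("source_data", str(dataset_id)),
        ("target_feature_event", event),
        ("target_feature_left", left),  # None yields an empty element
        ("target_feature_right", right),
        ("estimation_procedure", str(estimation_procedure)),
    ]
    for name, value in inputs:
        node = ET.SubElement(root, f"{{{OML_NS}}}input", {"name": name})
        node.text = value
    # xml_declaration requires Python 3.8 or newer.
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


task_xml = build_survival_task_xml(46131, event="status", right="time",
                                   estimation_procedure=30)
print(task_xml.decode("utf-8"))
```

Writing `task_xml` to `task.xml` should give a file equivalent to the one uploaded below, apart from whitespace.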

I have tried various versions of this XML format, always based on what e.g. mlr3oml generates to publish tasks (which I successfully tested for classification tasks) and on what I could find in existing survival tasks.

I attempted to create this task in two ways:

  1. Using R's httr package and code analogous to what mlr3oml uses to upload tasks:
response = httr::POST(
  url = "https://www.openml.org/api/v1/task",
  body = list(
    description = httr::upload_file("task.xml")
  ),
  query = list(api_key = Sys.getenv("OPENMLAPIKEY"))
)
  2. A cURL command that I assume is equivalent to the code above, although I have limited experience with cURL file uploads:
curl -X POST "https://www.openml.org/api/v1/task?api_key=$OPENMLAPIKEY" \
    -H "Accept: application/xml" \
    -F "description=@task.xml;type=application/xml"
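To make explicit what both attempts should be sending, the following Python sketch builds (but does not send) a multipart/form-data POST with a single file field named `description`. The helper name `build_task_upload_request` is hypothetical, the endpoint and field name are taken from the two snippets above, and the placeholder payload and API key are assumptions for illustration only.

```python
import urllib.parse
import urllib.request
import uuid


def build_task_upload_request(xml_bytes, api_key,
                              url="https://www.openml.org/api/v1/task"):
    """Build, but do not send, the multipart POST that uploads a task description."""
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    # One form part named "description" that carries the XML file contents.
    body = b"\r\n".join([
        delimiter,
        b'Content-Disposition: form-data; name="description"; filename="task.xml"',
        b"Content-Type: application/xml",
        b"",
        xml_bytes,
        delimiter + b"--",
        b"",
    ])
    query = urllib.parse.urlencode({"api_key": api_key})
    return urllib.request.Request(
        f"{url}?{query}",
        data=body,
        method="POST",
        headers={
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            "Accept": "application/xml",
        },
    )


# Placeholder payload and key; a real call would pass the bytes of task.xml.
request = build_task_upload_request(b"<placeholder/>", api_key="YOUR_API_KEY")
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request)
```

Inspecting `request.data` shows the exact bytes that go over the wire, which may help rule out a malformed upload as the cause.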

Expected Results

I expected to receive a task id of the correctly published survival task.

Actual Results

Variations of this error message:

<oml:error xmlns:oml="http://openml.org/openml">
  <oml:code>619</oml:code>
  <oml:message>Could not decode task inputs constraints json. Please contact developers.</oml:message>
  <oml:additional_information>problematic input: target_feature_event</oml:additional_information>
</oml:error>
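When comparing the different variations, it can help to pull the fields out of the response programmatically. This is a minimal Python sketch using only the standard library; `parse_openml_error` is a hypothetical helper, and it assumes the response has exactly the structure shown above.

```python
import xml.etree.ElementTree as ET

OML = {"oml": "http://openml.org/openml"}


def parse_openml_error(xml_text):
    """Extract code, message and additional information from an oml:error document."""
    root = ET.fromstring(xml_text)

    def text_of(tag):
        node = root.find(f"oml:{tag}", OML)
        return node.text.strip() if node is not None and node.text else None

    return {
        "code": text_of("code"),
        "message": text_of("message"),
        "additional_information": text_of("additional_information"),
    }


response_body = """<oml:error xmlns:oml="http://openml.org/openml">
  <oml:code>619</oml:code>
  <oml:message>Could not decode task inputs constraints json. Please contact developers.</oml:message>
  <oml:additional_information>problematic input: target_feature_event</oml:additional_information>
</oml:error>"""

error = parse_openml_error(response_body)
print(error["code"], "-", error["additional_information"])
```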

Additional remarks

  1. I used the same method to create a regression task based on this dataset through the API and was successful, so I assume the problem is specific to survival tasks.
  2. The estimation_procedure I chose is 30, but since estimation procedures are specific to task types (why?), this is likely incorrect. I have also tried an estimation procedure that corresponds to survival tasks (19), but received the same error response.
     2.1. I do, however, specifically need stratified 5-fold CV for my tasks.
  3. @sebffischer tried to help me with this, and confirmed for me that technically creating survival tasks works on the test server but not the production server.
  4. As a side note, I was also unsuccessful when I tried the interactive "try it out" feature of the REST API on the website. Judging from the server response below, I assume something more general went wrong there, though I concede that I should probably open an issue on the website repo for that:
{
  "error": {
    "code": "611",
    "message": "Description file not present"
  }
}