Closed: saehm closed this issue 1 year ago
According to https://enable-cors.org/server_apache.html, it seems to need just a small change in the right .htaccess file. The response header of an API call should contain an "Access-Control-Allow-Origin" header to be accepted by a browser, which it does not at the moment.
For example:
curl -v https://api.openml.org/api/v1/json/data/187
returns this response:
< HTTP/1.1 200 OK
< Date: Thu, 18 Aug 2022 07:56:19 GMT
< Server: Apache/2.4.29 (Ubuntu)
< Set-Cookie: ci_session=ko27fr7pguc027468ffe5b3ihtld68lj; expires=Thu, 18-Aug-2022 09:56:19 GMT; Max-Age=7200; path=/; HttpOnly
< Expires: Thu, 19 Nov 1981 08:52:00 GMT
< Cache-Control: no-store, no-cache, must-revalidate
< Pragma: no-cache
< Content-Length: 4234
< Access-Control-Allow-Headers: Origin, X-Requested-With, Content-Type, Accept
< Content-Type: application/json; charset=utf-8
<
{"data_set_description":{"id":"187","name":"wine","version":"1","description":"**Author**: \n**Source**: Unknown - \n**Please cite**: \n\n1. Title of Database: Wine recognit...
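For anyone checking this by script rather than by eye, here is a minimal sketch that scans the text printed by `curl -v` for Access-Control-Allow-Origin response headers. The function names `parseCurlResponseHeaders` and `allowOriginValues` are made up for this illustration, and the sketch assumes curl marks received header lines with a leading `< `, as in the output above.

```javascript
// Extract response header lines ("< Name: value") from `curl -v` output.
// Hypothetical helper for illustration, not part of any OpenML tooling.
function parseCurlResponseHeaders(verboseOutput) {
  const headers = [];
  for (const line of verboseOutput.split(/\r?\n/)) {
    // Received header lines look like "< Name: value"; the status line
    // ("< HTTP/1.1 200 OK") has no "Name:" part and is skipped.
    const match = /^<\s+([^:\s]+):\s*(.*)$/.exec(line);
    if (match) {
      headers.push({ name: match[1].toLowerCase(), value: match[2].trim() });
    }
  }
  return headers;
}

// Return every Access-Control-Allow-Origin value found in the output.
// An empty array means the header is missing; more than one entry means
// the header was sent more than once.
function allowOriginValues(verboseOutput) {
  return parseCurlResponseHeaders(verboseOutput)
    .filter((header) => header.name === "access-control-allow-origin")
    .map((header) => header.value);
}
```

Run against the response above, `allowOriginValues` would return an empty array, because only Access-Control-Allow-Headers is present.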
Can you try again, please?
It works now :) Thank you!
@joaquinvanschoren not sure if that is easily doable, but would you consider adding an Access-Control-Allow-Origin: * header to the https://openml.org/api/* URLs as well?
Right now these URLs do not have the header, for example this one:
curl -v https://openml.org/api/v1/json/data/list/data_name/titanic/limit/2/data_version/1 2>&1 | grep Access-Control-Allow-Origin
While the equivalent api.openml.org URL has the Access-Control-Allow-Origin header:
curl -v https://api.openml.org/v1/json/data/list/data_name/titanic/limit/2/data_version/1 2>&1 | grep Access-Control-Allow-Origin
< Access-Control-Allow-Origin: *
Context: it turns out scikit-learn uses https://openml.org/api URLs (not sure if https://api.openml.org URLs are preferred, let me know if this is the case ...) and trying to use sklearn.datasets.fetch_openml inside Pyodide fails with a CORS-related error. It would be great if https://openml.org/api URLs could have the header. This would allow running scikit-learn gallery examples inside JupyterLite, see https://github.com/scikit-learn/scikit-learn/pull/25887 for more details.
Yes, https://api.openml.org is preferred (and it will be faster), but I'll try to add the headers.
We have a proxy set up that redirects openml.org/api to api.openml.org, but for some reason it strips the headers and I'm not sure why yet.
Ok, it should work now. Please let me know :)
Thanks a lot! It seems to fix the issue with most URLs, although I am still seeing a CORS issue with data URLs, e.g. https://openml.org/data/v1/download/16826755. Not sure why, since there seems to be Access-Control-Allow-Origin: * in the headers ...
On the other hand, the equivalent api.openml.org URL does not have the CORS issue: https://api.openml.org/data/v1/download/16826755.
For the longer term, I'll try to get scikit-learn to use the api.openml.org URLs.
To reproduce, go to https://scikit-learn.org (most websites would do for the reproducer) and open your browser console.
The api.openml.org/data request succeeds:
function reqListener() {
  console.log(this.responseText);
}
req = new XMLHttpRequest();
req.addEventListener("load", reqListener);
req.open("GET", "https://api.openml.org/data/v1/download/16826755");
req.send();
The openml.org/data one does not:
function reqListener() {
  console.log(this.responseText);
}
req = new XMLHttpRequest();
req.addEventListener("load", reqListener);
req.open("GET", "https://openml.org/data/v1/download/16826755");
req.send();
It looks like somehow there are multiple CORS headers.
The error looks like this on Firefox:
Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at
https://www.openml.org/data/v1/download/16826755. (Reason: Multiple CORS header
‘Access-Control-Allow-Origin’ not allowed).
Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at
https://www.openml.org/data/v1/download/16826755. (Reason: CORS request did not succeed).
Status code: (null).
And on Chromium:
Access to XMLHttpRequest at 'https://www.openml.org/data/v1/download/16826755' (redirected from
'https://openml.org/data/v1/download/16826755') from origin 'https://scikit-learn.org' has been blocked
by CORS policy: The 'Access-Control-Allow-Origin' header contains multiple values '*, *', but only one
is allowed.
VM36:8 GET https://www.openml.org/data/v1/download/16826755 net::ERR_FAILED 307 (Temporary Redirect)
(anony
I looked a bit more at the failing snippet via the Chromium developer tools and it does seem like the openml.org/data response has a single Access-Control-Allow-Origin header while the www.openml.org/data response has two:
So maybe the redirection from openml.org/data to www.openml.org adds an unnecessary header. Random guess (complete newbie in this kind of thing): maybe somewhere there is a Header append (or maybe add) instead of Header set?
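To make that guess concrete, here is a toy JavaScript model of how the three Apache mod_headers actions named above differ when a header of the same name is already present. It is only a sketch based on my reading of the mod_headers documentation (set replaces, append merges with a comma, add emits another header line). It is not Apache code, `applyHeaderDirective` is a hypothetical name, and nothing here is known about the actual OpenML configuration.

```javascript
// Toy model of a response header list: an array of [name, value] pairs.
// `action` mimics the "set", "append" and "add" actions of Apache's
// Header directive. Returns a new array; the input is not modified.
function applyHeaderDirective(headers, action, name, value) {
  const sameName = (header) => header[0].toLowerCase() === name.toLowerCase();
  switch (action) {
    case "set":
      // Replace any previous header with this name.
      return [...headers.filter((header) => !sameName(header)), [name, value]];
    case "append": {
      // Merge onto an existing header of the same name, comma-separated.
      const index = headers.findIndex(sameName);
      if (index === -1) return [...headers, [name, value]];
      return headers.map((header, i) =>
        i === index ? [header[0], `${header[1]}, ${value}`] : header
      );
    }
    case "add":
      // Always emit a new header line, even if one already exists.
      return [...headers, [name, value]];
    default:
      throw new Error(`unsupported action: ${action}`);
  }
}
```

In this model, if an upstream server has already set `Access-Control-Allow-Origin: *`, a second layer using "append" yields the value `*, *`, "add" yields two separate header lines, and only "set" leaves a single `*`.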
Actually, the issue does not seem related to redirection, only to https://www.openml.org/data, as can be seen from this snippet:
function reqListener() {
  console.log(this.responseText);
}
req = new XMLHttpRequest();
req.addEventListener("load", reqListener);
req.open("GET", "https://www.openml.org/data/v1/download/16826755");
req.send();
XMLHttpRequest seems pickier and complains about getting the same Access-Control-Allow-Origin header twice, while typing the URL in a browser or downloading via wget is more forgiving ...
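The difference is that browsers apply the CORS check only to cross-origin requests made from scripts (XMLHttpRequest, fetch), while wget and address-bar navigation never run that check. Below is a simplified sketch of the check for a request made without credentials. It is a model for illustration, not a browser's actual implementation, and `checkAllowOrigin` is a hypothetical name.

```javascript
// Simplified model of the Access-Control-Allow-Origin check for a
// cross-origin request made without credentials.
// `headerValues` holds one entry per Access-Control-Allow-Origin header
// line in the response; `requestOrigin` is the origin of the calling page.
function checkAllowOrigin(headerValues, requestOrigin) {
  if (headerValues.length === 0) {
    return { allowed: false, reason: "missing header" };
  }
  // Repeated header lines are combined into one comma-separated value,
  // which is how two "*" headers end up as "*, *".
  const combined = headerValues.join(", ");
  // The combined value must be exactly "*" or exactly the page's origin.
  if (combined === "*" || combined === requestOrigin) {
    return { allowed: true, reason: "ok" };
  }
  if (combined.includes(",")) {
    return { allowed: false, reason: `multiple values '${combined}'` };
  }
  return { allowed: false, reason: `origin mismatch '${combined}'` };
}
```

With `["*"]` the model allows the request from https://scikit-learn.org; with `["*", "*"]` it rejects it because the combined value `*, *` is neither `*` nor the page's origin, which matches the Chromium message quoted earlier.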
Thanks! I updated the configuration. Is it better now?
Hmmm still not quite, here is the status:
https://www.openml.org/data/v1/download/16826755 works: it has a single Access-Control-Allow-Origin: * header.
https://openml.org/data/v1/download/16826755 does not work: it has no Access-Control-Allow-Origin: * header. Unfortunately, this is the one we are using in scikit-learn.
How are you testing? I do see the header in Chromium
I go to scikit-learn.org, open a browser console and type the following snippet and then Enter:
function reqListener() {
  console.log(this.responseText);
}
req = new XMLHttpRequest();
req.addEventListener("load", reqListener);
req.open("GET", "https://openml.org/data/v1/download/16826755");
req.send();
This is what I get:
Ok, I figured it out I think. Is it working on your end now?
Now I get a failure for both https://openml.org/data/v1/download/16826755 and https://www.openml.org/data/v1/download/16826755 because they have two Access-Control-Allow-Origin: * headers:
Access to XMLHttpRequest at 'https://openml.org/data/v1/download/16826755' from origin 'https://scikit-learn.org'
has been blocked by CORS policy: The 'Access-Control-Allow-Origin' header contains multiple values '*, *',
but only one is allowed.
Of course :) How about now?
Works fine now, thanks a lot for this!
Phew :) Thanks for the quick feedback. Feel free to close the issue.
> Feel free to close the issue.
I would if I could, but I am not the one who opened it :wink:
Ah! No worries, I'll close it then. Good luck with the scikit-learn gallery examples!
Hello,
I would like to use OpenML in a JavaScript library, which should work in the browser and in Node.js. So far, I have no problems fetching data using the REST API in Node.js. When I want to fetch data in the browser, the browser blocks the request, because the response does not contain an "Access-Control-Allow-Origin" header.
Is there a problem on my side? Do I have to add something to the request header? Or is fetching data from a script running in a browser intentionally disallowed on your side?
I am using node-fetch in the Node environment, and the default JavaScript fetch in the browser environment.
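For reference, here is a minimal sketch of a fetch helper that can be shared between the browser and Node.js. It uses the api.openml.org host, which the maintainer describes in this thread as the preferred one, and the `/api/v1/json/data/<id>` path from the curl example. The function names are hypothetical, and the sketch assumes a runtime with a global `fetch` (modern browsers, Node.js 18+); with node-fetch, its `fetch` function can be passed in explicitly as `fetchImpl`.

```javascript
// Build the JSON description URL for a dataset on the api.openml.org host.
// The path shape is taken from the curl example in this thread.
function datasetDescriptionUrl(datasetId) {
  return `https://api.openml.org/api/v1/json/data/${encodeURIComponent(datasetId)}`;
}

// Fetch a dataset description and unwrap the "data_set_description" key
// seen in the sample response. `fetchImpl` defaults to the global fetch;
// pass node-fetch's fetch here on runtimes without one.
async function fetchDatasetDescription(datasetId, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(datasetDescriptionUrl(datasetId));
  if (!response.ok) {
    throw new Error(`OpenML request failed with status ${response.status}`);
  }
  const body = await response.json();
  return body.data_set_description;
}
```

In a browser, a call such as `fetchDatasetDescription(187)` only resolves when the server sends a single valid Access-Control-Allow-Origin header. In Node.js no CORS check is applied, which is why the same request worked there from the start.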