Closed rawsh closed 7 years ago
Hello,
Thank you for pointing this out to us. I think we can indeed allow CORS for all the Grobid web services by default, assuming that these are "read-only" services and the main usage would be public or internal pipelines not open to extranet.
If well-informed people are reading this: are there, in general, security issues with allowing CORS for all services that we should be aware of?
Commit 819b9b0424022859c1317da0e5e8d73d79707209 added CORS. @rawsh, would it be possible for you to test whether it now works properly for cross-domain Ajax requests? Many thanks.
Thank you for the quick response!
@kermitt2 it looks like I'm still getting the same error. I git cloned the latest repo and that commit is in the history. Here is a screenshot, same js code:
@rawsh thanks! You don't need Access-Control-Allow-Origin: * in your request (that comes in the server response), and for dataType, as the result is XML, use xhr.setRequestHeader("dataType", "text"); (which should work better than xhr.setRequestHeader("dataType", "text/xml");). I think you should also not use xhr.withCredentials = true, because there is no authentication mechanism for the cross-site request.
Now I don't know if these changes will make the request work :D
I've tested on my machine and could send a request from localhost:8000 to a server running on localhost:8080; different ports on the same domain are normally already considered cross-domain.
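As a quick check of the point about ports, here is a minimal sketch that compares origins with the standard URL class. The helper name isCrossOrigin is made up for illustration and is not part of Grobid.

```javascript
// Two URLs share an origin only if scheme, host, and port all match.
// isCrossOrigin is a hypothetical helper, not a Grobid or browser API.
function isCrossOrigin(pageUrl, serviceUrl) {
  return new URL(pageUrl).origin !== new URL(serviceUrl).origin;
}

// Same host, different ports: the browser treats this as cross-origin.
console.log(isCrossOrigin("http://localhost:8000/index.html",
                          "http://localhost:8080/processHeaderDocument")); // true

// Same scheme, host, and port: same origin, so CORS does not apply.
console.log(isCrossOrigin("http://localhost:8080/grobid/",
                          "http://localhost:8080/processHeaderDocument")); // false
```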
@kermitt2 Removing what you mentioned changes things: I still get the error about not passing the access control check, but now I also get a 500 error from the server, while the same request from curl works (I tried importing the curl command into Postman and I get the same thing).
Curl command that works:
curl -v --form input=@/home/robert/Documents/Belmont/pdf-summarizer/examplepdfs/1.pdf localhost:8080/processHeaderDocument
JS that fails:
var data = new FormData();
data.append("input", "/home/robert/Documents/Belmont/pdf-summarizer/examplepdfs/1.pdf");
var xhr = new XMLHttpRequest();
xhr.addEventListener("readystatechange", function () {
  if (this.readyState === 4) {
    console.log(this.responseText);
  }
});
xhr.open("POST", "http://localhost:8080/processHeaderDocument");
xhr.setRequestHeader("dataType", "text");
xhr.send(data);
Here is the output from grobid
Would you mind sending me the js code you used so I can test?
As for safety issues, CORS (at least with Access-Control-Allow-Origin: *) lets any web page post to the server, so someone could use it without permission. I think being able to edit the Access-Control-Allow-Origin header and add domains (or just turn CORS on/off) would be nice.
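To make the allow-list idea concrete, here is a rough sketch of the decision a server could make per request. Everything in it is an assumption for illustration: the function name, the list of origins, and the idea that this would be configurable are not existing Grobid features.

```javascript
// Hypothetical allow-list; these origins are placeholders, not Grobid settings.
const ALLOWED_ORIGINS = ["http://localhost:8000", "https://clientside.com"];

// Returns the CORS headers to attach for a given request Origin header,
// or an empty object when the origin is missing or not on the allow-list.
function corsHeadersFor(requestOrigin, allowedOrigins = ALLOWED_ORIGINS) {
  if (!requestOrigin || !allowedOrigins.includes(requestOrigin)) {
    return {};
  }
  return {
    // Echo the specific origin instead of the wildcard "*".
    "Access-Control-Allow-Origin": requestOrigin,
    // Tell caches that the response depends on the Origin header.
    "Vary": "Origin",
  };
}

console.log(corsHeadersFor("http://localhost:8000"));
console.log(corsHeadersFor("http://evil.example"));
```

With this approach a browser on an unlisted origin gets no Access-Control-Allow-Origin header and blocks the page from reading the response. Non-browser clients such as curl are unaffected, so this is not a substitute for authentication.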
OK, so the good news is that the Ajax CORS request is working!
There's a problem in the way you build your Ajax query. I think the PDF should be passed like this:
formData.append("input", "file:///home/robert/Documents/Belmont/pdf-summarizer/examplepdfs/1.pdf");
However, browsers normally will not allow a request that references something in the local file system (Chrome certainly will not, Firefox maybe). You could do it with node.js, but otherwise I think it would be considered a (major) security flaw. You can modify the browser settings, but that's dangerous.
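To illustrate the difference from the curl call: curl's input=@... uploads the file content, whereas a plain string appended to FormData is sent as an ordinary text field. A minimal sketch, assuming a runtime with global FormData and Blob (modern browsers, Node 18+); the placeholder bytes below are not a real PDF.

```javascript
// Appending a string: the server receives only the path text, not the file.
const asPath = new FormData();
asPath.append("input", "/home/robert/Documents/Belmont/pdf-summarizer/examplepdfs/1.pdf");

// Appending a Blob with a filename: the server receives a file part.
// The bytes here are a placeholder standing in for real PDF content.
const asFile = new FormData();
const pdfBytes = new Blob(["%PDF-1.4 placeholder"], { type: "application/pdf" });
asFile.append("input", pdfBytes, "1.pdf");

console.log(typeof asPath.get("input")); // "string"
console.log(typeof asFile.get("input")); // "object"
console.log(asFile.get("input").name);   // "1.pdf"
```

In a browser, the file part would normally come from an input of type file chosen by the user, since page scripts cannot read arbitrary local paths.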
The JavaScript that I am using for testing this is the console JavaScript application in grobid-service; see grobid/grobid-service/src/main/webapp/grobid/grobid.js, in particular lines 594-623, which build the Ajax query for this service. As you can see, I build the FormData object directly from the HTML form, so there is no security problem.
Thank you for all the help! I finally got it working with
<form id="pdfform" onsubmit="return sendPost();">
  <input type="file" name="input" accept=".pdf">
  <input type="submit">
</form>
<script>
function sendPost() {
  // Build the multipart body from the form so the browser attaches the selected file.
  var form = document.getElementById('pdfform');
  var formData = new FormData(form);
  var xhr = new XMLHttpRequest();
  var url = "http://localhost:8080/processHeaderDocument";
  xhr.responseType = 'text';
  xhr.open('POST', url, true);
  xhr.onreadystatechange = function(e) {
    if (xhr.readyState !== 4) {
      return; // request still in progress
    }
    if (xhr.status == 200) {
      console.log(e.target.response);
    } else {
      console.log(xhr);
    }
  };
  xhr.send(formData);
  return false; // prevent the default (page-reloading) form submission
}
</script>
I think an example in the docs would be awesome so other devs can avoid my pain :sweat_smile:
Huh. Trying the processFulltextAssetDocument path gives me the XHR error, while the other services do not. I'm reading the data with a blob and jspdf. @kermitt2, do you know what's going on?
EDIT: looks like I just needed to add .header("Access-Control-Allow-Origin", "*").header("Access-Control-Allow-Methods", "GET, POST, DELETE, PUT") to the responseAsset function in the process file Java code.
Yes, I didn't update this service because I plan to remove it later in September. It leads to heavy problems with some crazy PDFs: for instance, I had one PDF of around 10 pages with more than 40 000 embedded images, because it contained one embedded bitmap file for each line of a picture. This can be quite common for some publishers, and in practice it can bring the server down.
So I don't recommend using it. It will be replaced by another service working with crops rather than the embedded images, ensuring we don't get an explosion of asset files.
@kermitt2 understood, thanks. Looking forward to the new image service.
Curl using
works, but trying the equivalent POST request through js (e.g.)
fails with
I've read that this can be fixed by adding a header 'Access-Control-Allow-Origin: clientside.com' to the server. Is there a config file or somewhere that I can add the url I want to allow to make POSTs?