usc-isi-i2 / t2wml

Table to Wikidata Mapping Language
MIT License
22 stars 11 forks source link

Upload FIDIL JSON to FIDIL endpoint #576

Open devowit opened 3 years ago

devowit commented 3 years ago

The Fidil upload endpoint should be an environmental variable:

FIDIL_UPLOAD_ENDPOINT="hmi-server:8080:/fidil/structured/datasets/upload"

g1eb commented 3 years ago

We could add this to the upload file menu in the project settings, see #479

kyao commented 3 years ago

The FIDIL upload is POST "hmi-server:8080:/fidil/structured/datasets/upload" with json body.

To make this more general, how about defining an environment variable HMI_SEVER_HOST_PORT.

HMI_SERVER_HOST_PORT=hmi-server:8080

If HMI_SERVER_HOST_PORT is not defined, then we default to hmi-server:8080

kyao commented 3 years ago

We could add this to the upload file menu in the project settings, see #479

Do you mean adding an Upload menu item to the button with three dots at the top-right of the screen? Seems good to me.

devowit commented 3 years ago

@kyao I've added this in the backend, but how do we test it?

kyao commented 3 years ago

@g1eb Please add the upload load menu item. Then I will test it over there at CausX by starting up an instance.

g1eb commented 3 years ago

I was just about to add the same comment hah, adding that button right now

g1eb commented 3 years ago

@kyao for some reason this was not tagged with a milestone, I just added M2 because it seems related to the other tickets generating and downloading the FIDIL files.

g1eb commented 3 years ago

I've added a button to upload the FIDIL files but the endpoint is still missing, i'm getting a 404 for the GET request to /api/causx/project/upload/fidil_json/

g1eb commented 3 years ago

We got the button and the connection all set up, just need to figure out the ENV variable part and test with Ke-Thia being able to rebuild docker images over on the causx cluster

g1eb commented 3 years ago

@kyao have you had a chance to to rebuild the latest t2wml-web version on causx gitlab with that environment variable set?

kyao commented 3 years ago

@devowit @g1eb When I select FIDIL download I am getting a 404 error for

GET http://localhost:8080/api/causx/project/fidil_json?data_file=World_Education_Index_truncated.xlsx&sheet_name=Education%20Index&mapping_file=web.annotation&mapping_type=Annotation

Am I using the wrong t2wml-api version? I am using commit 986ffb2c2d7ccc7e30dadf0345cfd15eaab069c4.

kyao commented 3 years ago

And, for t2wml I am using commit 45e423959475e0ae17fb7911ebf8aab5291aa5db

kyao commented 3 years ago

When I try uploading the FIDIL file on a causx instance, I am getting invalid json error. I think t2wml is trying upload the HTML 404 error message, instead of the actual JSON file.

kyao commented 3 years ago

I exec into the container, and I do see the endpoint:

root@67c44c5aff84:/src# grep /api/causx/project/fidil *.py
grep /api/causx/project/fidil *.py
causx_application.py:@app.route('/api/causx/project/fidil_json/<filename>', methods=['GET'])

Screenshot from 2021-08-03 22-13-46crop

kyao commented 3 years ago

@devowit @g1eb This has to the live version as well

kyao commented 3 years ago

Looks like the endpoint is expecting a filename in the path?

kyao commented 3 years ago

@devowit @g1eb Also, got a request from the CX folks. They want the entire endpoint to be a configurable, something like:

FIDIL_UPLOAD_ENDPOINT="hmi-server:8080:/fidil/structured/datasets/upload"

devowit commented 3 years ago

the endpoint you're looking at is for downloading the fidil json, not uploading it. You want PUT /api/causx/project/upload_fidil_json/

it looks like you don't have the most up to date version, which is on causx branch of t2wml. @g1eb ?

g1eb commented 3 years ago

That's because Ke-Thia is using an older version of t2wml and t2wml-api as far as I can see from the git commit hashes he posted above.

@kyao could you update the hash to be the latest in both repos? It would be the development branch for t2wml-api and the causx branch for t2wml.

kyao commented 3 years ago

@g1eb Those hashes should be the latest, t2wml-api in develoment branch and t2wml in causx branch.

devowit commented 3 years ago

@kyao if they want the whole endpoint to be configurable, that means we don't need HMI_SERVER_HOST_PORT anymore, right? just FIDIL_UPLOAD_ENDPOINT?

g1eb commented 3 years ago

hmm the hash specified in the Dockerfile-backend (why do we have 2 of those by the way?) is set to this:

RUN git clone https://github.com/usc-isi-i2/t2wml.git && cd t2wml && git checkout 69199397cb301c48653e3abd40ee70def44fb155
devowit commented 3 years ago

@g1eb once we're fixing the hashes may as well update to the version with abhinav's copy annotation changes and my change to the environment variable that I just pushed.

no clue why we have two docker files. where are you seeing that? I only see one...

g1eb commented 3 years ago

There is a Dockerfile in the t2wml-backend repo without the set hash and one in the t2wml-web repo called Dockerfile-backend. I'm not sure which one is being used for deployments though, I thought it was one with the older hash.

Edit to add the links to those Dockerfiles: https://github.com/usc-isi-i2/t2wml/blob/causx/backend/Dockerfile https://github.com/usc-isi-i2/t2wml-web/blob/develop/Dockerfile-backend

g1eb commented 3 years ago

We discussed this issue with Ke-Thia today and here is the summary of that meeting:

  1. Generated FIDIL file is empty for the world education dataset, we need to look into that - probably a separate ticket though.
  2. Connection issue could be caused due to double colon in the url provided/copy-pasted from CausX - needs more testing
devowit commented 3 years ago

re problem 1: i have tested this on my end and not been able to replicate it. @kyao when generating for world education, did you also annotate a property? if you didn't annotate a property, no statements would have been generated and getting an empty result is normal.

kyao commented 3 years ago

@devowit After the meeting with @g1eb I tried again, and this time FIDIL json had content. Not sure what happened previously.

Here is what is happening when I try running an instance at CX.

image (6)

The first error is when I tried to download the FIDIL JSON. The download was block because it was not served over HTTPS. Not sure why this particular request to the backend endpoint was blocked, but not other accesses to the endpoints were not blocked.

The second error is when I tried to upload the FIDIL JSON. I am guessing our code is expecting a JSON and got back HTML instead? This could a problem on the CX side. But, we need better error check on our side.

I have asked them on how get access to the logs.

kyao commented 3 years ago

@g1eb @devowit Here is the log file:

tw2ml-logs.txt

kyao commented 3 years ago

image

kyao commented 3 years ago

Looks like we need to add http://

g1eb commented 3 years ago

Issue with downloading the FIDIL file has been fixed, it was a trailing slash at the end of that url https://github.com/usc-isi-i2/t2wml/commit/3a6c05e8fc390100130dbe6f0c0e9930ea24dd14

Upload FIDIL file connection with http:// prefix to the hmi-server endpoint needs to be tested on the causx cluster.