openbudgets / pipeline-fragments

Reusable fragments of LinkedPipes ETL pipelines

FDPtoRDF update deployment request (to Fraunhofer server) #28

Closed marek-dudas closed 7 years ago

marek-dudas commented 7 years ago

For hopefully better communication, I am asking through GitHub: @fathoni, @lavdim, as you did the last pipeline update, may I ask you for another one? Both the FDPtoRDF pipeline and the LP-ETL component need to be updated on the Fraunhofer server to the current versions on GitHub. Thanks in advance! Also, further testing once it is deployed is definitely welcome.

lavdim commented 7 years ago

(+1 for better communication). The LP-ETL component has been updated. But during the execution of the given pipeline, somewhere near the end of the process, an error like the one below is thrown:

    com.linkedpipes.etl.executor.api.v1.LpException: Execution failed.
        at com.linkedpipes.etl.executor.api.v1.component.SequentialWrap.execute(SequentialWrap.java:46)
        at com.linkedpipes.etl.executor.component.SequentialComponentExecutor.run(SequentialComponentExecutor.java:34)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: com.linkedpipes.etl.executor.api.v1.LpException: Dataset IRI not found in metadata.
        at com.linkedpipes.etl.executor.api.v1.service.DefaultExceptionFactory.failure(DefaultExceptionFactory.java:18)
        at com.linkedpipes.plugin.transformer.fdp.FdpToRdf.extractDataset(FdpToRdf.java:110)
        at com.linkedpipes.plugin.transformer.fdp.FdpToRdf.execute(FdpToRdf.java:258)
        at com.linkedpipes.etl.executor.api.v1.component.SequentialWrap.execute(SequentialWrap.java:44)
        ... 2 common frames omitted
    2017-06-16 09:57:09,981 [asynchExecutor-1] ERROR c.l.e.e.e.Execution - Component execution failed : http://localhost:8181/resources/pipelines/created-1497599817206/52
    com.linkedpipes.etl.executor.ExecutorException: Component execution failed.
        at com.linkedpipes.etl.executor.component.SequentialComponentExecutor.run(SequentialComponentExecutor.java:38)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: com.linkedpipes.etl.executor.api.v1.LpException: Execution failed.
        at com.linkedpipes.etl.executor.api.v1.component.SequentialWrap.execute(SequentialWrap.java:46)
        at com.linkedpipes.etl.executor.component.SequentialComponentExecutor.run(SequentialComponentExecutor.java:34)
        ... 1 common frames omitted
    Caused by: com.linkedpipes.etl.executor.api.v1.LpException: Dataset IRI not found in metadata.
        at com.linkedpipes.etl.executor.api.v1.service.DefaultExceptionFactory.failure(DefaultExceptionFactory.java:18)
        at com.linkedpipes.plugin.transformer.fdp.FdpToRdf.extractDataset(FdpToRdf.java:110)
        at com.linkedpipes.plugin.transformer.fdp.FdpToRdf.execute(FdpToRdf.java:258)
        at com.linkedpipes.etl.executor.api.v1.component.SequentialWrap.execute(SequentialWrap.java:44)
        ... 2 common frames omitted

Could you please have a look?

P.S. I will be on holiday in the following weeks; please assign new issues to @liyakun and/or @pierorex.

marek-dudas commented 7 years ago

Thanks. Did you trigger the pipeline through os-packager, or directly through the LP-ETL web interface? In the latter case the pipeline gets no input. Sample .csv files for os-packager can be found in subdirectories of the FDPtoRDF test folder on GitHub. I've triggered the pipeline a few times myself, but I can't connect to LP on the Fraunhofer server and check the logs right now since I am out of office.

Anyway, I will look into it next week and contact @liyakun or @pierorex if necessary.

lavdim commented 7 years ago

Yes, I tried directly through LP-ETL web interface.

skarampatakis commented 7 years ago

What is the status of this? I tried the hook today and it seems to be running the old pipeline.

marek-dudas commented 7 years ago

You are right. I suppose there is some URL redirection/rewrite setup on the Fraunhofer server that has to be updated to point to the new version of the pipeline. I.e. change the pipeline id at the end of the url from the current (probably) ...created-1488446848419 to ...created-1497599817206. @liyakun or @pierorex , could you please look into it?

marek-dudas commented 7 years ago

Also, it seems like there is still the old version of the t-fdpToRdf component in LP-ETL. I have just made a new commit anyway, so @liyakun or @pierorex, please replace t-fdpToRdf.jar in deploy/jars/opendata/ (or it might be elsewhere in deploy/jars, I don't have access there) in the LP-ETL folder on the Fraunhofer server with its current version from GitHub and restart LP-ETL.

liyakun commented 7 years ago

@marek-dudas Sorry for the delay in replying. I have changed the redirection and updated the jar file. Could you check whether the update is successful?

skarampatakis commented 7 years ago

@liyakun I have re-run the hook for the http://eis-openbudgets.iais.fraunhofer.de/dumps/fromfdp/europe-greece-municipality-thessaloniki-2016-revenue.nt dataset, but it seems it never updates. Currently the only way to see if the hook has finished is to reload the page and watch the timestamp change...

Could you please check if there are any errors?

jindrichmynarz commented 7 years ago

It seems the FDP2RDF pipeline provides too little feedback. Isn't it time to revisit openbudgets/platform#25?

liyakun commented 7 years ago

@skarampatakis I don't know exactly where I should look for the error. Could you provide some hints?

skarampatakis commented 7 years ago

Probably on the executions tab, on LP.

On which LP instance does the pipeline run on the server? I could also have a look, now that I can access the server again.

liyakun commented 7 years ago

@skarampatakis The instance running on port 8181 is used by FDP-to-RDF-Pipeline. But I am afraid that you need sudo to check it.

skarampatakis commented 7 years ago

So actually the pipeline fails in the FDP to RDF component.

Caused by: com.linkedpipes.etl.executor.api.v1.LpException: Dataset IRI not found in metadata.
    at com.linkedpipes.etl.executor.api.v1.service.DefaultExceptionFactory.failure(DefaultExceptionFactory.java:18)
    at com.linkedpipes.plugin.transformer.fdp.FdpToRdf.extractDataset(FdpToRdf.java:110)
    at com.linkedpipes.plugin.transformer.fdp.FdpToRdf.execute(FdpToRdf.java:258)
    at com.linkedpipes.etl.executor.api.v1.component.SequentialWrap.execute(SequentialWrap.java:44)
    ... 2 common frames omitted

@marek-dudas can you have a look? I have no idea what is happening there.

marek-dudas commented 7 years ago

If the latest version of t-fdpToRdf.jar were deployed, we should see version info in the error message, just before the "Dataset IRI not found". @liyakun, have you restarted LP-ETL? Could you please also check the deploy/jars folder of LP-ETL for any duplicate older version of t-fdpToRdf.jar?
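
A quick sketch of such a check (the /etl/deploy path is an assumption taken from the Dockerfile quoted later in this thread; adjust it to the actual install location):

    # Sketch only: list every copy of the transformer jar that LP-ETL might pick up,
    # with sizes and timestamps, to spot stale duplicates.
    find /etl/deploy -name '*fdpToRdf*.jar' -exec ls -lah {} \;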

@jindrichmynarz I think there is a plan to have a (possibly slightly altered) version of os-admin running on the Fraunhofer server, where some feedback will be implemented. It was mentioned during the last tech call, I think.

skarampatakis commented 7 years ago

What are all these graphs for a single dataset? (Screenshot of the graph list attached.)

Also, I just noticed that the pipeline is shipped with admin credentials for the Virtuoso server. I think this could pose a security risk. Couldn't we just create a CRUD user for the triplestore?

marek-dudas commented 7 years ago

I have no idea. Is it a list of graphs in Virtuoso? Do you suggest the FDPtoRDF pipeline created them? In any case, I suggest creating separate issues for both the mystery graphs and the credentials problem.

skarampatakis commented 7 years ago

It seems to happen for some datasets created by the FDP2RDF pipeline. Maybe this was introduced at some point, as it only affects some datasets, specifically the most recent ones.

I will create separate issues for all of these. In the meantime, please change the passwords on the OBEU triplestore and on http://obeu.vse.cz:8890. And please, when you upload pipelines, upload them without the credentials.

liyakun commented 7 years ago

@marek-dudas you are right. I removed the old version of the jar, so the newest one should now be used.

marek-dudas commented 7 years ago

It still seems that LinkedPipes somehow works with the old jar. Also, "run external hooks" in os-admin now does not trigger any pipeline (at least it seems so to me).

I tried running pipelines manually with curl: the updated pipeline fails as if the old t-fdpToRdf.jar were still in place, while older pipelines built against the old .jar work.

liyakun commented 7 years ago

@marek-dudas, this is the updated jar on the container:

    -rwxr-xr-x 1 root root 33.6K Jun 23 15:13 t-fdpToRdf.jar

I think it is already the newest one, if you did not change it again after Jun 23, 2017.

jindrichmynarz commented 7 years ago

Did you also restart LP-ETL? You can check if the loaded JAR file is the most recent one by diffing it with the one in the Git repository:

    diff fdpToRdf.jar t-fdpToRdf.jar
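
A checksum comparison is a hedged alternative sketch; the raw GitHub URL and the /etl/deploy/jars path are taken from the Dockerfile quoted later in this thread, so adjust them if the layout differs:

    # Sketch: compare the deployed jar against a freshly downloaded copy (paths assumed).
    wget -O /tmp/t-fdpToRdf.jar \
        https://github.com/opendatacz/lp-etl-components/raw/master/t-fdpToRdf/deploy/t-fdpToRdf.jar
    sha256sum /tmp/t-fdpToRdf.jar /etl/deploy/jars/t-fdpToRdf.jar
    # Matching checksums mean the deployed file is the current build; LP-ETL still
    # needs a restart to actually load it.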

marek-dudas commented 7 years ago

Also, is the "33.6K" the actual size of the .jar? That would be strange, because it should be about 131 kB. Would it be possible for me to look at the files directly on the Fraunhofer server, or is that not allowed by the security policy? Also, I have no idea about the configuration anyway (where LinkedPipes is installed, etc.), so I don't know if I can be of any help.

liyakun commented 7 years ago

@marek-dudas According to ls -lah on the container, that is the size of the .jar file I got from GitHub through wget. I have already sent an email to Fabrizio about accessing the files directly on the Fraunhofer server, and you are in CC.

pwalsh commented 7 years ago

Hi. Surely, surely, there are better ways to know what code is running on the server than diffing some jar files?

jindrichmynarz commented 7 years ago

How about creating versioned releases for the FDP to RDF pipeline's repository? Alternatively, the component's source code can be checked out on the OBEU server and compiled directly there, so that it is clear which revision it is based on.

marek-dudas commented 7 years ago

You mean the https://github.com/openbudgets/pipeline-fragments repository where the pipeline is or the https://github.com/opendatacz/lp-etl-components repository where the corresponding LP-ETL component is? Maybe we could create a separate repository for releases of the pipeline where we would publish the component and the pipeline together?

I think in the end the final operation would stay the same: you have to replace a .jar file inside LP-ETL and restart it. I see no easy workaround. But I don't know how versioned releases work on GitHub; maybe there is some magic?

Also, we know that the Fraunhofer server still runs with the old version of the component. The issue is finding out why and fixing it.

jindrichmynarz commented 7 years ago

I fixed the link to the repository. I meant https://github.com/opendatacz/lp-etl-components. I don't think creating a separate repository is required.

Releases typically contain their version in their file names, so that the version can be told from the file (unless someone renames it). Naming the files is not something GitHub will do for you. You will need to do that (Maven offers some help). There's no magic.

To avoid issues like these, perhaps a Dockerfile for the FDP to RDF pipeline could be created and included in the Docker Compose file for the OpenBudgets.eu platform. This way, things like restarting LP-ETL can be automated.
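
To make this concrete, a rough sketch of what such a Dockerfile might look like; the base image name is a placeholder and the paths mirror the setup described later in this thread, so treat it as an illustration rather than the actual OBEU configuration:

    # Hypothetical sketch only: base image name and paths are assumptions,
    # and wget is assumed to be available in the base image.
    FROM linkedpipes/etl
    # Bake the FDP2RDF transformer into the image so every rebuild ships a known jar version.
    ENV URL_FDP2RDF_COMPONENT https://github.com/opendatacz/lp-etl-components/raw/master/t-fdpToRdf/deploy/t-fdpToRdf.jar
    RUN cd /etl/deploy/jars && wget $URL_FDP2RDF_COMPONENT

Rebuilding the image and restarting the service through Docker Compose would then replace the manual jar swap and restart.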

skarampatakis commented 7 years ago

Can you update the status of this issue?

liyakun commented 7 years ago

@skarampatakis, I gave @marek-dudas access to check the Docker image. @marek-dudas, have you already tried?

marek-dudas commented 7 years ago

I connected directly to the running Docker container. There was a t-fdpToRdf.jar from May 31; I replaced it with the new one and restarted the LP executor. The pipeline now uses the correct version of t-fdpToRdf.

There are about five Docker containers with LP on the server, so maybe there was some confusion? Also, I am new to Docker; I made the change only inside the running container, and I don't know how persistence is handled.

Anyway, it seems that running "external hooks" from os-admin now does not execute anything. Is there any documentation related to the redirection, and also to the LP-ETL and Docker configuration in general?

To sum up: the component jar is now up to date and the pipeline uses it, but triggering the pipeline from os-admin still does not work.

liyakun commented 7 years ago

@marek-dudas I shared a documentation file about the LP setup with you; you should receive the link at your email marek.dudas@vse.cz.

The LP instance running on port 8181 is used by the FDP-to-RDF pipeline. The Dockerfile contains the following setting, which should get the latest version of the jar file:

    ENV URL_FDP2RDF_COMPONENT https://github.com/opendatacz/lp-etl-components/raw/master/t-fdpToRdf/deploy/t-fdpToRdf.jar
    RUN cd /etl/deploy/jars && \
        wget $URL_FDP2RDF_COMPONENT

You are using the os-admin from open-spending, right?

marek-dudas commented 7 years ago

Thanks @liyakun . Yes, I use this os-admin: https://next.openspending.org/admin/

The config seems fine, as far as I can tell, provided the hostname linkedpipes_fdp in the proxy config is correct.

Maybe "pulling and restarting the OBEU stack on the server" (as the doc says) should be tried, to make sure the nginx configuration is applied?

liyakun commented 7 years ago

@marek-dudas you're welcome. The Nginx server container always restarts, so it always gets the latest update every time the OBEU stack updates.

marek-dudas commented 7 years ago

I checked again, and I think the execution from os-admin stopped working because of the recently configured 301 redirection from http://eis-openbudgets.iais.fraunhofer.de to http://apps.openbudgets.eu. For example, curl does not follow such redirects by default, and it is probably the same with whatever sends the HTTP request in os-admin.
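
For illustration, the behaviour can be observed with curl (just a sketch; the exact response headers will vary):

    # Without -L, curl stops at the 301 issued by the Fraunhofer server
    # and only prints the redirect response headers:
    curl -I http://eis-openbudgets.iais.fraunhofer.de/linkedpipes/execute/fdp2rdf
    # With -L, curl follows the Location header on to apps.openbudgets.eu:
    curl -L -I http://eis-openbudgets.iais.fraunhofer.de/linkedpipes/execute/fdp2rdf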

So, @akariv or @pwalsh, would it be possible to change the FDPtoRDF pipeline execution URL in os-admin from http://eis-openbudgets.iais.fraunhofer.de/linkedpipes/execute/fdp2rdf to http://apps.openbudgets.eu/linkedpipes/execute/fdp2rdf? (Or change the code to follow redirects, but I assume changing the url is easier.)

pwalsh commented 7 years ago

@marek-dudas yes, we can change it, but it also means there is a "bug" if the redirects are not followed, IMHO probably in the nginx configuration for rewriting requests at Fraunhofer.

marek-dudas commented 7 years ago

The Fraunhofer server sends an HTTP 301 response; I think it is up to the client whether it follows the redirection or not. Maybe some rewriting/proxying instead of the HTTP redirection would be better, as you say, but I am not an expert in that and don't make these decisions. Some other app from the OBEU stack might run into the same problem, and in that case the server redirection config should be reconsidered. Otherwise, I would just change the FDPtoRDF URL in os-admin. But again, I am just reporting the issue; the configuration is not up to me.

liyakun commented 7 years ago

@marek-dudas Thanks for your explanation. Instead of a redirect from http://eis-openbudgets.iais.fraunhofer.de/ to http://apps.openbudgets.eu/, I changed it to a rewrite; I hope this will solve the issue.

marek-dudas commented 7 years ago

Does it take some time before the config is applied? I am still getting a 301 redirection when requesting http://eis-openbudgets.iais.fraunhofer.de/linkedpipes/execute/fdp2rdf

liyakun commented 7 years ago

@marek-dudas have you tried clearing your browser cache?

marek-dudas commented 7 years ago

I am testing with curl, which AFAIK has no cache. Also, the server response is timestamped 15:48 GMT.

liyakun commented 7 years ago

Okay, right. The server uses the following conf; do you see any problem with the setting? Thanks.

    server {
        listen 80;
        server_name eis-openbudgets.iais.fraunhofer.de;
        rewrite ^ $scheme://apps.openbudgets.eu$request_uri last;
        return 403;
    }

marek-dudas commented 7 years ago

I can be of little help here; I have never used nginx. What I can tell after some googling is that the rewrite will result in an HTTP 30x redirection when the target URL starts with http:// (or is treated as an external URL). It would probably have to be configured as a proxy to work without the redirection. But then the URL in the browser stays at Fraunhofer and does not change to apps.openbudgets.eu, which might be an undesired consequence for the user experience of other apps. But maybe it wouldn't matter. Again, I am not an expert in this and don't have the knowledge to see the appropriate solution. @pwalsh, any advice here? If this issue is specific to the FDPtoRDF pipeline, I would just change the URL in os-admin and rather not mess with the global server config.
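
For what it's worth, a minimal sketch of what a proxying variant of the config quoted above might look like (untested against the Fraunhofer setup; the server name and port come from the quoted config, everything else is an assumption):

    # Hypothetical sketch: proxy instead of rewrite, so no 301 is sent to the client.
    server {
        listen 80;
        server_name eis-openbudgets.iais.fraunhofer.de;

        location / {
            proxy_pass http://apps.openbudgets.eu;
            proxy_set_header Host apps.openbudgets.eu;
        }
    }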

pwalsh commented 7 years ago

@marek-dudas and all: what I was saying is that our code follows redirects, so there is a bug in the server config. Yes, of course we can update the URL we POST to, but also, as you say later on, Marek, this problem may well arise elsewhere, so it is also a good idea to get the proxy server configured correctly: redirects, rewrites, whatever.

marek-dudas commented 7 years ago

@pwalsh then I got it wrong, thanks for the clarification. And it seems you are right. I tested with curl with "follow redirects" on, and the request ends with a gateway timeout.

It can take a second or more (in some cases) for LinkedPipes to respond when data is sent to it, so the solution might be setting the proxy timeout to some higher value, but the problem might also be elsewhere. @liyakun: we could start by increasing the proxy timeout to something like 300s and see where it goes. We can also try the solution with clearing the Connection header that I linked above. Or maybe someone with more experience than me will have better advice?
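
A sketch of the timeout change, assuming /linkedpipes/execute/fdp2rdf is proxied to the LP-ETL container in the nginx config; the directives are standard nginx, but the location block and upstream address are assumptions:

    # Hypothetical: give LinkedPipes more time to accept the posted data
    # before nginx gives up with a 504. Goes inside the relevant server block.
    location /linkedpipes/execute/fdp2rdf {
        proxy_pass http://linkedpipes_fdp:8181;
        proxy_read_timeout 300s;
        proxy_send_timeout 300s;
    }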

This timeout setting will, however, be specific to the FDP LinkedPipes proxy, and the issue might be specific to how LinkedPipes responds, so I would still consider the simple change of URL in os-admin to be the better option. I think debugging this weird gateway timeout could take a long time, which we do not have at the moment. Or maybe we could change the URL in os-admin now (to be able to use the pipeline) and continue debugging the timeout meanwhile.

akariv commented 7 years ago

Either way I changed the hooks endpoint to point to http://apps.openbudgets.eu/linkedpipes/execute/fdp2rdf

marek-dudas commented 7 years ago

And it works! Thanks everyone.