For hopefully better communication, I am asking through GitHub: @fathoni, @lavdim, as you did the last pipeline update, may I ask you for another one? Both the FDPtoRDF pipeline and the LP-ETL component need to be updated on the Fraunhofer server with the current versions from GitHub. Thanks in advance! Also, further testing once it is deployed is definitely welcome.
(+1 for better communication.) The LP-ETL component has been updated. But during the execution of the given pipeline, somewhere near the end of the process, an error like the one below is thrown:
com.linkedpipes.etl.executor.api.v1.LpException: Execution failed.
    at com.linkedpipes.etl.executor.api.v1.component.SequentialWrap.execute(SequentialWrap.java:46)
    at com.linkedpipes.etl.executor.component.SequentialComponentExecutor.run(SequentialComponentExecutor.java:34)
    at java.lang.Thread.run(Thread.java:745)
Caused by: com.linkedpipes.etl.executor.api.v1.LpException: Dataset IRI not found in metadata.
    at com.linkedpipes.etl.executor.api.v1.service.DefaultExceptionFactory.failure(DefaultExceptionFactory.java:18)
    at com.linkedpipes.plugin.transformer.fdp.FdpToRdf.extractDataset(FdpToRdf.java:110)
    at com.linkedpipes.plugin.transformer.fdp.FdpToRdf.execute(FdpToRdf.java:258)
    at com.linkedpipes.etl.executor.api.v1.component.SequentialWrap.execute(SequentialWrap.java:44)
    ... 2 common frames omitted
2017-06-16 09:57:09,981 [asynchExecutor-1] ERROR c.l.e.e.e.Execution - Component execution failed : http://localhost:8181/resources/pipelines/created-1497599817206/52
com.linkedpipes.etl.executor.ExecutorException: Component execution failed.
    at com.linkedpipes.etl.executor.component.SequentialComponentExecutor.run(SequentialComponentExecutor.java:38)
    at java.lang.Thread.run(Thread.java:745)
Caused by: com.linkedpipes.etl.executor.api.v1.LpException: Execution failed.
    at com.linkedpipes.etl.executor.api.v1.component.SequentialWrap.execute(SequentialWrap.java:46)
    at com.linkedpipes.etl.executor.component.SequentialComponentExecutor.run(SequentialComponentExecutor.java:34)
    ... 1 common frames omitted
Caused by: com.linkedpipes.etl.executor.api.v1.LpException: Dataset IRI not found in metadata.
    at com.linkedpipes.etl.executor.api.v1.service.DefaultExceptionFactory.failure(DefaultExceptionFactory.java:18)
    at com.linkedpipes.plugin.transformer.fdp.FdpToRdf.extractDataset(FdpToRdf.java:110)
    at com.linkedpipes.plugin.transformer.fdp.FdpToRdf.execute(FdpToRdf.java:258)
    at com.linkedpipes.etl.executor.api.v1.component.SequentialWrap.execute(SequentialWrap.java:44)
    ... 2 common frames omitted
Could you please have a look?
P.S. I will be on holiday in the following weeks; please assign new issues to @liyakun and/or @pierorex.
Thanks. Did you trigger the pipeline through os-packager, or using the LP-ETL web interface directly? In the latter case the pipeline gets no input. Some sample .csv files for os-packager like this can be found in subdirectories of the FDPtoRDF GitHub test folder. I've triggered the pipeline a few times myself, but I can't connect to LP on the Fraunhofer server and check the logs now, since I am out of office.
Anyway, I will look into it next week and contact @liyakun or @pierorex if necessary.
Yes, I tried it directly through the LP-ETL web interface.
What is the status of this? I tried the hook today and it seems to be running the old pipeline.
You are right. I suppose there is some URL redirection/rewrite setup on the Fraunhofer server that has to be updated to point to the new version of the pipeline, i.e. the pipeline id at the end of the URL needs to change from the current (probably) ...created-1488446848419 to ...created-1497599817206. @liyakun or @pierorex, could you please look into it?
Also, it seems that the old version of the t-fdpToRdf component is still in LP-ETL. I have just made a new commit anyway, so @liyakun or @pierorex, please replace t-fdpToRdf.jar in deploy/jars/opendata/ (or it might be elsewhere in deploy/jars, I don't have access there) in the LP-ETL folder on the Fraunhofer server with its current version from GitHub and restart LP-ETL.
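For reference, a minimal sketch of that replacement, assuming the /etl/deploy/jars path and component URL that appear later in this thread, and that LP-ETL runs in a Docker container whose name is a placeholder here:
# paths, URL and container name are placeholders / taken from later in this thread
cd /etl/deploy/jars
mv t-fdpToRdf.jar t-fdpToRdf.jar.bak   # keep a backup of the old build
wget https://github.com/opendatacz/lp-etl-components/raw/master/t-fdpToRdf/deploy/t-fdpToRdf.jar
docker restart <lp-etl-container>      # restart LP-ETL so the executor reloads the jar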
@marek-dudas Sorry for the delay in replying. I have changed the redirection and updated the jar file. Could you check whether the update was successful?
@liyakun I have re-run the hook for this dataset: http://eis-openbudgets.iais.fraunhofer.de/dumps/fromfdp/europe-greece-municipality-thessaloniki-2016-revenue.nt, but it seems it never updates. Currently the only way to see if the hook has finished is to reload the page and watch the timestamp change...
Could you please check if there are any errors?
It seems the FDP2RDF pipeline provides too little feedback. Isn't it time to revisit openbudgets/platform#25?
@skarampatakis I don't know exactly where I should look for the error. Could you provide some hints?
Probably on the executions tab, on LP.
On which LP instance does the pipeline run on the server? I could also have a look, now that I can access the server again.
@skarampatakis The instance running on port 8181 is used by FDP-to-RDF-Pipeline. But I am afraid that you need sudo to check it.
So the pipeline actually fails at the FDP to RDF component:
Caused by: com.linkedpipes.etl.executor.api.v1.LpException: Dataset IRI not found in metadata.
    at com.linkedpipes.etl.executor.api.v1.service.DefaultExceptionFactory.failure(DefaultExceptionFactory.java:18)
    at com.linkedpipes.plugin.transformer.fdp.FdpToRdf.extractDataset(FdpToRdf.java:110)
    at com.linkedpipes.plugin.transformer.fdp.FdpToRdf.execute(FdpToRdf.java:258)
    at com.linkedpipes.etl.executor.api.v1.component.SequentialWrap.execute(SequentialWrap.java:44)
    ... 2 common frames omitted
@marek-dudas can you have a look? I have no idea what is happening there.
If the latest version of t-fdpToRdf.jar were deployed, we would see version info in the error message, just before the "Dataset IRI not found" part. @liyakun, have you restarted LP-ETL? Could you please also check the deploy/jars folder of LP-ETL for any duplicate older version of t-fdpToRdf.jar?
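A quick way to check for such leftovers, assuming the /etl/deploy/jars location from the Dockerfile quoted later in this thread:
# list every copy of the component jar, with size and timestamp, to spot duplicates
find /etl/deploy/jars -name '*fdpToRdf*.jar' -exec ls -lah {} \;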
@jindrichmynarz I think that there is a plan to have a (possibly slightly altered) version of os-admin running on Fraunhofer server, where some feedback will be implemented. It was mentioned during the last tech-call I think.
What are all these graphs for a single dataset?
Also, I just noticed that admin credentials for the Virtuoso server are shipped along with the pipeline. I think this could pose a security risk. Couldn't we just create a CRUD user for the triplestore?
I have no idea. Is it a list of graphs in Virtuoso? Do you suggest FDPtoRDF pipeline created them? In any case, I suggest creating separate issues for both those mystery graphs and the credentials problem.
It seems to be present for some datasets created by the FDP2RDF pipeline. Maybe this was introduced at some point, as it seems to apply only to some datasets, specifically the most recent ones.
I will create separate issues for all of these. In the meantime, please change the passwords on the OBEU triplestore and on http://obeu.vse.cz:8890. And please, when you upload pipelines, upload them without the credentials.
@marek-dudas you are right. I removed the old version of the jar; the newest one should be used now.
It still seems that LinkedPipes somehow works with the old jar. Also, "run external hooks" in os-admin now does not trigger any pipeline (at least it seems so to me).
I tried running pipelines manually with curl, and the updated one fails as if the old t-fdpToRdf.jar were still there, while older pipelines built against the old .jar work.
@marek-dudas, this is the updated jar on the container:
-rwxr-xr-x 1 root root 33.6K Jun 23 15:13 t-fdpToRdf.jar
I think it is already the newest, if you did not change it again after Jun 23, 2017.
Did you also restart LP-ETL? You can check whether the deployed JAR file is the most recent one by diffing it with the one in the Git repository: diff fdpToRdf.jar t-fdpToRdf.jar
Also, is 33.6K the actual size of the .jar? That would be strange, because it should be about 131 kB. Would it be possible for me to look at the files directly on the Fraunhofer server, or is that not allowed by the security policy? Also, I have no idea about the configuration anyway - where LinkedPipes is installed etc. - so I don't know if I can be of any help.
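A sketch of that check using checksums rather than a raw diff (same idea, easier-to-read output); the deployed path is an assumption:
# fetch the current build from the repository to a temporary location
wget -O /tmp/t-fdpToRdf-github.jar https://github.com/opendatacz/lp-etl-components/raw/master/t-fdpToRdf/deploy/t-fdpToRdf.jar
# compare it with the jar LP-ETL actually loads; identical hashes mean the same build
sha256sum /tmp/t-fdpToRdf-github.jar /etl/deploy/jars/t-fdpToRdf.jar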
@marek-dudas, from the ls -lah on the container, that is the size of the .jar file which I got from GitHub through wget. I have already sent an email to Fabrizio about directly accessing files on the Fraunhofer server, and you are in the CC.
Hi. Surely, surely, there are better ways to know what code is running on the server than diffing some jar files?
How about creating versioned releases for the FDP to RDF pipeline repository? Alternatively, the component's source code can be checked out on the OBEU server and compiled directly there, so that it's clear which revision it is based on.
Do you mean the https://github.com/openbudgets/pipeline-fragments repository where the pipeline is, or the https://github.com/opendatacz/lp-etl-components repository where the corresponding LP-ETL component is? Maybe we could create a separate repository for releases of the pipeline, where we would publish the component and the pipeline together?
I think in the end the final operation stays the same: you have to replace a .jar file inside LP-ETL and restart it. I see no easy workaround. But I don't know how versioned releases work on GitHub; maybe there is some magic?
Also, we know that the Fraunhofer server still runs the old version of the component. The issue is finding out why and fixing it.
I fixed the link to the repository. I meant https://github.com/opendatacz/lp-etl-components. I don't think creating a separate repository is required.
Releases typically contain their version in their file names, so that the version can be told from the file (unless someone renames it). Naming the files is not something GitHub will do for you. You will need to do that (Maven offers some help). There's no magic.
To avoid issues like these perhaps a Dockerfile for the FDP to RDF pipeline can be created and included in the Docker Compose file for the OpenBudgets.eu platform. This way, things like restarting LP-ETL can be automated.
Can you update the status of this issue?
@skarampatakis, I gave @marek-dudas access to check the Docker image. @marek-dudas, have you already tried?
I connected directly to the running Docker container. There was a t-fdpToRdf.jar from May 31; I replaced it with the new one and restarted the LP executor. The pipeline now uses the correct version of t-fdpToRdf.
There are about five Docker containers with LP on the server, so maybe there is some confusion? Also, I am new to Docker: I made the change only inside the running container, and I don't know how that works with persistence.
Anyway, it seems that running "external hooks" from os-admin now does not execute anything. Is there any documentation related to the redirection, and to the LP-ETL & Docker configuration in general?
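For reference, a sketch of that kind of in-container fix (the container name is a placeholder); note that changes made only inside a running container are lost once the container is recreated from its image, so the image or the mounted volume should be updated as well:
# copy the new build into the running container and restart it
docker cp t-fdpToRdf.jar <lp-container>:/etl/deploy/jars/t-fdpToRdf.jar
docker restart <lp-container>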
To sum this up:
@marek-dudas I shared with you a documentation file about the setup of LP; you should receive the link at your email marek.dudas@vse.cz.
The LP instance running on port 8181 is used by the FDP-to-RDF pipeline. There is the following setting in the Dockerfile:
ENV URL_FDP2RDF_COMPONENT https://github.com/opendatacz/lp-etl-components/raw/master/t-fdpToRdf/deploy/t-fdpToRdf.jar
RUN cd /etl/deploy/jars && \
    wget $URL_FDP2RDF_COMPONENT
This should get the latest version of the jar file.
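One caveat, assuming the image is built in the usual way: the wget above runs only at image build time, so the container keeps whatever jar was current when the image was last built. Picking up a new commit then means rebuilding the image and recreating the container, roughly:
# the service name "linkedpipes_fdp" is borrowed from the proxy config mentioned
# later in this thread and may differ in the actual docker-compose file
docker-compose build --no-cache linkedpipes_fdp
docker-compose up -d linkedpipes_fdp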
You are using the os-admin from OpenSpending, right?
Thanks @liyakun . Yes, I use this os-admin: https://next.openspending.org/admin/
The config seems fine, as far as I can tell, provided the hostname linkedpipes_fdp in the proxy config is correct.
Maybe "pulling and restarting the OBEU stack on the server" (as described in the doc) should be tried, to be sure the nginx configuration is applied?
@marek-dudas you're welcome. The nginx server container always restarts, so it always gets the latest update every time the OBEU stack updates.
I checked again, and I think the execution from os-admin stopped working because of the recently configured 301 redirection from http://eis-openbudgets.iais.fraunhofer.de to http://apps.openbudgets.eu. For example, curl does not follow such redirects by default, and it is probably the same with whatever sends the HTTP request in os-admin.
So, @akariv or @pwalsh, would it be possible to change the FDPtoRDF pipeline execution URL in os-admin from http://eis-openbudgets.iais.fraunhofer.de/linkedpipes/execute/fdp2rdf to http://apps.openbudgets.eu/linkedpipes/execute/fdp2rdf? (Or change the code to follow redirects, but I assume changing the URL is easier.)
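To illustrate the difference with curl (a sketch; the responses of course depend on the actual server configuration):
# without -L, curl stops at the 301 response and never reaches LP-ETL
curl -I http://eis-openbudgets.iais.fraunhofer.de/linkedpipes/execute/fdp2rdf
# with -L/--location, curl follows the redirect to apps.openbudgets.eu
curl -I -L http://eis-openbudgets.iais.fraunhofer.de/linkedpipes/execute/fdp2rdf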
@marek-dudas yes, we can change it, but, also, it means there is a "bug" if the redirects are not followed, IMHO, probably in the nginx configuration for rewriting requests at Fraunhofer.
The Fraunhofer server sends an HTTP 301 response - I think it is up to the client whether it follows the redirection or not. Maybe some rewriting/proxying instead of the HTTP redirection would be better, as you say, but I am not an expert in that and don't make these decisions. It might happen that some other app from the OBEU stack runs into the same problem, and in that case the server redirection config should be reconsidered. Otherwise, I would just change the FDPtoRDF URL in os-admin. But again, I am just reporting the issue; the configuration is not up to me.
@marek-dudas Thanks for your explanation. Instead of a redirect from http://eis-openbudgets.iais.fraunhofer.de/ to http://apps.openbudgets.eu/, I changed it to a rewrite; I hope this will solve the issue.
Does it take some time before the config is applied? I am still getting 301 redirection when requesting http://eis-openbudgets.iais.fraunhofer.de/linkedpipes/execute/fdp2rdf
@marek-dudas have you tried clearing your browser cache?
I am testing with curl, which AFAIK has no cache. Also, the server response is stamped 15:48 GMT.
Okay, right. The server uses the following conf:
server {
    listen 80;
    server_name eis-openbudgets.iais.fraunhofer.de;
    rewrite ^ $scheme://apps.openbudgets.eu$request_uri last;
    return 403;
}
Do you see any problem with this setting? Thanks.
I can be of little help here; I have never used nginx. What I can tell after some googling is that the rewrite will result in an HTTP 30x redirection when the target URL starts with http:// (i.e. is treated as an external URL). It would probably have to be configured as a proxy to work without the redirection. But then the URL in the browser stays at fraunhofer and does not change to apps.openbudgets.eu - which might be an undesired consequence for the user experience of other apps. But maybe it wouldn't matter. Again, I am not an expert in this and don't have the knowledge to pick the appropriate solution. @pwalsh, any advice here? I would just change the URL in os-admin, if this issue is specific to the FDPtoRDF pipeline, and rather not mess with the global server config.
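For illustration only, a minimal sketch of what the proxying variant could look like - this is an assumption about the desired behaviour, not the actual Fraunhofer configuration:
server {
    listen 80;
    server_name eis-openbudgets.iais.fraunhofer.de;

    location / {
        # forward the request instead of answering with a 301/302 redirect
        proxy_pass http://apps.openbudgets.eu;
        proxy_set_header Host apps.openbudgets.eu;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}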
@marek-dudas and all: what I was saying is that our code follows redirects, so there is a bug in the server config. Yes, of course we can update the URL we POST to, but, as you say later on, Marek, this problem may well arise elsewhere, so it is also a good idea to get the proxy server configured correctly - redirects, rewrites, whatever.
@pwalsh then I got it wrong; thanks for the clarification. And it seems you are right. I tested with curl with "follow redirects" on, and the request ended with a gateway timeout.
It can take a second or more (in some cases) for LinkedPipes to respond when data is sent to it, so the solution might be setting the proxy timeout to a higher value, but the problem might also be elsewhere. @liyakun: we could start with increasing the proxy timeout to something like 300 s and see where it goes. We could also try the solution with clearing the Connection header I linked above. Or maybe someone with more experience than me will have better advice?
This timeout setting will, however, be specific to the FDP LinkedPipes proxy, and the issue might be specific to how LinkedPipes responds, so I would still consider the simple change of URL in os-admin the better option. I think debugging this weird gateway timeout could take a long time, which we do not have at the moment. Or maybe we could change the URL in os-admin now (to be able to use the pipeline) and continue debugging the timeout in the meantime.
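A sketch of what those two nginx-side tweaks could look like in the proxy block for LP-ETL; the location, upstream name and port are taken from earlier in this thread, and the values are guesses to experiment with, not a tested configuration:
location /linkedpipes/ {
    # hypothetical upstream mapping; adjust to the actual proxy config
    proxy_pass http://linkedpipes_fdp:8181/;
    # give LinkedPipes more time to respond before nginx returns 504
    proxy_read_timeout 300s;
    proxy_send_timeout 300s;
    # clear the Connection header so the upstream connection can be kept alive
    proxy_http_version 1.1;
    proxy_set_header Connection "";
}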
Either way I changed the hooks endpoint to point to http://apps.openbudgets.eu/linkedpipes/execute/fdp2rdf
And it works! Thanks everyone.