openminted / Open-Call-Discussions

A central place for participants in the open calls to ask questions

OpenMinTeD SSH UC Hackathon #6

Closed: reckart closed this issue 6 years ago

reckart commented 6 years ago

I have deployed a component and tried to run it on the platform. The result of the operation is listed as "FAILED", but I have no idea why. How can one get access to the log output?

[Screenshot: 2018-04-08_14-37-41]

Instance: test.openminted.eu

galanisd commented 6 years ago

> I have deployed a component and tried to run it on the platform.

An application can be run directly after you register it. For a component, this is not possible.

reckart commented 6 years ago

> An application can be run directly after you register it. For a component, this is not possible.

I know. I have built a workflow that uses the component I had deployed (cf. #7).

reckart commented 6 years ago

FYI @azielinskiACC

galanisd commented 6 years ago

OK, I had a look into Galaxy. VariableMentionDisambiguator is a UIMA component with the following coordinates:

eu.openminted.uc-tdm-socialsciences ss-variable-detection 1.0.1-SNAPSHOT

Where is it available: Maven Central? The zoidberg public snapshots? The OMTD repo? The executor that we have does not look there.

Also, the workflow is created in the OMTD Workflow Editor instance of Galaxy. The OMTD Registry then copies it to the OMTD Workflow Execution instance of Galaxy. Do you know the name of the workflow, so that I can check whether it is there?

reckart commented 6 years ago

> The OMTD repo? The executor that we have does not look there.

It is in the OMTD SNAPSHOTs repo. The registry seems to be able to resolve artifacts from there. Would it be possible to ensure that the executors and the registry use the same set of repos to look up components, ideally also in the same order?
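Why the order matters can be illustrated with a minimal sketch, assuming a simple first-match strategy in which each repository is tried in turn and the first hit wins. The class name, the example URLs, the artifact path and the `exists` check are all hypothetical; real resolvers such as Maven's also deal with metadata, snapshots and checksums. With one repository missing from the executor's list, the registry finds the artifact and the executor does not.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

public class OrderedRepoResolution {

    // Joins a repository base URL and an artifact path with exactly one slash.
    static String join(String baseUrl, String relativePath) {
        return baseUrl.endsWith("/") ? baseUrl + relativePath : baseUrl + "/" + relativePath;
    }

    // Returns the URL in the first repository (in list order) for which
    // `exists` reports the artifact; empty if no repository has it.
    static Optional<String> resolve(List<String> repoBaseUrls, String relativePath,
                                    Predicate<String> exists) {
        for (String base : repoBaseUrls) {
            String candidate = join(base, relativePath);
            if (exists.test(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical repositories and artifact path, for illustration only.
        String central = "https://central.example.org/maven2";
        String omtdSnapshots = "https://omtd.example.org/snapshots";
        String path = "org/example/demo-component/1.0/demo-component-1.0.jar";

        // Pretend the artifact is published only in the OMTD-like repository.
        Set<String> published = Set.of(join(omtdSnapshots, path));

        Optional<String> registryView =
                resolve(List.of(central, omtdSnapshots), path, published::contains);
        Optional<String> executorView =
                resolve(List.of(central), path, published::contains);

        System.out.println("registry: " + registryView.orElse("not found"));
        System.out.println("executor: " + executorView.orElse("not found"));
    }
}
```

If an artifact with the same coordinates existed in several repositories, the first one listed would win, so two services with the same repositories in a different order could still end up with different artifacts.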

The workflow URL is: https://test.openminted.eu/landingPage/application/c58d1986-690e-40b9-b408-f649443c7d33

galanisd commented 6 years ago

> It is in the OMTD SNAPSHOTs repo. The registry seems to be able to resolve artifacts from there. Would it be possible to ensure that the executors and the registry use the same set of repos to look up components, ideally also in the same order?

Until now it was not required. I have added it to my to-do list.

> The workflow URL is: https://test.openminted.eu/landingPage/application/c58d1986-690e-40b9-b408-f649443c7d33

I downloaded the metadata record from the Registry (attached). The workflow name is "0931730980607790@openminted.eu 13865a76-613b-475a-88bf-4af5357b9263".

I also downloaded the workflow from the Galaxy executor (also attached). It is empty: it has no steps. This is probably why it fails. It seems to be a Registry issue.

[Attachment: rec.zip]

reckart commented 6 years ago

I'll try building a new one.

galanisd commented 6 years ago

OK. Please send me the landing page as you did with the previous one. I will download the metadata record, find the Galaxy workflow, and check whether it is OK. If it is not, we have to inform Antonis.

reckart commented 6 years ago

Ok, I have created a new one. This time, it is not empty when I re-open it in the workflow editor:

https://test.openminted.eu/landingPage/application/89d5e9ea-32fb-45f7-bf00-1fe466e33c4f

[Screenshot: 2018-04-08_20-02-53]

However, it still fails:

[Screenshot: 2018-04-08_20-06-45]

@azielinskiACC @galanisd note that I have pasted a full multi-line XML file into the parameter variableSpecification; I am not sure whether that could cause a problem. Aside from the XML getting a bit squashed when pasting it into the input field, it seemed OK in the Galaxy editor.

<?xml version="1.0" encoding="UTF-8"?>
<variables>
   <variable v_id="140" correct="YesNo">
       <v_label>INGLEHART-INDEX </v_label>
       <v_topic>Political attitudes and participation</v_topic>
       <v_question> What are your political priorities? </v_question>
       <v_subquestion> </v_subquestion>
       <v_answer a_id="1">Postmaterialist</v_answer>
       <v_answer a_id="2">Postmaterialist mixed-type</v_answer>
       <v_answer a_id="3">Materialist mixed-type</v_answer>
       <v_answer a_id="4">Materialist</v_answer>
       <v_answer a_id="5">Don't know</v_answer>
       <v_answer a_id="99">No answer</v_answer>
   </variable>
</variables>
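One way to check locally whether squashing the pasted XML could break it is to collapse the line breaks and re-parse the result with the JDK's built-in DOM parser. This is a sketch under assumptions: that the input field only collapses whitespace around line breaks (its real behavior is not documented in this thread), and that `VariableSpecCheck`, `squash` and `variableIds` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class VariableSpecCheck {

    // Collapses each line break (plus surrounding whitespace) into one space and
    // trims both ends. This only approximates what a single-line input field
    // might do to pasted text; the platform's real behavior is an assumption.
    static String squash(String xml) {
        return xml.replaceAll("\\s*\\R\\s*", " ").trim();
    }

    // Parses the XML and returns the v_id attribute of every <variable> element.
    // Throws IllegalArgumentException if the document is not well-formed.
    static List<String> variableIds(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList vars = doc.getElementsByTagName("variable");
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < vars.getLength(); i++) {
                ids.add(((Element) vars.item(i)).getAttribute("v_id"));
            }
            return ids;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        // Shortened copy of the variableSpecification shown above.
        String pasted = String.join("\n",
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?>",
                "<variables>",
                "   <variable v_id=\"140\" correct=\"YesNo\">",
                "       <v_label>INGLEHART-INDEX </v_label>",
                "       <v_answer a_id=\"5\">Don't know</v_answer>",
                "   </variable>",
                "</variables>");
        String squashed = squash(pasted);
        System.out.println(squashed);
        System.out.println(variableIds(squashed)); // [140]
    }
}
```

The trim() matters: the XML declaration must be the very first thing in the document, so a leading space before `<?xml` would make the parse fail even though the rest of the markup is intact.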

The other thing is that the component should try to download a model from the OMTD Maven repo, which means it must have network access to that repo:

    <groupId>eu.openminted.uc-tdm-socialsciences</groupId>
    <artifactId>ss-variable-detection-model-disambiguation-en-ss</artifactId>
    <version>20180406.1</version>
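For reference, under the standard Maven repository layout these coordinates map to a predictable path below the repository's base URL, and that URL is what the executor would need to be able to reach. A minimal sketch, assuming jar packaging and using a placeholder base URL (the actual OMTD repository URL is not given here); `MavenLayout` and `relativePath` are hypothetical names.

```java
public class MavenLayout {

    // Relative path of an artifact in the standard Maven 2 repository layout:
    // groupId with '.' replaced by '/', then artifactId, version, and the file
    // name artifactId-version.extension (no classifier). For -SNAPSHOT versions,
    // remote repositories usually store timestamped file names instead, so this
    // simple mapping only covers release-style versions such as the one below.
    static String relativePath(String groupId, String artifactId, String version, String extension) {
        return groupId.replace('.', '/') + "/" + artifactId + "/" + version + "/"
                + artifactId + "-" + version + "." + extension;
    }

    public static void main(String[] args) {
        // Placeholder base URL: not the real OMTD repository.
        String repoBase = "https://repo.example.org/maven2";
        String path = relativePath(
                "eu.openminted.uc-tdm-socialsciences",
                "ss-variable-detection-model-disambiguation-en-ss",
                "20180406.1",
                "jar"); // assumption: the model is packaged as a jar
        System.out.println(repoBase + "/" + path);
    }
}
```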

Hm... that said, it might actually try to download the model from the wrong repo (i.e. the DKPro Core repo instead of the OMTD repo). That is something I need to look into locally.

reckart commented 6 years ago

Opened an issue regarding model-auto-downloads here: https://github.com/openminted/omtd-component-executor/issues/1

galanisd commented 6 years ago

Yes, now it is not empty. The workflow is "0931730980607790@openminted.eu 3c6c03b5-9a04-41bb-996a-a2cd536c7ace".

I see the following error in the logs of workflow-service, which is the module that calls Galaxy:

--- [ Thread-625] e.o.w.service.WorkflowServiceImpl : Unable to locate workflow: 0931730980607790%40openminted.eu+3c6c03b5-9a04-41bb-996a-a2cd536c7ace

Maybe it has to do with the name of the workflow. It contains a space and an "@", which are escaped at some point (the log shows "%40" in place of "@" and "+" in place of the space). @courado @greenwoodma @antleb
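The logged name is consistent with application/x-www-form-urlencoded encoding, which turns "@" into "%40" and a space into "+". The sketch below uses the JDK's java.net.URLEncoder and URLDecoder to reproduce the logged string and to show that decoding restores the original name. It is only an illustration: where in the platform the encoding actually happens is not established here, and `WorkflowNameEncoding` is a hypothetical class name.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class WorkflowNameEncoding {

    // Form-encodes a string as UTF-8 ("@" -> "%40", " " -> "+").
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Reverses the form encoding ("%40" -> "@", "+" -> " ").
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String name = "0931730980607790@openminted.eu 3c6c03b5-9a04-41bb-996a-a2cd536c7ace";
        String encoded = encode(name);
        // Prints 0931730980607790%40openminted.eu+3c6c03b5-9a04-41bb-996a-a2cd536c7ace
        System.out.println(encoded);
        System.out.println(decode(encoded).equals(name)); // true
    }
}
```

URLDecoder performs form decoding, so it also turns "+" back into a space; a decoder for URL path segments would leave "+" unchanged, which matters if the name travels in a path rather than a query string.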

reckart commented 6 years ago

Ok. I have:

Then I tried running the workflow again on the variable test corpus that @azielinskiACC has published on the platform.

Still, it fails again.

Any idea what could be the reason now?

galanisd commented 6 years ago

I assume that, again, the workflow-service fails to call the workflow that was created on the Galaxy executor. As I said above, the reason is probably the name of the workflow.

greenwoodma commented 6 years ago

I've just pushed a fix for this that should URL decode the workflow name before looking for it in Galaxy. This should get built and pushed to beta automatically but won't end up on test until someone manually pulls in the latest workflow service code.

courado commented 6 years ago

I have also added the error message supplied by the workflow service to the "My Operations" page.

reckart commented 6 years ago

@courado great! :)

[Screenshot: 2018-04-11_11-45-27]

I just tried running the workflow again, but it fails because it is unable to locate the named workflow.

Could somebody please push @greenwoodma's fix to test.openminted.eu?

greenwoodma commented 6 years ago

@reckart is it not possible to rename the workflow to avoid the bug until the fix is pushed to test?

reckart commented 6 years ago

@greenwoodma how do I do that? The workflow editor only has a "save" button, not a "rename" or "save as" button as far as I remember.

galanisd commented 6 years ago

I think that the only way to do that is:

a. Rename the workflow in Galaxy.
b. Download the metadata record of the app and delete it from the registry.
c. Upload an updated metadata record with the new workflow name.

greenwoodma commented 6 years ago

@reckart hmmm, I thought the name of the workflow came from the name you gave the app in the registry UI, but maybe not, or maybe you can't change it there either. Certainly the workflow editor just gets passed the name from the platform; it doesn't generate it.

reckart commented 6 years ago

Well, the name I have given to the workflow in the registry UI is "Simple Variable Disambiguation Example (English)". 0931730980607790@openminted.eu 3c6c03b5-9a04-41bb-996a-a2cd536c7ace looks like an auto-generated ID over which I probably do not have control. My guess would be that it is a representation of the user-id concatenated with some other ID...

greenwoodma commented 6 years ago

What's weird is that if all workflow IDs are generated the same way then how have we ever run a workflow as we'd have hit this issue every time? I'm seriously confused by this one.

reckart commented 6 years ago

Apparently one can edit the workflow name in Galaxy by clicking on the pre-generated name, entering a new value and pressing ENTER. I did that (see screenshot).

[Screenshot: 2018-04-11_12-08-31]

However, when I press "save" now, nothing happens. Odd...

Ok, when I go back to "My applications" and re-open the workflow in the editor, I can see that the name I put is still there, so I guess the "save" must have worked.

I wonder what happens if I create a second workflow with the same name...

Anyway, running the now re-named workflow still gives me the same message:

Failed 
Unable to locate named workflow

@courado the "My operations" view has a date, but not a time stamp. It would be great if we could also see the submission and possibly completion times of the execution there.

galanisd commented 6 years ago

@greenwoodma

Workflow names in Galaxy are not all generated the same way.

Also, the workflow ID is a different thing from the workflow name. For each workflow name there is an internal, unique workflow ID; this is the one that workflow-service retrieves from Galaxy in order to start a workflow execution.
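The name/ID distinction can be sketched as a lookup: the platform relays the stable workflow name, and the service resolves it to whatever internal ID Galaxy currently reports before starting an execution. `WorkflowLookup`, `WorkflowSummary`, `findIdByName` and the ID value are hypothetical; this is an illustration of the idea, not the workflow-service code.

```java
import java.util.List;
import java.util.Optional;

public class WorkflowLookup {

    // Hypothetical view of one entry in the execution instance's workflow list.
    static final class WorkflowSummary {
        final String id;   // internal ID: volatile, changes on export/import
        final String name; // auto-generated name: the stable key the platform relays

        WorkflowSummary(String id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Resolves the current internal ID for an exact workflow name. An empty
    // result corresponds to the "Unable to locate named workflow" failure.
    static Optional<String> findIdByName(List<WorkflowSummary> workflows, String name) {
        return workflows.stream()
                .filter(w -> w.name.equals(name))
                .map(w -> w.id)
                .findFirst();
    }

    public static void main(String[] args) {
        String name = "0931730980607790@openminted.eu 3c6c03b5-9a04-41bb-996a-a2cd536c7ace";
        // "internal-id-1" is a made-up ID for illustration.
        List<WorkflowSummary> listed = List.of(new WorkflowSummary("internal-id-1", name));

        System.out.println(findIdByName(listed, name)); // Optional[internal-id-1]
        System.out.println(findIdByName(listed,
                "0931730980607790%40openminted.eu+3c6c03b5-9a04-41bb-996a-a2cd536c7ace")); // Optional.empty
    }
}
```

Because the lookup matches the name exactly, an escaped name or a workflow renamed in the editor finds nothing. If two workflows ever shared a name, findFirst() would silently pick one of them, which is one reason the generated names need to be unique.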

galanisd commented 6 years ago

@greenwoodma

> Apparently one can edit the workflow name in Galaxy by clicking on the pre-generated name, entering a new value and pressing ENTER. I did that (see screenshot).

I assume that this shouldn't be allowed and should be hidden, like some other things in the Galaxy Editor.

greenwoodma commented 6 years ago

@galanisd yep, that most definitely shouldn't be allowed. I'll add it to the list of things I need to fix.

galanisd commented 6 years ago

OK, great! It is not a blocking issue, but this is:

The applications that are created in the Galaxy editor and then ingested into the OMTD Registry seem to have this problem.

@courado

reckart commented 6 years ago

> OK, great! It is not a blocking issue, but this is:
>
> The applications that are created in the Galaxy editor and then ingested into the OMTD Registry seem to have this problem.

Indeed. We cannot proceed in the SSH UC (WP9) due to this issue at the moment.

@azielinskiACC

courado commented 6 years ago

Ok so:

PS @galanisd:

  1. The (un)escaping of the workflow name should be handled automatically by Spring.
  2. To clarify: workflow IDs are volatile and used internally. I do not send them to the workflow service; I just relay the workflow name, which is the only thing that remains constant.

greenwoodma commented 6 years ago

Thanks @courado, that all makes sense. It looks like the only issue is that the auto-generated name doesn't get decoded properly when passed into the workflow service, hence it's looking for a workflow containing %40 instead of @, etc. My fix was to add a decode call inside the workflow service, which should solve the problem once that code is deployed to test.

I agree the workflow IDs are volatile, as they change every time you export/import them into Galaxy. The only thing that's fixed is the workflow name, so we do need that to be auto-generated and unique, and the current approach is great. I just need to fix the editor to stop people from being able to change the name.

galanisd commented 6 years ago

> Thanks @courado, that all makes sense. It looks like the only issue is that the auto-generated name doesn't get decoded properly when passed into the workflow service, hence it's looking for a workflow containing %40 instead of @, etc. My fix was to add a decode call inside the workflow service, which should solve the problem once that code is deployed to test.

@courado Is it possible to redeploy only workflow-service on test so that we can check whether Mark's fix works? I think this is the easiest solution.

greenwoodma commented 6 years ago

I've just pushed a fix to the Galaxy editor branch which stops you from editing the workflow name within the editor, so that should appear the next time test is fully updated with the latest versions of everything (assuming updating test includes the Galaxy editor).

reckart commented 6 years ago

I am still getting "Failed - Unable to locate named workflow".

Could anybody please update test.openminted.eu with the fixes that were discussed and implemented?

I assume this will not be the last issue in our attempt to get the SSH UC components running on the platform... and time is running out quickly.

greenwoodma commented 6 years ago

@reckart apparently test.openminted.eu has now been updated (sometime yesterday morning, plus again right now), so could you try running your workflow again and see what happens?

reckart commented 6 years ago

Well... guess what: Failed - Unable to locate named workflow

greenwoodma commented 6 years ago

Damn, damn, damn and damn!

What I don't understand is that I tried to reproduce this myself by creating a new workflow through the Galaxy editor, and it worked. Having said that, it looks as if it worked only because the workflow name doesn't contain an @ symbol like yours does.

Could you try creating a new workflow to see if that works (i.e. if something in the registry has changed the way it creates workflow names). The only other thing I can think of is that while test has been updated the workflow service is still the old one, but I'm not sure how to check that. @galanisd any ideas how we would check if the latest code had made it to test?

greenwoodma commented 6 years ago

Also, @reckart, did you ever change the workflow name back after you managed to edit it? If not, that would certainly screw things up.

reckart commented 6 years ago

@greenwoodma I have no idea what the old name was.

greenwoodma commented 6 years ago

@reckart well that explains things then. Looking earlier in the issue I think it was

0931730980607790@openminted.eu 13865a76-613b-475a-88bf-4af5357b9263

If you can change it back to that, it might work; otherwise you need to create a new workflow to see whether this has been fixed.

reckart commented 6 years ago

[Screenshot: 2018-04-13_10-53-06]

I have rebuilt the workflow from scratch... now it is "running". Let's see if it terminates.

greenwoodma commented 6 years ago

@reckart is it worth closing this issue then, given it's now quite long and focused on the workflow name bug, and then opening another one if it fails with a different error?

reckart commented 6 years ago

@greenwoodma I have changed the issue title to "OpenMinTeD SSH UC Hackathon": while the workflow name issue seems to be resolved now, the workflow still has not completed successfully.

greenwoodma commented 6 years ago

@reckart makes sense. I just didn't know whether you wanted a clean slate for reporting new issues, but renaming it for the hackathon is fine.

reckart commented 6 years ago

What is the usual time between status "running" and "completed"?

@courado as a feature request to the registry:

reckart commented 6 years ago

The workflow is trivial and the corpus is rather small, yet the workflow is still in the "running" state after three hours...

greenwoodma commented 6 years ago

@reckart how small is small? @antleb e-mailed me earlier about a slow running workflow. I'm beginning to wonder if the galaxy executor instance has been redeployed without the speed up fix we worked on for the issue that arose during the Paris meeting. I'm not entirely sure how to go about checking if that fix is in place or not -- will try and dig out the details for logging into the machine to check.

reckart commented 6 years ago

The corpus description says "one file"

https://test.openminted.eu/landingPage/corpus/9f4ebc21-aebe-4fb2-90c9-59bd189b9619

The corpus browser doesn't seem to work on that particular corpus.

galanisd commented 6 years ago

About two hours ago, someone sent a 1 GB corpus to the executor.

[Screenshot: 1gb]

It will take ages.

greenwoodma commented 6 years ago

Okay, daft question then: don't we allow parallel executions? I thought that was the point of the cloud backend?

galanisd commented 6 years ago

Did you try to download it? https://test.openminted.eu/landingPage/corpus/9f4ebc21-aebe-4fb2-90c9-59bd189b9619

It is empty, so I assume no data were fed to the workflow. I think that in such cases workflow-service is not able to recognize that processing has finished, but I am not sure.

@courado @antleb But why is it empty?