weberlab-hhu / Helixer

Using Deep Learning to predict gene annotations
GNU General Public License v3.0

Online Version #119

Closed LeoVincenzi closed 6 months ago

LeoVincenzi commented 6 months ago

Dear all, I tried the online version of the software two days ago with a genome of almost 600 Mb. The screen showed that the job was in the queue, but in two days I have not received any email about the analysis or the job status. How does it usually work? How long can it take for a job to be processed?

Cheers, Leo

joeyjoe0111 commented 6 months ago

I ran into the same problem. It's confusing, and I also wonder why it happens.

taprs commented 6 months ago

Bump! We have been experiencing it too. It used to work before.

alisandra commented 6 months ago

Thanks to everyone for raising this here and pardon the slow reply!

There is a problem when users submit their Helixer job and there is a large number of jobs in the queue. The browser session expires before the job has completed, and then the user doesn't receive the link to the finished job.

When a job starts, you should note the job-id (e.g. GSA-148fd40dc19d969d2458ce8f5280198f). You can then check back later to see whether the result file has been generated by visiting the URL https://www.plabipd.de/projects/Helixer_output/JOB_ID.zip (replace JOB_ID with your job-id).

This is of course just a temporary solution. We hope to have a more robust solution soon, and will update here when we do.

alisandra commented 6 months ago

Oh, and for the question on expected run time:

In general, Helixer's run time scales linearly with genome size, and is also affected by gene density, and of course hardware. On a GPU, the expected walltime should be on the order of magnitude of 15-30min per 100 Mbp.

However, for the web version, the time until jobs are finished is of course also dependent on the length of the queue; so all I can say is that it won't be shorter than the above.
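As a rough worked example of the estimate above, here is a minimal sketch that applies the quoted 15-30 min per 100 Mbp range to a genome size, assuming strictly linear scaling. The function name and parameters are hypothetical, and the result ignores gene density, hardware differences, and any queue time, so for the web version it is only a lower bound.

```python
def estimate_walltime_minutes(genome_mbp, low_per_100=15.0, high_per_100=30.0):
    """Return a (low, high) GPU walltime estimate in minutes.

    Assumes run time scales linearly with genome size and uses the
    15-30 min per 100 Mbp range quoted in this thread as defaults.
    """
    if genome_mbp < 0:
        raise ValueError("genome size must be non-negative")
    scale = genome_mbp / 100.0
    return (low_per_100 * scale, high_per_100 * scale)


# Example: the roughly 600 Mb genome from the original question.
low, high = estimate_walltime_minutes(600)
print(f"{low:.0f}-{high:.0f} min")  # prints "90-180 min"
```

For the 600 Mb genome in question this gives roughly 1.5 to 3 hours of compute once the job actually starts running.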

hoppo commented 6 months ago

The underlying problem is that we now have a much larger number of users. Therefore, jobs do not complete as quickly, and the web session dies before the job finishes and the email is sent. The job does continue, but unfortunately there is no way for the user to be informed when it has completed.

I have implemented a solution (deployed yesterday) whereby users will always be notified when their jobs finish (assuming, of course, that they provide a valid email address).

If you happen to know the job-id (GSA-XXXXXXXXXXXX) of your job, you can check whether it is available for download using the following URL: https://www.plabipd.de/projects/Helixer_output/GSA-XXXXXXXXXXXX.zip (substitute your job-id for GSA-XXXXXXXXXXXX).
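The manual check described above could be scripted roughly as follows. This is only a sketch under stated assumptions: the function names are hypothetical, the job-id pattern is guessed from the single example in this thread, and it assumes the server answers a HEAD request with status 200 once the zip exists and with an HTTP error (such as 404) before that. None of this is documented behaviour of the service.

```python
import re
import urllib.error
import urllib.request

# Base URL taken from this thread.
BASE_URL = "https://www.plabipd.de/projects/Helixer_output/"


def result_url(job_id):
    """Build the download URL for a job-id such as GSA-148fd40d..."""
    job_id = job_id.strip()
    # Assumption: job-ids look like "GSA-" followed by letters and digits.
    if not re.fullmatch(r"GSA-[0-9A-Za-z]+", job_id):
        raise ValueError(f"unexpected job-id format: {job_id!r}")
    return f"{BASE_URL}{job_id}.zip"


def result_is_ready(job_id, timeout=10.0, opener=urllib.request.urlopen):
    """Return True if the result zip appears to be downloadable.

    Assumption: an unfinished job makes the server return an HTTP error.
    Network failures (urllib.error.URLError) are not caught, so they are
    not mistaken for "job not finished yet".
    """
    request = urllib.request.Request(result_url(job_id), method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        return False


# Building the URL does not touch the network.
print(result_url("GSA-148fd40dc19d969d2458ce8f5280198f"))
```

Calling `result_is_ready("GSA-...")` with your own job-id performs one request. If you poll, please do so sparingly (for example once an hour), since the service is already under heavy load.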

At the moment, there are 14 jobs in the queue, so results for any newly submitted job will likely take at least a day. Visibility of the number of jobs in the queue is also a feature we plan to implement.

One other point I would like to make: the web-based version of Helixer is not intended as a tool for large-scale genome annotation. It is provided as a "test platform" to try a portion of your genome (e.g. a chromosome) to see whether Helixer would be an appropriate tool for a particular genome. The idea is that the user would then download the tool and run it on their own infrastructure. Please do not submit the same genome multiple times. If this is noticed by the administrator, the jobs will be killed. You can contact plabipd@fz-juelich.de if you do not receive the email.

LeoVincenzi commented 6 months ago

Thank you all for your answers! @hoppo I submitted a pair of jobs a few hours ago, since it had been more than a week since the first attempts, so I hope they won't be killed. Good to know that the online version is meant for testing and not for large-scale annotation, thanks. Can you confirm that the job-id URL won't work until the job is done?

hoppo commented 6 months ago

No, your jobs won't be killed. We understand that it was most frustrating to submit jobs and not get any feedback, so we are only checking jobs queued since yesterday (when the email bug was fixed).

It is not a trivial undertaking to provide and maintain the infrastructure for an online structural annotation service. We understand that not every research group has the computing infrastructure to run Helixer locally, so we would really urge groups that do have the capability to run it locally.

And you are correct: the URL will not work until a job has completed.

LeoVincenzi commented 6 months ago

I wanted to let you know that I have now received the email correctly and got the results of the job. Thanks all for the help! Leo