missuse / ragp

Filter plant hydroxyproline rich glycoproteins
MIT License

Signalp6 #8

Open kijavko opened 2 years ago

kijavko commented 2 years ago

SignalP6 is available (preprint: https://www.biorxiv.org/content/10.1101/2021.06.09.447770v1.full, web server: https://services.healthtech.dtu.dk/service.php?SignalP). It seems slow currently, but it would be nice if ragp supported communication with it.

missuse commented 2 years ago

Nice find, I was not aware SignalP6 was out.

I checked the speed of the fast method with five test sequences. It finished in 30 s, which is not that bad. The mirror https://dtu.biolib.com/SignalP-6 doesn't seem to work.

It seems the group changed the way the server code is presented. The page https://services.healthtech.dtu.dk/services/SignalP-6.0/1-Submission.php seems to contain the familiar web form.

@kijavko could you try to write a get_signalp6 function using the get_signalp5 code as a template? If any problems arise, I can take a look.

Key code:

url <- "https://services.healthtech.dtu.dk/cgi-bin/webface2.fcgi"
cfg <- "/var/www/html/services/SignalP-6.0/webface.cf"

file_up <-  httr::upload_file("test.fa")

res <- httr::POST(
  url = url,
  encode = "multipart",
  body = list(
    `configfile` = cfg,
    `uploadfile` =  file_up,
    `organism` = "Eukarya",
    `format` = "short",
    `mode` = "fast"
  ))
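
A minimal sketch of how the request above could be wrapped into reusable pieces. The function names `signalp6_fields` and `submit_signalp6` are hypothetical, the form fields are copied from the snippet above, and whether the server accepts values other than those shown is not verified. Nothing is sent to the server until `submit_signalp6()` is called.

```r
# Hypothetical helper: collect the form fields used in the snippet above.
# Only the default values come from this thread; other values are untested.
signalp6_fields <- function(organism = "Eukarya", format = "short", mode = "fast") {
  list(
    configfile = "/var/www/html/services/SignalP-6.0/webface.cf",
    organism = organism,
    format = format,
    mode = mode
  )
}

# Hypothetical wrapper: upload a FASTA file and return the raw httr response.
submit_signalp6 <- function(fasta_path, ...) {
  stopifnot(file.exists(fasta_path))
  body <- c(
    signalp6_fields(...),
    list(uploadfile = httr::upload_file(fasta_path))
  )
  res <- httr::POST(
    url = "https://services.healthtech.dtu.dk/cgi-bin/webface2.fcgi",
    encode = "multipart",
    body = body
  )
  httr::stop_for_status(res)  # turn HTTP 4xx/5xx into an R error
  res
}
```

Parsing the job id and the result table out of the response is left out here, since the response layout is not shown in this thread.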

PhilPlantMan commented 2 years ago

Hi @missuse,

I really love this package and I really wish I had the capacity to contribute to it; this is at the top of my list when I have time! In the meantime, I don't suppose you have any capacity to look at this issue? get_signalp is returning a server error when I try to run it, and I think it might be related to this issue. I recently published a paper that uses this package (I did cite you!) and now I realise my code is broken. Kind regards, Phil

missuse commented 2 years ago

Hi @PhilPlantMan,

glad you are still using the package! Thanks for the citation.

Your contributions will be welcome whenever you have time.

I have sketched a direction in some issues (https://github.com/missuse/ragp/issues) but haven't had the time to implement these functionalities yet.

If you have ideas, please open an issue or a PR.

The current issue is about implementing a function that queries SignalP version 6, which is based on a transformer protein language model. It is not related to your problem, simply because this is not yet implemented in ragp.

Regarding the problem you are experiencing: I installed the latest version of ragp and ran this:

library(ragp)
#> This is ragp >= 0.3.5 which adds several new features and consequently breaking changes.
#> Please read the NEWS: https://github.com/missuse/ragp/blob/master/NEWS.md.
#> If you encounter any problems please report them: 
#> https://github.com/missuse/ragp/issues

data(at_nsp)

get_signalp(data = at_nsp[1:13,],
            sequence,
            Transcript.id,
            progress = TRUE)
#>   |======================================================================| 100%
#>             id  Cmax Cmax.pos  Ymax Ymax.pos  Smax Smax.pos Smean Dmean is.sp
#> 1  ATCG00660.1 0.210       30 0.162       30 0.260        2 0.146 0.154     N
#> 2  AT2G43600.1 0.860       23 0.894       23 0.984       13 0.930 0.913     Y
#> 3  AT2G28410.1 0.779       23 0.826       23 0.940       15 0.877 0.853     Y
#> 4  AT2G22960.1 0.701       23 0.790       23 0.948       15 0.891 0.844     Y
#> 5  AT2G19580.1 0.422       26 0.586       26 0.885       17 0.798 0.671     Y
#> 6  AT2G19690.2 0.797       29 0.870       29 0.987       18 0.952 0.914     Y
#> 7  AT2G19690.1 0.797       29 0.870       29 0.987       18 0.952 0.914     Y
#> 8  AT2G33130.1 0.318       33 0.530       27 0.990       16 0.933 0.748     Y
#> 9  AT2G05520.1 0.633       24 0.782       24 0.990       16 0.966 0.881     Y
#> 10 AT2G05520.2 0.633       24 0.782       24 0.990       16 0.966 0.881     Y
#> 11 AT2G05520.3 0.633       24 0.782       24 0.989       16 0.965 0.881     Y
#> 12 AT2G05520.4 0.633       24 0.782       24 0.989       16 0.965 0.881     Y
#> 13 AT2G05520.6 0.633       24 0.782       24 0.989       16 0.965 0.881     Y
#>    Dmaxcut Networks.used is.signalp sp.length
#> 1    0.450  SignalP-noTM      FALSE        30
#> 2    0.450  SignalP-noTM       TRUE        23
#> 3    0.450  SignalP-noTM       TRUE        23
#> 4    0.450  SignalP-noTM       TRUE        23
#> 5    0.500    SignalP-TM       TRUE        26
#> 6    0.450  SignalP-noTM       TRUE        29
#> 7    0.450  SignalP-noTM       TRUE        29
#> 8    0.450  SignalP-noTM       TRUE        27
#> 9    0.450  SignalP-noTM       TRUE        24
#> 10   0.450  SignalP-noTM       TRUE        24
#> 11   0.450  SignalP-noTM       TRUE        24
#> 12   0.450  SignalP-noTM       TRUE        24
#> 13   0.450  SignalP-noTM       TRUE        24

get_signalp5(data = at_nsp[1:13,],
             sequence,
             Transcript.id,
             progress = TRUE)
#> batch 1 jobid is: 62B46991000058867DBE69D3
#>             id  Prediction SP.Sec.SPI    Other CS_pos     Pr cleave.site
#> 1  ATCG00660.1       OTHER   0.000375 0.999625            NA            
#> 2  AT2G43600.1 SP(Sec/SPI)   0.999802 0.000198  22-23 0.9639      VFS-QN
#> 3  AT2G28410.1 SP(Sec/SPI)   0.990424 0.009576  22-23 0.8897      ALA-QD
#> 4  AT2G22960.1 SP(Sec/SPI)   0.998142 0.001858  22-23 0.9424      AES-GS
#> 5  AT2G19580.1       OTHER   0.264792 0.735208            NA            
#> 6  AT2G19690.2 SP(Sec/SPI)   0.989540 0.010460  28-29 0.9030      ARS-EE
#> 7  AT2G19690.1 SP(Sec/SPI)   0.989540 0.010460  28-29 0.9030      ARS-EE
#> 8  AT2G33130.1 SP(Sec/SPI)   0.959278 0.040722  26-27 0.5119      VVG-SR
#> 9  AT2G05520.1 SP(Sec/SPI)   0.999493 0.000507  23-24 0.5666      VAA-AS
#> 10 AT2G05520.2 SP(Sec/SPI)   0.999493 0.000507  23-24 0.5666      VAA-AS
#> 11 AT2G05520.3 SP(Sec/SPI)   0.999416 0.000584  23-24 0.5699      VAA-AS
#> 12 AT2G05520.4 SP(Sec/SPI)   0.999419 0.000581  23-24 0.5677      VAA-AS
#> 13 AT2G05520.6 SP(Sec/SPI)   0.999419 0.000581  23-24 0.5677      VAA-AS
#>    is.signalp sp.length
#> 1       FALSE        NA
#> 2        TRUE        22
#> 3        TRUE        22
#> 4        TRUE        22
#> 5       FALSE        NA
#> 6        TRUE        28
#> 7        TRUE        28
#> 8        TRUE        26
#> 9        TRUE        23
#> 10       TRUE        23
#> 11       TRUE        23
#> 12       TRUE        23
#> 13       TRUE        23

Created on 2022-06-23 by the reprex package (v2.0.1)

As far as I can tell it is working normally.

Potential causes of your problem:

  1. Outdated ragp version: previous versions sent parallel queries to the server, which is no longer permitted.
  2. SignalP allows one concurrent job per IP. If a job hangs, or takes a long time and you press stop while the R function is running, the server will not permit the next job to run. Try running the query directly on the server; if you get the error there too, you must change your IP or wait it out.
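
To rule out the first cause, the installed version can be checked from R. This is a minimal sketch: `is_outdated` is a hypothetical helper, and using "0.3.5" as the threshold is an assumption taken from the startup message above, not a confirmed cutoff for when parallel queries were dropped.

```r
# Hypothetical helper: TRUE if `pkg` is not installed or older than `minimum`.
is_outdated <- function(pkg, minimum) {
  installed <- tryCatch(utils::packageVersion(pkg), error = function(e) NULL)
  is.null(installed) || installed < package_version(minimum)
}

# "0.3.5" comes from the ragp startup message; treating it as the minimum
# version that avoids parallel queries is an assumption.
if (is_outdated("ragp", "0.3.5")) {
  message("ragp is missing or outdated; consider reinstalling, e.g. ",
          "remotes::install_github(\"missuse/ragp\")")
}
```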

PhilPlantMan commented 2 years ago

Thank you so much for your reply! I shall try out what you suggested now. All the best, Phil

missuse commented 2 years ago

I also tried:

get_signalp5(data = at_nsp[1:200,],
             sequence,
             Transcript.id,
             progress = TRUE,
             splitter = 100)

so that the data gets split into batches and processed, and that also works as expected. If you can provide a reproducible example of the problem, I will take a look.
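
As an illustration of what splitting into batches means here, a minimal base R sketch follows. `batch_indices` is a hypothetical helper showing the general idea only; it is not ragp's internal implementation of `splitter`.

```r
# Hypothetical helper (not ragp's internal code): split row indices into
# consecutive batches of at most `splitter` rows.
batch_indices <- function(n_rows, splitter) {
  stopifnot(n_rows >= 1, splitter >= 1)
  idx <- seq_len(n_rows)
  unname(split(idx, ceiling(idx / splitter)))
}

batches <- batch_indices(200, 100)
lengths(batches)
#> [1] 100 100
```

Each batch would then be submitted one after another rather than in parallel, which fits the one-job-per-IP limit mentioned earlier.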

PhilPlantMan commented 2 years ago

Hi @missuse, updating the package did the trick; I feel foolish for not trying this first! I'm very grateful for your help (and for creating this great tool!). All the best, Phil