rsa-tools / rsat-code

This repo contains the code required to run a local version of the software suite Regulatory Sequence Analysis Tools (RSAT).
http://rsat.eu
GNU Affero General Public License v3.0

Problems with RSAT installation and running matrix-clustering #1

Closed martynakgajos closed 4 weeks ago

martynakgajos commented 4 years ago

Hi,

I am trying to run matrix-clustering, but I cannot figure out what is wrong with my input.

I also had some problems during installation, but the manual says that this can happen and the software should still work (btw, I have also tried the conda package and it hasn't worked for me yet).

I am using the following command: matrix-clustering -matrix PP inputfile.meme meme -o outputdir

I get the following errors:

rsync: change_dir "/home/gajos/Programs/rsat/public_html/images/program_icons" failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1052) [sender=3.0.9]
; WARNING Matrix file /project/motif.meme does not contain any matrix in meme format. Please check format.
{... more errors as the result of this one}
Error in library(pkg, warn.conflicts = FALSE, character.only = TRUE, lib.loc = c(dir.rsat.rlib, : there is no package called ‘amap’
Calls: suppressPackageStartupMessages -> withCallingHandlers -> library
Execution halted
Error OpenInputFile: File _tables/clusters.tab does not exist.

I can attach the input file I am using. I think it is properly formatted, but maybe I am wrong.

Could you help me solve it or reference a tutorial that I should read?

morganeTC commented 4 years ago

Thank you for reporting this issue. We are going to update the installation documentation, as there seems to be a problem in it that prevents the matrix-clustering tool from working.


jvanheld commented 4 years ago

Dear Martyna

Thank you for the feedback and sorry for the problems you faced with the installation. I think the problem came from a failure to install R 3.6, which resulted in the use of an older version (3.4) in which some libraries were missing.

I fixed the problem on rsat-code but not yet on the conda package.

Could you please check if the installation works properly on your side?

In principle, you should be able to do it in the following way:

Get a git clone with the fixed version

git clone https://github.com/rsa-tools/rsat-code.git

Synchronise the updated installer to your own RSAT installation (this is a temporary fix)

rsync -ruptvl rsat-code/installer $RSAT/
cd $RSAT

sudo bash
source RSAT_config.bashrc
export MY_OS=ubuntu

Read config and run bash installation scripts

source RSAT_config.bashrc && \
bash installer/07_R-and-packages.bash

Best regards,

Jacques

Aix-Marseille Université (AMU). Lab. Theory and Approaches of Genomic Complexity (TAGC) INSERM Unit UMR_S 1090, 163, Avenue de Luminy, 13288 MARSEILLE cedex 09. France Office: INSERM building, block 6 Fax: +33 4 91 82 87 01

Jacques.van-Helden@univ-amu.fr https://orcid.org/0000-0002-8799-8584


martynakgajos commented 4 years ago

Thank you for the fast answer. Unfortunately, the error still occurs.

1) I believe there might be something wrong with the installation of the Perl packages, as the error already appears while reading the matrix file I provide, which is done by .../rsat/perl-scripts/convert-matrix. The problem might also occur because my .meme file is malformed. Here is an example of what my MEME files look like:

MEME version 4

ALPHABET= ACGT

Background letter frequencies
A 0.263173 C 0.23813 G 0.224177 T 0.27452

MOTIF CACGCCAA
letter-probability matrix: alength= 4 w= 8 nsites= 909 bg_prob= 0.00001528 opt_bg_order= 2 log(Pval)= -7.49645185
0.00000018 0.99999982 0.00000001 0.00000001
0.99999988 0.00000013 0.00000001 0.00000001
0.01502429 0.98493105 0.00000001 0.00004464
0.00000002 0.00002313 0.99997681 0.00000008
0.00000001 0.99999809 0.00000194 0.00000001
0.00000024 0.90720397 0.00000001 0.09279580
0.99988377 0.00000149 0.00002128 0.00009340
0.99999881 0.00000065 0.00000056 0.00000001

MOTIF AAGGCGGC
letter-probability matrix: alength= 4 w= 8 nsites= 789 bg_prob= 0.00001728 opt_bg_order= 2 log(Pval)= -6.42772436
0.95745504 0.00000011 0.00957859 0.03296627
0.99978441 0.00021301 0.00000255 0.00000001
0.00000009 0.01020189 0.98568732 0.00411074
0.00000015 0.00000112 0.99990678 0.00009197
0.00070567 0.99928975 0.00000441 0.00000017
0.00794884 0.00096019 0.99108815 0.00000275
0.00000001 0.00078741 0.94543123 0.05378127
0.00000002 0.99579847 0.00420156 0.00000003

2) I tried to circumvent the problem by converting the matrices to the desired format myself and running .../rsat/R-scripts/matrix-clustering.R directly on them. However, I have trouble understanding the log file I am getting (from trying to run the convert-matrix function): after the matrices are split into separate files (.../rsat/perl-scripts/convert-matrix -i input_path.tf -split -from tf -to tf -o output_directory), the next step is just running .../rsat/R-scripts/matrix-clustering.R. That script requires pairwise_compa.tab, which has never been created. Should I prepare the comparison table using compare-matrices first?

jaimicore commented 4 years ago

Hi

I found the problem.

Here is an example of MEME-ChIP output motifs:

http://meme-suite.org/doc/examples/memechip_example_output_files/combined.meme

Your motifs are missing a parameter (E) that convert-matrix requires in order to convert between formats.

Remove this from your header:

bg_prob= 0.00001528 opt_bg_order= 2 

And replace

log(Pval)= -7.49645185

by

E= -7.49645185

I used the following input in convert-matrix and it works:

MEME version 4

ALPHABET= ACGT

Background letter frequencies
A 0.263173 C 0.23813 G 0.224177 T 0.27452

MOTIF CACGCCAA
letter-probability matrix: alength= 4 w= 8 nsites= 909 E= -7.49645185
0.00000018 0.99999982 0.00000001 0.00000001
0.99999988 0.00000013 0.00000001 0.00000001
0.01502429 0.98493105 0.00000001 0.00004464
0.00000002 0.00002313 0.99997681 0.00000008
0.00000001 0.99999809 0.00000194 0.00000001
0.00000024 0.90720397 0.00000001 0.09279580
0.99988377 0.00000149 0.00002128 0.00009340
0.99999881 0.00000065 0.00000056 0.00000001

MOTIF AAGGCGGC
letter-probability matrix: alength= 4 w= 8 nsites= 789 E= -6.42772436
0.95745504 0.00000011 0.00957859 0.03296627
0.99978441 0.00021301 0.00000255 0.00000001
0.00000009 0.01020189 0.98568732 0.00411074
0.00000015 0.00000112 0.99990678 0.00009197
0.00070567 0.99928975 0.00000441 0.00000017
0.00794884 0.00096019 0.99108815 0.00000275
0.00000001 0.00078741 0.94543123 0.05378127
0.00000002 0.99579847 0.00420156 0.00000003

I will check whether the motifs produced by the current version of MEME (5.1) have a new header, and update convert-matrix if necessary.
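For files with many motifs, the header edit described above can be scripted. The following is a minimal Python sketch, not part of RSAT: the names `fix_meme_header` and `fix_meme_text` are hypothetical, and it assumes every motif header carries the same `bg_prob=`, `opt_bg_order=` and `log(Pval)=` fields shown in this thread. Like the manual edit, it only relabels the log(Pval) value as E and does not convert it.

```python
import re

# Hypothetical helpers, not part of RSAT: they apply the manual header edit
# from this thread to every "letter-probability matrix:" line.
_DROP = re.compile(r"\s*\b(?:bg_prob|opt_bg_order)=\s*\S+")
_PVAL = re.compile(r"log\(Pval\)=")


def fix_meme_header(line: str) -> str:
    """Drop bg_prob=/opt_bg_order= and relabel log(Pval)= as E=."""
    if "letter-probability matrix:" not in line:
        return line  # matrix rows and other lines pass through unchanged
    line = _DROP.sub("", line)
    return _PVAL.sub("E=", line)


def fix_meme_text(text: str) -> str:
    """Apply fix_meme_header to each line of a MEME file's content."""
    return "\n".join(fix_meme_header(line) for line in text.split("\n"))
```

Reading the original file, passing its content through `fix_meme_text`, and writing the result to a new file keeps the original untouched in case the assumptions above do not hold for a given input.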

martynakgajos commented 4 years ago

Great, now I am clearly on the right path, as the error I get now comes from an R package:

shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
Error in library(pkg, warn.conflicts = FALSE, character.only = TRUE, lib.loc = c(dir.rsat.rlib,  : 
  there is no package called ‘TFBMclust’
Calls: suppressPackageStartupMessages -> withCallingHandlers -> library
Execution halted
shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory

I have replaced the lines:

for (package in required.packages.rsat) {
  message("Installing RSAT package ", package, " in folder ", install.dir)
  install.packages(pkgs=file.path(dir.rsat.rscripts, package), repos=NULL,  lib=install.dir, type="source")
}

with

for (package in required.packages.rsat) {
  message("Installing RSAT package ", package, " in folder ", install.dir)
  # lib= is omitted here, so R installs into its default library (the first entry of .libPaths())
  install.packages(pkgs=file.path(dir.rsat.rscripts, package), repos=NULL, type="source")
}

and it works now. However, out of 111 motifs, I am getting one cluster. Is it possible to add some parameters to increase the number of clusters?

jaimicore commented 4 years ago

Yes,

These are the default parameters, which maximize the grouping of motifs from the same TF family:

-hclust_method average -calc sum -metric_build_tree Ncor -lth w 5 -lth cor 0.6 -lth Ncor 0.4

If you want something more stringent, try one of the following:

-lth cor 0.75 -lth Ncor 0.55

or

-lth cor 0.8 -lth Ncor 0.6

You can find more info in the paper: https://doi.org/10.1093/nar/gkx314

martynakgajos commented 4 years ago

Hi,

How is the conda release going? ;) I am trying to cluster motifs from the RNA-binding motif database ATtRACT (https://attract.cnic.es/). I have downloaded the database and converted the PFMs to MEME format (as I have already managed to run the clustering for MEME input). Here is what the beginning of the file looks like:

MEME version 4
ALPHABET= ACGU

Background letter frequencies
A 0.25 C 0.25 G 0.25 U 0.25

MOTIF 904 904
letter-probability matrix: alength= 4 w= 5 nsites= 100000 E= -5.0
0.00961538461538    0.00961538461538    0.00961538461538    0.971153846154
0.00961538461538    0.00961538461538    0.971153846154  0.00961538461538
0.00961538461538    0.00961538461538    0.971153846154  0.00961538461538
0.00961538461538    0.00961538461538    0.971153846154  0.00961538461538
0.971153846154  0.00961538461538    0.00961538461538    0.00961538461538

MOTIF s36 s36
letter-probability matrix: alength= 4 w= 7 nsites= 100000 E= -5.0
0.844325153374  0.000766871165644   0.154141104294  0.000766871165644
0.0774539877301 0.690950920245  0.0774539877301 0.154141104294
0.000766871165644   0.0774539877301 0.154141104294  0.76763803681
0.76763803681   0.0774539877301 0.000766871165644   0.154141104294
0.0774539877301 0.0774539877301 0.76763803681   0.0774539877301
0.230828220859  0.614263803681  0.0774539877301 0.0774539877301
0.460889570552  0.230828220859  0.154141104294  0.154141104294

(I have added dummies for the number of sites and E-values, to check if it works, as I haven't found the values in the database.) I get the following message:

/home/gajos/Programs/rsat/perl-scripts/convert-matrix -i output/meme.meme -from meme -to tf -o output/_data/RNA_input_motifs_processed_1.tf
/home/gajos/Programs/rsat/bin/rsat:65: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  ref = yaml.load(open(path + '/rsat.yaml').read())#, Loader=yaml.FullLoader)
sh: ./R: Is a directory
Error OpenInputFile: File output/_tables/clusters.tab does not exist.

What got created are:

tf files for every motif (in output/_data/), pairwise_compa.tab and pairwise_compa_matrix_descriptions.tab (in output/_table/). The file output/_tables/clusters.tab indeed does not exist. What might be the problem in this case?

morganeTC commented 4 years ago

Dear Martyna,

I will let Jacques answer regarding the conda issue.

Meanwhile, the ATtRACT database has already been included in RSAT and is ready to use: https://rsat01.biologie.ens.fr/rsat/motif_databases/ATtRACT/ATtRACT_2017_12.tf

Kind regards,

Morgane

martynakgajos commented 4 years ago

Hi Morgane,

thank you for the answer. My problem still exists, even if I use the transfac file (https://rsat01.biologie.ens.fr/rsat/motif_databases/ATtRACT/ATtRACT_2017_12.tf).

eead-csic-compbio commented 4 weeks ago

For @martynakgajos and others facing similar problems: the conda version of RSAT is outdated. The Docker container, by contrast, is regularly updated and can be run as explained at https://rsa-tools.github.io/installing-RSAT/RSAT-Docker/RSAT-Docker-tuto.html