johnlees / seer

sequence element (kmer) enrichment analysis
GNU General Public License v2.0

kmds error #30

Closed shimbalama closed 8 years ago

shimbalama commented 8 years ago

Hi,

I tried to run kmds as per the updated instructions and received the following error:

$ kmds -p metadata.pheno --mds_concat subsampled_matrices.txt -o all_structure --threads 16 --write_distances

kmds: control for population structure
Detected binary phenotype
Reading subsampled matrices from subsampled_matrices.txt
Joined matrix 1
Distance matrix calculated in: 0.000732998 s

Intel MKL ERROR: Parameter 10 was incorrect on entry to DGEMM .

Intel MKL ERROR: Parameter 10 was incorrect on entry to DGEMM .

Intel MKL ERROR: Parameter 2 was incorrect on entry to DSYEVD.

Intel MKL ERROR: Parameter 2 was incorrect on entry to DSYEV.
terminate called after throwing an instance of 'std::runtime_error'
  what():  Could not calculate eignvalues of B matrix in metric MDS
Aborted (core dumped)

tseemann commented 8 years ago

I get the same error on the 1.1.1 binaries at the second kmds step:

kmds -p meta.pheno --mds_concat matrices.txt -o all_structure --threads 16 --write_distances
kmds: control for population structure
Detected continuous phenotype
Reading subsampled matrices from matrices.txt
Joined matrix 1
Distance matrix calculated in: 0.0711929 s

Intel MKL ERROR: Parameter 10 was incorrect on entry to DGEMM .

Intel MKL ERROR: Parameter 10 was incorrect on entry to DGEMM .

Intel MKL ERROR: Parameter 2 was incorrect on entry to DSYEVD.

Intel MKL ERROR: Parameter 2 was incorrect on entry to DSYEV.
terminate called after throwing an instance of 'std::runtime_error'
  what():  Could not calculate eignvalues of B matrix in metric MDS
Aborted (core dumped)

johnlees commented 8 years ago

I've seen this when either

Do you have an example of meta.pheno and one of the .dsm files in matrices.txt? Was the distance matrix written? (all_structure.distances.csv)

andersgs commented 8 years ago

Hi John.

I am having the same issue.

The distance matrix was written, but no idea if it is correct.

I'll send you some files offline.

Anders.
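
One way to get some confidence in a written distance matrix before sending it on is a basic sanity check. The sketch below is only an illustration and not part of seer: it assumes the file is a plain comma-separated square matrix without row or column names (the layout of all_structure.distances.csv is not confirmed here), and the function names `read_matrix` and `check_distance_matrix` are hypothetical.

```python
import csv
import math


def read_matrix(path):
    """Read a headerless, comma-separated numeric matrix (assumed layout)."""
    with open(path, newline="") as handle:
        return [[float(cell) for cell in row] for row in csv.reader(handle) if row]


def check_distance_matrix(rows, tol=1e-9):
    """Return a list of human-readable problems; an empty list means none found."""
    n = len(rows)
    if n == 0:
        return ["matrix is empty"]
    if any(len(row) != n for row in rows):
        return ["matrix is not square"]

    problems = []
    for i in range(n):
        # Every entry should be a finite, non-negative number.
        for j in range(n):
            value = rows[i][j]
            if not math.isfinite(value) or value < 0:
                problems.append(f"invalid value at ({i}, {j}): {value}")
        # The distance from a sample to itself should be zero.
        if abs(rows[i][i]) > tol:
            problems.append(f"non-zero diagonal at ({i}, {i})")
        # Distances should be symmetric.
        for j in range(i + 1, n):
            if abs(rows[i][j] - rows[j][i]) > tol:
                problems.append(f"asymmetric entries at ({i}, {j})")
    return problems
```

Calling `check_distance_matrix(read_matrix("all_structure.distances.csv"))` would list any obvious structural problems. Passing these checks does not prove the distances are right, only that the file has the shape an MDS step would expect.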

johnlees commented 8 years ago

@andersgs Your distance matrix looks ok to me, and I have been able to project it into three dimensions (attached below) all_structure.zip

Are you using the dynamically or statically compiled versions in the v1.1.1 release, or have you compiled from source yourself?
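
For readers wondering what the "B matrix" in the error message is: in the textbook formulation of classical (metric) MDS, B is the double-centred matrix of squared distances, and the projection comes from its leading eigenvectors. The sketch below shows only the double-centring step in plain Python. It assumes kmds follows this textbook formulation, which is not confirmed in this thread, and `double_centre` is a hypothetical name.

```python
def double_centre(dist):
    """Return B = -1/2 * J * D^2 * J for a square distance matrix D,
    where J is the centring matrix. Textbook classical MDS; assumed,
    not taken from the seer source."""
    n = len(dist)
    squared = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in squared]
    col_mean = [sum(squared[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_mean) / n
    return [
        [
            -0.5 * (squared[i][j] - row_mean[i] - col_mean[j] + grand_mean)
            for j in range(n)
        ]
        for i in range(n)
    ]


# Three points on a line at positions 0, 1 and 2.
example = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 1.0],
    [2.0, 1.0, 0.0],
]
b_matrix = double_centre(example)
```

For this example B equals the outer product of the centred coordinates (-1, 0, 1) with themselves, which is why the eigenvectors of B recover the point positions. A symmetric eigensolver is then run on B, and that is the stage at which the DSYEV/DSYEVD errors in the logs above are reported.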

andersgs commented 8 years ago

I am using the statically compiled version (v1.1.1).

While that file is produced, none of the other files presumably needed to run seer (*.sample?) are.

johnlees commented 8 years ago

I wonder if this might be an error with the way I have linked the math libraries in that case. Could you try this version: https://github.com/johnlees/seer/releases/download/v1.1.1/seer_v1.1.1_static_all.tar.gz

and see if it works?

andersgs commented 8 years ago

Thanks, John.

Still no go, unfortunately.

It seg faults. Even with just --help.

johnlees commented 8 years ago

What OS and version are you using?

I'm going to make an alternative script (requiring R) which will be able to make the required files from the distance matrix, as a work-around for now.

johnlees commented 8 years ago

The script should deal with this for practical purposes (see commit e817cee2d896eac88c715533100970243ee8a917) but ideally this should compile properly, so any more information on your platforms would be appreciated.

andersgs commented 8 years ago

Thank you for the script, John. I'll give it a go.

I am running RHEL7 (Red Hat).

mgalardini commented 8 years ago

Don't know if this is related, but I believe that the 'static_all' kmds segfaults even when exiting successfully, or when calling it with '-h'

johnlees commented 8 years ago

As far as I can tell the statically compiled versions won't work on RHEL (kmds at least). An alternative would be to use the sanger-pathogens VM (import ftp://ftp.sanger.ac.uk/pub/pathogens/pathogens-vm/pathogens-vm.latest.ova as a resource in VirtualBox).

mgalardini commented 8 years ago

I'm in the process of trying the proposed pipeline on RHEL7; up to the first pass of kmds it seems to be working fine (if you ignore the segfault when the program exits). Will let you know if I can get the second pass to work. A virtual machine could be cool, but if it can't be run in a cluster it would still crash pretty soon when it starts requiring large amounts of RAM. Is there anything I (we?) can do to provide more info on this issue and related ones?

mgalardini commented 8 years ago

Hi, I can confirm that the second pass of kmds segfaults on RHEL6 (but same thing happens with RHEL7).

Output:

kmds: control for population structure
Detected binary phenotype
Reading subsampled matrices from subsampled_matrices.txt
Joined matrix 1
Joined matrix 2
Joined matrix 3
Joined matrix 4
Joined matrix 5
Joined matrix 6
Joined matrix 7
Joined matrix 8
Joined matrix 9
Joined matrix 10
Joined matrix 11
Joined matrix 12
Joined matrix 13
Joined matrix 14
Joined matrix 15
Joined matrix 16
/ebi/lsf/ebi-spool/02/1464773096.2749204: line 8: 37456 Segmentation fault (core dumped)

johnlees commented 8 years ago

I'm afraid that without a RHEL system to test on myself, and with no knowledge of cross-compiling, I am unlikely to be able to produce a pre-compiled version that works on your system.

If you are unable to use the VM I would suggest compiling from source. I am happy to try and help with any issues you have with this.

Alternatively, you could use mash to produce a k-mer based distance matrix, which can then be used as input to the R script referenced above. This would avoid the high RAM usage step.

mgalardini commented 8 years ago

Thank you John, I'll give it a shot with mash and let you know.

mgalardini commented 8 years ago

Hi John,

just to let you know that mash worked just fine in generating the distance matrix and that seer is now running happily. In case someone else gets into a similar situation, here are the commands used:

# generate a sketch for each genome
# (output name is the input file name without its .fasta extension)
for infile in genome/*;
do
  mash sketch "$infile" -o "$(basename "$infile" .fasta)";
done
# run pairwise distance calculations (each sketch against all sketches)
for genome in $(find . -maxdepth 1 -type f -name '*.msh');
do
  mash dist "$genome" *.msh > "$(basename "$genome" .msh).dist";
done
cat *.dist > distances.txt
# ad-hoc script to convert mash output to square matrix
# no column and row names allowed
# row/columns are sorted alphabetically
# cells are comma separated
./mash2mat distances.txt > distances.csv
# project distance matrix using the script provided by seer
perl R_mds.pl -d distances.csv -p phenotypes.txt -o projection
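
The ad-hoc mash2mat script itself is not shown in this thread. As a rough idea of what such a conversion could look like, here is a minimal Python sketch. It is not the original script and not the mash2distances script later added to seer. It assumes the tab-separated `mash dist` output with the reference ID, query ID and distance in the first three columns, and the names `mash_to_square` and `to_csv` are hypothetical.

```python
import csv


def mash_to_square(lines):
    """Turn `mash dist` rows (reference, query, distance, ...) into a
    square matrix with rows and columns sorted alphabetically by name."""
    dist = {}
    names = set()
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        ref, query, value = row[0], row[1], float(row[2])
        dist[(ref, query)] = value
        dist[(query, ref)] = value
        names.update((ref, query))

    order = sorted(names)
    matrix = []
    for a in order:
        matrix_row = []
        for b in order:
            if a == b:
                matrix_row.append(dist.get((a, b), 0.0))
            elif (a, b) in dist:
                matrix_row.append(dist[(a, b)])
            else:
                raise ValueError(f"no distance found for pair {a}, {b}")
        matrix.append(matrix_row)
    return order, matrix


def to_csv(matrix):
    """Comma-separated cells, one matrix row per line, no row/column names."""
    return "\n".join(",".join(str(v) for v in row) for row in matrix)
```

With `distances.txt` produced as above, something like `order, matrix = mash_to_square(open("distances.txt"))` followed by writing `to_csv(matrix)` to `distances.csv` would give a headerless file in the described layout. Because the names are dropped, the phenotype file has to list samples in the same alphabetical order, so it is worth checking `order` against it.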

johnlees commented 8 years ago

@mgalardini - happy to hear that! Thanks for the commands, I've added them to the wiki along with a mash2distances script in commit ac735f1

tseemann commented 8 years ago

@johnlees can you try it on a CentOS VM? Either on VirtualBox, or maybe @andrewjpage can advise on a Docker or real VM option available at Sanger?