rgcgithub / clamms

CLAMMS is a scalable tool for detecting common and rare copy number variants from whole-exome sequencing data.

SVD Software no longer available #25

Open ricardoharripaul opened 3 years ago

ricardoharripaul commented 3 years ago

I was trying to run the CLAMMS software and I noticed the SVD software is not available from MIT anymore. Are there alternative methods to use or is there an alternative download link?

Thanks!

tpjones15 commented 3 years ago

https://github.com/lucasmaystre/svdlibc

ricardoharripaul commented 3 years ago

Thanks for the link.
The link works; however, following the instructions for creating the matrix, the SVD software produces an error:

```
$SVD -d 4 -o svd-output -r dt matrix.txt
Loading the matrix...
ERROR: svdLoadDenseTextFile: bad file format
ERROR: failed to read sparse matrix. Did you specify the correct file type with the -r argument?
```

Changing the format to sparse text ("-r st") seemed to run, but I do not know if the output is correct. The pca.coordinates.txt file seems to contain only 0 in all four columns.
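(One possible cause of the dense-text "bad file format" error, based on SVDLIBC's documented dense-text layout, is that the reader expects a "rows cols" header line before the values, while the matrix.txt produced by the CLAMMS instructions is raw values only. A hypothetical sketch of prepending such a header; the two-row sample matrix here is a stand-in, not real data:)

```shell
# Illustrative stand-in for the real matrix.txt (raw values, no header)
printf '1 2 3\n4 5 6\n' > matrix.txt

# Prepend the "rows cols" header that SVDLIBC's dense-text (-r dt)
# reader is assumed to expect before the values.
ROWS=$(awk 'END { print NR }' matrix.txt)
COLS=$(awk 'NR == 1 { print NF }' matrix.txt)
{ echo "$ROWS $COLS"; cat matrix.txt; } > matrix.dt.txt

head -n1 matrix.dt.txt   # header line, here "2 3"
```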

My pca coordinate file seems incomplete:

```
$SVD -d 4 -o svd-output -r st matrix.txt
Loading the matrix...
Computing the SVD...
SOLVING THE [A^TA] EIGENPROBLEM
NO. OF ROWS               = 415
NO. OF COLUMNS            = 4333
NO. OF NON-ZERO VALUES    = 0
MATRIX DENSITY            = 0.00%
MAX. NO. OF LANCZOS STEPS = 415
MAX. NO. OF EIGENPAIRS    = 4
LEFT  END OF THE INTERVAL = -1.00E-30
RIGHT END OF THE INTERVAL =  1.00E-30
KAPPA                     =  1.00E-06

TRANSPOSING THE MATRIX FOR SPEED
NUMBER OF LANCZOS STEPS = 1
RITZ VALUES STABILIZED  = 21
SINGULAR VALUES FOUND   = 0

ELAPSED CPU TIME     = 0 sec.
MULTIPLICATIONS BY A = 3
MULTIPLICATIONS BY A^T = 3
```

```
head pca.coordinates.txt
753                   0 0 0 0
753_recaled           0 0 0 0
ANMR22-7-IAU25-2_S1   0 0 0 0
ANMR48-1_S1           0 0 0 0
ARSID-M-10-5          0 0 0 0
ARSID-M-11-3          0 0 0 0
```

Thanks!

samreenzafer commented 3 years ago

Did you figure this out? I tried "-r st" but still get the same error. My matrix file is all zeros. Is that expected?

```
./svd -d 4 -o svd-output -r st matrix.txt
Loading the matrix...
ERROR: svdLoadSparseTextFile: bad file format
ERROR: failed to read sparse matrix. Did you specify the correct file type with the -r argument?
```

ricardoharripaul commented 3 years ago

I don't remember resolving this issue.


tpjones15 commented 3 years ago

Hi all, firstly, apologies Ricardo, I must have missed your original reply.

This was a while ago for me too, but I believe it's an error in the example code for building the matrix file.

Where it says:

```shell
awk '$1 != "X" && $1 != "Y" && $NF == 0 { print $4 }' $FILE \
```

The `&& $NF == 0` condition keeps a row only when its last field equals 0 (why this is the case, I don't know), so all that gets printed from the normalised coverage file into the matrix is the 4th-column zeros. I just removed that part of the line.
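(A tiny demo of that filter's effect, using made-up rows in an assumed chrom/start/end/coverage/flag layout rather than a real norm.cov.bed: only the row whose last field is 0 survives, so its 4th column is printed and the nonzero-flag row is dropped.)

```shell
# Two fake rows; '$NF == 0' keeps only the first, whose last field is 0.
printf '1\t100\t200\t0.95\t0\n1\t300\t400\t1.10\t1\n' \
  | awk '$1 != "X" && $1 != "Y" && $NF == 0 { print $4 }'
# prints: 0.95
```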

```shell
ls *.norm.cov.bed | while read FILE
do
    awk '$1 != "X" && $1 != "Y" { print $4 }' $FILE \
    | gawk -f $CLAMMS_DIR/transpose.gawk >> matrix.txt
done
```

should work.

Or, as the most recent issue suggests, change it to `$NF != 0` (I'm not sure which is most appropriate). I would lean towards removing it entirely, as otherwise you might get a discrepancy between the number of values contributed by each file.
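(A quick, hypothetical sanity check for that discrepancy concern: if every sample file contributed the same number of values, every row of matrix.txt has the same field count, so this one-liner prints a single number; more than one number means ragged rows.)

```shell
# List the distinct per-row field counts in matrix.txt.
# One line of output = consistent matrix; several lines = ragged rows.
awk '{ print NF }' matrix.txt | sort -u
```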