gstecher / CloneFinderAPI

API for the Clone Finder application
MIT License

Trouble running CloneFinder #2

Open Rdroit opened 4 years ago

Rdroit commented 4 years ago

Hi,

I am currently trying to run CloneFinder on a machine with Ubuntu (x64) and 32 GB of RAM. I installed every dependency described in the README, plus two that are not listed there: ete2 and scikit-learn.

command used: python clonefinder.py snv ../path/to/file.txt

error:

parsing ancestral states file...
successfully parsed ancestral states file: 9 taxa
finding alignment with least parallel and back mutations...
processing file: /tmp/branchdec_mega_alignment_ancestral_states.txt
removing redundant seqs...
bad positions []
Exception OSError: (2, 'No such file or directory', '/tmp/branchdec_mega_alignment.meg') in <bound method MegaMP.__del__ of <parsimony.MegaMP.MegaMP object at 0x7f5f1813f390>> ignored
generate output
constructing MP tree
executing megacc parsimony tree construction in /tmp/
MEGA-CC 10.1.6 Molecular Evolutionary Genetics Analysis Build#: 10191127-x86_64
0% Organizing sequence information
0% 03/12/2019 16:46:36 Using the following analysis options:
No. of Taxa 5
Analysis Phylogeny Reconstruction
Statistical Method Maximum Parsimony
Test of Phylogeny None
No. of Bootstrap Replications Not Applicable
Substitutions Type Nucleotide
Gaps/Missing Data Treatment Complete deletion
Site Coverage Cutoff (%) Not Applicable
MP Search Method Subtree-Pruning-Regrafting (SPR)
No. of Initial Trees (random addition) 10
MP Search level 1
Max No. of Trees to Retain 100
Calculate Branch Lengths Yes
Has Time Limit False
Maximum Execution Time -1
datatype snNucleotide
containsCodingNuc False
MissingBaseSymbol ?
IdenticalBaseSymbol .
GapSymbol -
Start time: 03/12/2019 16:46:36
Executing analysis:

   100% Analysis Complete                                                               

MEGA has completed the requested action
Terminating the megacc process with exit code 0
MP tree(s):
((hg19:1443.00000000,Clone1:0.00000000):1709.00000000,(Clone2:283.00000000,Clone3:351.00000000):4.00000000,Clone4:2.00000000);

parsing ancestral states file...
successfully parsed ancestral states file: 9 taxa
finding alignment with least parallel and back mutations...
processing file: /tmp/mega_alignment_ancestral_states.txt
removing redundant seqs...
best alignment
test clone hit and remove insignificant clones
2019-12-03 16:46:37.251213
0:01:05.334527
Exception OSError: (2, 'No such file or directory', '/tmp/mega_alignment.meg') in <bound method MegaMP.__del__ of <parsimony.MegaMP.MegaMP object at 0x7f5f18b34610>> ignored

It runs for about 2 minutes and then crashes with this error. The same thing happens with the example file. Do you have any idea why this is happening?

Thank you for your time :)

SayakaMiura commented 4 years ago

Can you check if there are output files in the directory of your input file? I noticed CloneFinder produces that error message at the end, although it completes the computation.
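A minimal sketch of that check, written for Python 3. The file name patterns are assumptions inferred from the output names mentioned in this thread (`*_CloneFinder.meg`, `*_CloneFinder.txt`, `*_summary.txt`), and `find_outputs` is a hypothetical helper, not part of CloneFinder.

```python
from pathlib import Path

# Assumed output name patterns, inferred from file names mentioned in this thread.
PATTERNS = ("*_CloneFinder.meg", "*_CloneFinder.txt", "*_summary.txt")


def find_outputs(input_path):
    """Return non-empty files next to the input file that match the assumed patterns."""
    folder = Path(input_path).resolve().parent
    found = []
    for pattern in PATTERNS:
        for path in sorted(folder.glob(pattern)):
            # Skip directories and zero-byte files, which would suggest a failed run.
            if path.is_file() and path.stat().st_size > 0:
                found.append(path)
    return found
```

Calling `find_outputs("../path/to/file.txt")` returns the matching files in the input directory. An empty list would suggest that the run stopped before writing results, or that the outputs use different names than assumed here.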

Rdroit commented 4 years ago

Hi, thanks for your answer. I do have output files, but they are almost empty:

file_snv_CloneFinder.meg:

MEGA

!Title SNVs;
!Format datatype=dna;

Clone1

TAAAAAAAAATTAAAAAAAAAAAATTTTTAATTTTTTTTAAAAAATTTTATAAATATAAAATATAAAAAAAAA

Clone3

TTTTTATTTATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT

Clone2

TTAAATAATTTTTTTTTTTTTTTTTTTTTATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT

file_snv_CloneFinder.txt:

Tumor  Clone3               Clone2               Clone1              Clone4
T-1    0                    0.1497800439226611   0.8492562820467912  0.07237408224890221
T-2    0                    0.20968852098682558  0.882350601900389   0
T-3    0.07197592918684449  0.15276037188636157  0.7331900285938274  0.06776708863438648
T-4    0                    0.15295051715977218  0.8496975508594271  0.0672742872308919

file_snv_summary.txt:

input data file: ../../342/342_CloneFinder_inputsnv.txt
total read count cutoff: 50
mutant read count cutoff: 5
clone frequency cutoff: 0.05

hybrid sample genotypes were suggested: ['1', '2', '4', '3']
Hybrid sample genotype was decomposed
hybrid sample genotypes were suggested: ['1', '2', '4', '3']
decomposed clone genotypes are not good (create more backward/parallel mutations), so clones were not decomposed.
100.0% of SNV sites are not affected by backward/parallel mutations.
Run time: 0:01:05.334527

SayakaMiura commented 4 years ago

The files you got are the expected output files. The first one (file_snv_CloneFinder.meg) contains the predicted clone sequences: "A" indicates the wild-type base and "T" the mutant base, and the positions follow the same order as the SNVs in your input file. The second file (file_snv_CloneFinder.txt) is the clone frequency table, which gives the frequency of each predicted clone within each of your samples. It uses the same clone IDs as file_snv_CloneFinder.meg.
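To make the A/T encoding concrete, here is a minimal Python 3 sketch that reads clone sequences from MEGA-format text and lists the mutant positions of each clone. `parse_clone_meg` and `mutant_positions` are hypothetical helpers, not CloneFinder functions. The sketch assumes that sequences contain only "A" and "T", that command lines start with "!", and that clone names may or may not carry the "#" prefix used by standard MEGA files. The sequences below are toy data.

```python
def parse_clone_meg(text):
    """Parse clone sequences from MEGA-format text into {clone_name: sequence}."""
    clones = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, "!" command lines and the MEGA header line.
        if not line or line.startswith("!") or line.lstrip("#").upper() == "MEGA":
            continue
        if set(line) <= {"A", "T"} and current is not None:
            # A line made only of A/T is sequence data for the current clone.
            clones[current] += line
        else:
            # Anything else is treated as a clone name.
            current = line.lstrip("#")
            clones[current] = ""
    return clones


def mutant_positions(sequence):
    """Return 0-based indices of mutant bases ('T'); 'A' marks wild type."""
    return [i for i, base in enumerate(sequence) if base == "T"]


# Toy example, not real CloneFinder output.
example = """#MEGA
!Title SNVs;
!Format datatype=dna;

#Clone1
TAAAT
#Clone2
TTAAT
"""

clones = parse_clone_meg(example)
for name, seq in clones.items():
    print(name, mutant_positions(seq))
```

Each index refers to the SNV at the same position in the input file, so index 0 is the first SNV listed there.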

Rdroit commented 4 years ago

OK, thank you for the answer. I thought there was a problem. Is it possible to obtain the list of genes in each clone cluster?

SayakaMiura commented 4 years ago

No, CloneFinder cannot do that. However, the order of SNVs in the output file (e.g., file_snv_CloneFinder.meg) is the same as in your input file, so you can obtain the gene list for each clone manually. Please note that CloneFinder does not output mutation clusters (the mutations on each branch of the phylogeny). If you would like to map mutated genes along a clone phylogeny, please first infer the time (branch) at which each mutation occurred, which can be done easily with the "Ancestors" function of the MEGA software (https://www.megasoftware.net/).
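The manual lookup can be sketched in Python 3 as follows. This assumes you already have one gene label per SNV, in the same row order as the input file. That annotation is something you supply yourself, and `genes_per_clone` is a hypothetical helper, not part of CloneFinder. The gene names and sequences are toy data.

```python
def genes_per_clone(clone_seqs, genes):
    """Map each clone to the genes whose SNV is mutant ('T') in that clone.

    clone_seqs: {clone_name: sequence of 'A'/'T'}, one character per SNV.
    genes: gene label for each SNV, in the same order as the input file rows.
    """
    result = {}
    for name, seq in clone_seqs.items():
        if len(seq) != len(genes):
            raise ValueError(f"{name}: {len(seq)} SNVs but {len(genes)} gene labels")
        hits = []
        for base, gene in zip(seq, genes):
            # Keep each gene once, in order of first appearance.
            if base == "T" and gene not in hits:
                hits.append(gene)
        result[name] = hits
    return result


# Toy data (hypothetical gene names and sequences, for illustration only).
genes = ["GENE_A", "GENE_B", "GENE_B", "GENE_C"]
clone_seqs = {"Clone1": "TAAA", "Clone2": "TTTA"}
print(genes_per_clone(clone_seqs, genes))
```

This gives the mutated genes per clone genotype only. It does not place mutations on branches of the phylogeny, which still requires the ancestral-state step described above.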

Rdroit commented 4 years ago

OK, thank you for your help. One last question: I use 4 time points in the input file. Is that enough, or should I use more?

SayakaMiura commented 4 years ago

If you have more time points, please use all of them; more time points are better. Please note, however, that CloneFinder will not perform well if most of the clones are shared by all (or many) of the samples (time points).
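One rough way to see how widely clones are shared is to count, for each clone in the frequency table, the fraction of samples in which it is present. The Python 3 sketch below is an illustration only: `clone_sharing` is a hypothetical helper, the 0.05 presence cutoff is borrowed from the "clone frequency cutoff" reported in the summary file, and the values are rounded from the table posted earlier in this thread. How much sharing is too much is not specified here, so the numbers are a diagnostic rather than a pass/fail test.

```python
def clone_sharing(table_text, cutoff=0.05):
    """Return {clone: fraction of samples where its frequency is >= cutoff}."""
    rows = [line.split() for line in table_text.strip().splitlines()]
    header, data = rows[0], rows[1:]
    clones = header[1:]  # the first column holds the sample name
    counts = dict.fromkeys(clones, 0)
    for row in data:
        for clone, value in zip(clones, row[1:]):
            if float(value) >= cutoff:
                counts[clone] += 1
    return {clone: counts[clone] / len(data) for clone in clones}


# Frequencies rounded from the table posted in this thread.
table = """Tumor Clone3 Clone2 Clone1 Clone4
T-1 0 0.150 0.849 0.072
T-2 0 0.210 0.882 0
T-3 0.072 0.153 0.733 0.068
T-4 0 0.153 0.850 0.067
"""

for clone, fraction in clone_sharing(table).items():
    print(clone, fraction)
```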