Open kashiff007 opened 2 years ago
I figured it out. In my config file I set reduction_techniques = spca, but the output folder is generated with the default name "tsne". So either replace tsne with spca in polycracker.py or change the config file.
Although all the major errors are now resolved, I ran into one more:
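A quick way to see which output folders actually exist, and whether they hold the files that the later `cp` step expects, is a small check like the one below. This is only a sketch and not part of polycracker: the function name `list_model_fastas` is made up for illustration, and the `analysisOutputs/<run folder>/bootstrap_0/model_subgenome_*.fa` layout is assumed from the path in the error message.

```python
import glob
import os


def list_model_fastas(root="analysisOutputs"):
    """Map each run folder under `root` to the model_subgenome_*.fa files it holds.

    Hypothetical helper: the directory layout is assumed from the path
    printed in the `cp: cannot stat` message, not from polycracker's code.
    """
    found = {}
    for run_dir in sorted(glob.glob(os.path.join(root, "*"))):
        if not os.path.isdir(run_dir):
            continue
        pattern = os.path.join(run_dir, "bootstrap_0", "model_subgenome_*.fa")
        found[os.path.basename(run_dir)] = sorted(glob.glob(pattern))
    return found


if __name__ == "__main__":
    for name, files in list_model_fastas().items():
        if files:
            print("%s: %d file(s)" % (name, len(files)))
        else:
            print("%s: no model_subgenome_*.fa found" % name)
```

Run from the directory that contains analysisOutputs, it should show whether the folder name carries the reduction technique from the config and whether any model_subgenome FASTA files were written at all.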
cp: cannot stat 'analysisOutputs/SpectralClusteringmain_tsne_2_n3/bootstrap_0/model_subgenome_*.fa': No such file or directory
This error also occurs with the default algae genome. After it, the plotting step also emits deprecation warnings. Despite trying different parameters, I am unable to figure this out. The whole run looks like:
[nawazk@login509-02-l polycracker]$ polycracker test_pipeline
Picked up _JAVA_OPTIONS: -Xms3G -Xmx5G
N E X T F L O W ~ version 19.10.0
Launching `polycracker.nf` [kickass_pesquet] - revision: 164e856ff9
./blast_files/
./kmercount_files/
./test_data/test_fasta_files/
./bed_files/
5
4
algae.fa
1
2
3
50000
0
26
13
linear
30
0
cosine
30
20
10,2
50000
1
0
0
0
1
0
0
2000000
1
0
1
1
3
tsne
SpectralClustering
1
1
1
1
0
1
0
1
1
1
1
1
executor > local (10)
[51/84ffe2] process > splitFastaProcess (1) [100%] 1 of 1 ✔
[a2/15d6ac] process > writeKmerCount [100%] 1 of 1 ✔
[7d/7df219] process > kmer2Fasta (1) [100%] 1 of 1 ✔
[d9/f8807b] process > createOrigDB (1) [100%] 1 of 1 ✔
[7b/be6bda] process > BlastOff (1) [100%] 1 of 1 ✔
[c6/e5b84b] process > blast2bed (1) [100%] 1 of 1 ✔
[de/800657] process > genClusterMatrix_kmerPrevalence (1) [100%] 1 of 1 ✔
[78/eddd66] process > transform (1) [100%] 1 of 1 ✔
[a6/31cadf] process > cluster (1) [100%] 1 of 1 ✔
[22/aadda8] process > subgenomeExtraction (1) [100%] 1 of 1 ✔
algae_split.kcount.fa
/ibex/scratch/projects/c2141/User_kashif_nawaz/LTR_RT_cluster_analysis_for_subgenome_separation/PolyCracker/polycracker_sample/polycracker/work/78/eddd66f3b3fe3a81598123bca12f8d
SpectralClustering
/ibex/scratch/projects/c2141/User_kashif_nawaz/LTR_RT_cluster_analysis_for_subgenome_separation/PolyCracker/polycracker_sample/polycracker/work/a6/31cadf4e674e6e508828a71c34b511
/ibex/scratch/projects/c2141/User_kashif_nawaz/LTR_RT_cluster_analysis_for_subgenome_separation/PolyCracker/polycracker_sample/polycracker/work/22/aadda80d25aa9cd82bd7559d1177fe
WARN: To render the execution DAG in the required format it is required to install Graphviz -- See http://www.graphviz.org for more info.
Completed at: 30-May-2022 12:42:36
Duration : 3m 25s
CPU hours : 0.1
Succeeded : 10
['subgenome_0' 'subgenome_1']
{'subgenome_1': 'Csubellipsoidea', 'subgenome_0': 'Creinhardtii'}
['subgenome_1' 'subgenome_1' 'subgenome_1' ... 'subgenome_0' 'subgenome_0'
'ambiguous']
/ibex/sw/csi/polycracker/1.0.3/el7.6_python2.7/polyCRACKER/lib/python2.7/site-packages/sklearn/metrics/classification.py:1145: UndefinedMetricWarning:
Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples.
OrderedDict([('Length: Creinhardtii Original', [111100715]), ('Length: Creinhardtii Poly', [60150000]), ('Length: Creinhardtii Poly Correct', [13750000]), ('Length: Csubellipsoidea Original', [48952548]), ('Length: Csubellipsoidea Poly', [89800000]), ('Length: Csubellipsoidea Poly Correct', [1700000]), ('Length: Total Genome', [160053263]), ('Length: Total Poly', [149950000]), ('Length: Total Poly Correct', [15450000]), ('Metric: Classification Report Summary Avgs', [{u'ambiguous': {'recall': 0.0, 'f1-score': 0.0, 'support': 0.0, 'precision': 0.0}, u'Csubellipsoidea': {'recall': 0.03472750795321289, 'f1-score': 0.024504054512931108, 'support': 0.30585160891095947, 'precision': 0.018930957683742207}, 'micro avg': {'recall': 0.09653036564459475, 'f1-score': 0.09653036564459465, 'support': 0.9999999999999714, 'precision': 0.09653036564459454}, u'Creinhardtii': {'recall': 0.12376157975221386, 'f1-score': 0.16058327114138415, 'support': 0.6941483910890119, 'precision': 0.2285951787198736}, 'weighted avg': {'recall': 0.09653036564459475, 'f1-score': 0.11896322379622762, 'support': 0.9999999999999714, 'precision': 0.16446903938490795}, 'macro avg': {'recall': 0.052829695901808915, 'f1-score': 0.06169577521810509, 'support': 0.9999999999999714, 'precision': 0.0825087121345386}}]), ('Metric: FN', [46400000]), ('Metric: FP', [88100000]), ('Metric: Jaccard Similarity', [0.09475620975160993]), ('Metric: TN', [13750000]), ('Metric: TP', [1700000]), ('Ratio: [Creinhardtii Poly Correct]/[Creinhardtii Original]', [0.12376157975220951]), ('Ratio: [Creinhardtii Poly]/[Creinhardtii Original]', [0.5414006561523929]), ('Ratio: [Csubellipsoidea Poly Correct]/[Csubellipsoidea Original]', [0.03472750795321216]), ('Ratio: [Csubellipsoidea Poly]/[Csubellipsoidea Original]', [1.8344295377637954]), ('Ratio: [Total Poly Correct]/[Total Genomes]', [0.0965303656445917]), ('Ratio: [Total Poly]/[Total Genome]', [0.9368756199615874])])
cp: cannot stat 'analysisOutputs/SpectralClusteringmain_tsne_2_n3/bootstrap_0/model_subgenome_*.fa': No such file or directory
['subgenome_1' 'subgenome_1' 'subgenome_1' ... 'subgenome_0' 'subgenome_0'
'subgenome_1']
{'subgenome_1': 'hsl(120.0,50%,50%)', 'subgenome_0': 'hsl(0.0,50%,50%)'}
[('subgenome_1', 'hsl(120.0,50%,50%)'), ('subgenome_0', 'hsl(0.0,50%,50%)')]
/ibex/sw/csi/polycracker/1.0.3/el7.6_python2.7/polyCRACKER/lib/python2.7/site-packages/plotly/graph_objs/_deprecations.py:385: DeprecationWarning:
plotly.graph_objs.Line is deprecated.
Please replace it with one of the following more specific types
- plotly.graph_objs.scatter.Line
- plotly.graph_objs.layout.shape.Line
- etc.
Please see results in ./test_results.
Original genome in ./test_data/test_fasta_files .
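As a sanity check on the report printed in the run above, the "Ratio:" entries can be recomputed from the "Length:" entries. The snippet below is only an illustrative sketch: the numbers are copied from that log, while the dictionary and the `ratio` helper are made-up names and not polycracker code.

```python
# Length values copied from the OrderedDict printed in the log (base pairs).
lengths = {
    "Creinhardtii Original": 111100715,
    "Creinhardtii Poly": 60150000,
    "Creinhardtii Poly Correct": 13750000,
    "Csubellipsoidea Original": 48952548,
    "Csubellipsoidea Poly": 89800000,
    "Csubellipsoidea Poly Correct": 1700000,
    "Total Genome": 160053263,
    "Total Poly": 149950000,
    "Total Poly Correct": 15450000,
}


def ratio(numerator, denominator):
    """Return lengths[numerator] / lengths[denominator] as a float."""
    return float(lengths[numerator]) / lengths[denominator]


# Each pair mirrors one "Ratio: [A]/[B]" entry in the report.
pairs = [
    ("Creinhardtii Poly Correct", "Creinhardtii Original"),
    ("Creinhardtii Poly", "Creinhardtii Original"),
    ("Csubellipsoidea Poly Correct", "Csubellipsoidea Original"),
    ("Csubellipsoidea Poly", "Csubellipsoidea Original"),
    ("Total Poly Correct", "Total Genome"),
    ("Total Poly", "Total Genome"),
]

for num, den in pairs:
    print("[%s]/[%s] = %.4f" % (num, den, ratio(num, den)))
```

Read this way, the report says that about 94% of the genome was assigned to a subgenome but under 10% of it was assigned correctly in this test run.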
Hi @jlevy44, I ran polycracker test_pipeline with my own genome (saved under the same name, algae.fa) in the test_data/test_fasta_files folder. It executed all the sub-programs, but then showed the following error:
My genome size is ~500 Mb, and my config file is: