Open jcerca opened 3 years ago
Hi! You may want to check the intermediate output directories directly after the clustering step to see whether clustering picked up a signal in the first place. Sometimes the final iterations for differential kmers can fail. You may also want to look into the dimensionality reduction and clustering options; maybe try Gaussian mixture models? I suspect the issue lies in your settings or data rather than in the program: at a minimum, the clustering algorithm should be able to pick up a signal.
I was not able to access the uploaded file.
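To make the "check the intermediate output directories" suggestion concrete, here is a minimal sketch that walks a directory tree and reports how many records and bases each FASTA file contains, so empty cluster outputs stand out. It is not part of polycracker; the function names `summarize_fasta` and `summarize_tree` and the list of file suffixes are assumptions made for illustration.

```python
from pathlib import Path

# Assumed set of FASTA extensions; adjust to whatever the run actually writes.
FASTA_SUFFIXES = {".fa", ".fasta", ".fna"}


def summarize_fasta(path):
    """Return (number of records, total bases) for one FASTA file."""
    records = 0
    bases = 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                records += 1  # header line starts a new record
            else:
                bases += len(line)  # sequence line
    return records, bases


def summarize_tree(root):
    """Map each FASTA file found under root to its (records, bases) summary."""
    summary = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in FASTA_SUFFIXES:
            summary[str(path)] = summarize_fasta(path)
    return summary


def print_summary(root):
    """Print one line per FASTA file, flagging files with no records."""
    for name, (records, bases) in summarize_tree(root).items():
        flag = "  <-- empty" if records == 0 else ""
        print(f"{records:>10} records {bases:>15} bases  {name}{flag}")
```

For example, `print_summary("./analysisOutputs/SpectralClusteringmain_tsne_2_n3/")` would list every FASTA file under the output directory named in the log. If the files written right after clustering already contain no records, the problem is more likely upstream of the final extraction step.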
Dear Joshua,
thank you in advance for your time. We managed to get polycracker running on a cluster with more suitable memory and disk space. We obtained the following results:
The SpectralClustering shows the following: https://github.com/jcerca/jcerca.github.io/bl0ob/master/files/SpectralClusteringmain_tsne_2_n3ClusterTest.html [please download it]
But..
ls -lah
on the extracted subgenomes folder provides:
-rw-r--r-- 1 josepc posixgroup 3.0G Dec 20 12:40 ambiguousScaffolds.fasta
-rw-r--r-- 1 josepc posixgroup 3.1G Dec 20 12:41 ambiguousScaffolds_wrapped.fasta
-rw-r--r-- 1 josepc posixgroup 0 Dec 20 12:40 scalesia_atractyloides.chrOnly_split.subgenomeA.fasta
-rw-r--r-- 1 josepc posixgroup 0 Dec 20 12:40 scalesia_atractyloides.chrOnly_split.subgenomeA_wrapped.fasta
-rw-r--r-- 1 josepc posixgroup 0 Dec 20 12:40 scalesia_atractyloides.chrOnly_split.subgenomeB.fasta
-rw-r--r-- 1 josepc posixgroup 0 Dec 20 12:40 scalesia_atractyloides.chrOnly_split.subgenomeB_wrapped.fasta
This suggests that the program wasn't able to tease apart the subgenomes.
Is there any option or flag you'd recommend for trying to disentangle the two subgenomes? This is a 3.2 Gb genome with a very high repeat content (>70% of the genome). It belongs to the Asteraceae family, which is known for extensive movement across chromosomes: previous Asteraceae genomes show a very high number of chromosome fusions and fissions, and repeats make up 60-80% of those genomes.
Thanks in advance. Below I copy the "nohup.out" file; I think it shows that the analysis finished properly.
Picked up _JAVA_OPTIONS: -Xms3G -Xmx5G
Picked up _JAVA_OPTIONS: -Xms3G -Xmx5G
N E X T F L O W ~ version 19.04.1
Launching polycracker.nf [furious_volta] - revision: 81c19416ea
./blast_files/ ./kmercount_files/ ./fasta_files/ ./bed_files/ 1300 40 scalesia_atractyloides.chrOnly.fasta 1 2 3 50000 0 26 13 linear 20 0 cosine 30 20 10,2 50000 1 0 0 0 1 0 0 2000000 1 0 1 1 3 tsne SpectralClustering 1 1 1 1 0 1 0 1 1 1 1 1
[warm up] executor > local
executor > local (1) [61/d4591f] process > splitFastaProcess [ 0%] 0 of 1
executor > local (1) [61/d4591f] process > splitFastaProcess [100%] 1 of 1 ✔
executor > local (2) [61/d4591f] process > splitFastaProcess [100%] 1 of 1 ✔ [92/0ea226] process > writeKmerCount [ 0%] 0 of 1
executor > local (2) [61/d4591f] process > splitFastaProcess [100%] 1 of 1 ✔ [92/0ea226] process > writeKmerCount [100%] 1 of 1 ✔ /workdir/polycracker polycracker writeKmerCount --fasta_path=./fasta_files/ --kmercount_path=./kmercount_files/ --kmer_length=26 --blast_mem=1300 export _JAVA_OPTIONS='-Xmx1300G' scalesia_atractyloides.chrOnly_split.fa
executor > local (3) [61/d4591f] process > splitFastaProcess [100%] 1 of 1 ✔ [92/0ea226] process > writeKmerCount [100%] 1 of 1 ✔ [7b/4a876c] process > kmer2Fasta [ 0%] 0 of 1 /workdir/polycracker polycracker writeKmerCount --fasta_path=./fasta_files/ --kmercount_path=./kmercount_files/ --kmer_length=26 --blast_mem=1300 export _JAVA_OPTIONS='-Xmx1300G' scalesia_atractyloides.chrOnly_split.fa
executor > local (3) [61/d4591f] process > splitFastaProcess [100%] 1 of 1 ✔ [92/0ea226] process > writeKmerCount [100%] 1 of 1 ✔ [7b/4a876c] process > kmer2Fasta [100%] 1 of 1 ✔
executor > local (4) [61/d4591f] process > splitFastaProcess [100%] 1 of 1 ✔ [92/0ea226] process > writeKmerCount [100%] 1 of 1 ✔ [7b/4a876c] process > kmer2Fasta [100%] 1 of 1 ✔ [d9/9e2641] process > createOrigDB [ 0%] 0 of 1
executor > local (5) [61/d4591f] process > splitFastaProcess [100%] 1 of 1 ✔ [92/0ea226] process > writeKmerCount [100%] 1 of 1 ✔ [7b/4a876c] process > kmer2Fasta [100%] 1 of 1 ✔ [d9/9e2641] process > createOrigDB [100%] 1 of 1 ✔ [f2/c8edcd] process > BlastOff [ 0%] 0 of 1
executor > local (5) [61/d4591f] process > splitFastaProcess [100%] 1 of 1 ✔ [92/0ea226] process > writeKmerCount [100%] 1 of 1 ✔ [7b/4a876c] process > kmer2Fasta [100%] 1 of 1 ✔ [d9/9e2641] process > createOrigDB [100%] 1 of 1 ✔ [f2/c8edcd] process > BlastOff [100%] 1 of 1 ✔ /workdir/polycracker/work/f2/c8edcd72d88b6a66dd07de25de050e
executor > local (6) [61/d4591f] process > splitFastaProcess [100%] 1 of 1 ✔ [92/0ea226] process > writeKmerCount [100%] 1 of 1 ✔ [7b/4a876c] process > kmer2Fasta [100%] 1 of 1 ✔ [d9/9e2641] process > createOrigDB [100%] 1 of 1 ✔ [f2/c8edcd] process > BlastOff [100%] 1 of 1 ✔ [13/73f77c] process > blast2bed [ 0%] 0 of 1
executor > local (7) [61/d4591f] process > splitFastaProcess [100%] 1 of 1 ✔ [92/0ea226] process > writeKmerCount [100%] 1 of 1 ✔ [7b/4a876c] process > kmer2Fasta [100%] 1 of 1 ✔ [d9/9e2641] process > createOrigDB [100%] 1 of 1 ✔ [f2/c8edcd] process > BlastOff [100%] 1 of 1 ✔ [13/73f77c] process > blast2bed [100%] 1 of 1 ✔ [13/7bd691] process > genClusterMatrix_kmerPrevalence [ 0%] 0 of 1
executor > local (7) [61/d4591f] process > splitFastaProcess [100%] 1 of 1 ✔ [92/0ea226] process > writeKmerCount [100%] 1 of 1 ✔ [7b/4a876c] process > kmer2Fasta [100%] 1 of 1 ✔ [d9/9e2641] process > createOrigDB [100%] 1 of 1 ✔ [f2/c8edcd] process > BlastOff [100%] 1 of 1 ✔ [13/73f77c] process > blast2bed [100%] 1 of 1 ✔ [13/7bd691] process > genClusterMatrix_kmerPrevalence [100%] 1 of 1 ✔ scalesia_atractyloides.chrOnly_split.kcount.fa
executor > local (8) [61/d4591f] process > splitFastaProcess [100%] 1 of 1 ✔ [92/0ea226] process > writeKmerCount [100%] 1 of 1 ✔ [7b/4a876c] process > kmer2Fasta [100%] 1 of 1 ✔ [d9/9e2641] process > createOrigDB [100%] 1 of 1 ✔ [f2/c8edcd] process > BlastOff [100%] 1 of 1 ✔ [13/73f77c] process > blast2bed [100%] 1 of 1 ✔ [13/7bd691] process > genClusterMatrix_kmerPrevalence [100%] 1 of 1 ✔ [8e/7834db] process > transform [ 0%] 0 of 1 scalesia_atractyloides.chrOnly_split.kcount.fa
executor > local (8) [61/d4591f] process > splitFastaProcess [100%] 1 of 1 ✔ [92/0ea226] process > writeKmerCount [100%] 1 of 1 ✔ [7b/4a876c] process > kmer2Fasta [100%] 1 of 1 ✔ [d9/9e2641] process > createOrigDB [100%] 1 of 1 ✔ [f2/c8edcd] process > BlastOff [100%] 1 of 1 ✔ [13/73f77c] process > blast2bed [100%] 1 of 1 ✔ [13/7bd691] process > genClusterMatrix_kmerPrevalence [100%] 1 of 1 ✔ [8e/7834db] process > transform [100%] 1 of 1 ✔ /workdir/polycracker/work/8e/7834dbf4fda15ed112a19d66de0213
executor > local (9) [61/d4591f] process > splitFastaProcess [100%] 1 of 1 ✔ [92/0ea226] process > writeKmerCount [100%] 1 of 1 ✔ [7b/4a876c] process > kmer2Fasta [100%] 1 of 1 ✔ [d9/9e2641] process > createOrigDB [100%] 1 of 1 ✔ [f2/c8edcd] process > BlastOff [100%] 1 of 1 ✔ [13/73f77c] process > blast2bed [100%] 1 of 1 ✔ [13/7bd691] process > genClusterMatrix_kmerPrevalence [100%] 1 of 1 ✔ [8e/7834db] process > transform [100%] 1 of 1 ✔ [d9/cc3e8b] process > cluster [ 0%] 0 of 1 /workdir/polycracker/work/8e/7834dbf4fda15ed112a19d66de0213
executor > local (10) [61/d4591f] process > splitFastaProcess [100%] 1 of 1 ✔ [92/0ea226] process > writeKmerCount [100%] 1 of 1 ✔ [7b/4a876c] process > kmer2Fasta [100%] 1 of 1 ✔ [d9/9e2641] process > createOrigDB [100%] 1 of 1 ✔ [f2/c8edcd] process > BlastOff [100%] 1 of 1 ✔ [13/73f77c] process > blast2bed [100%] 1 of 1 ✔ [13/7bd691] process > genClusterMatrix_kmerPrevalence [100%] 1 of 1 ✔ [8e/7834db] process > transform [100%] 1 of 1 ✔ [d9/cc3e8b] process > cluster [100%] 1 of 1 ✔ [3b/1d9484] process > subgenomeExtraction [ 0%] 0 of 1 SpectralClustering /workdir/polycracker/work/d9/cc3e8b25a9474d9420e3d04f1d2800
executor > local (10) [61/d4591f] process > splitFastaProcess [100%] 1 of 1 ✔ [92/0ea226] process > writeKmerCount [100%] 1 of 1 ✔ [7b/4a876c] process > kmer2Fasta [100%] 1 of 1 ✔ [d9/9e2641] process > createOrigDB [100%] 1 of 1 ✔ [f2/c8edcd] process > BlastOff [100%] 1 of 1 ✔ [13/73f77c] process > blast2bed [100%] 1 of 1 ✔ [13/7bd691] process > genClusterMatrix_kmerPrevalence [100%] 1 of 1 ✔ [8e/7834db] process > transform [100%] 1 of 1 ✔ [d9/cc3e8b] process > cluster [100%] 1 of 1 ✔ [3b/1d9484] process > subgenomeExtraction [100%] 1 of 1 ✔
model_subgenome_1.fa model_subgenome_0.fa
my input to kcount to dict is: <open file './analysisOutputs/SpectralClusteringmain_tsne_2_n3/bootstrap_0/kmercount_files/model_subgenome_0.kcount', mode 'r' at 0x7ffff5a649c0>
my input to kcount to dict is: <open file './analysisOutputs/SpectralClusteringmain_tsne_2_n3/bootstrap_0/kmercount_files/model_subgenome_1.kcount', mode 'r' at 0x7ffff5a64660>
creating ./analysisOutputs/SpectralClusteringmain_tsne_2_n3/bootstrap_0/kmercount_files//model_subgenome_1.higher.kmers.fa
creating ./analysisOutputs/SpectralClusteringmain_tsne_2_n3/bootstrap_0/kmercount_files//model_subgenome_0.higher.kmers.fa
blast files contains 0: model_subgenome_0.higher.kmers.sam 1: model_subgenome_1.higher.kmers.sam 2: ref
FINAL ITERATION DEAD
/workdir/polycracker/work/3b/1d9484cb126a8be27783c407491c3d
WARN: To render the execution DAG in the required format it is required to install Graphviz -- See http://www.graphviz.org for more info.
Completed at: 20-Dec-2020 11:41:25
Duration : 12h 3m 53s
CPU hours : 479.9
Succeeded : 10
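Because the log repeats the whole progress table on every update, it can be easier to reduce it to the last reported state of each process. The sketch below does that with a regular expression written against the progress entries exactly as they appear in this nohup.out; it is not a Nextflow or polycracker utility, and the names `last_process_status`, `unfinished` and `has_marker` are made up for illustration. It also checks for the string FINAL ITERATION DEAD. Reading that string as a sign that the final differential kmer iteration failed is an assumption based on the reply above, not something the log itself states.

```python
import re

# Matches progress entries such as
# "[3b/1d9484] process > subgenomeExtraction [100%] 1 of 1"
PROGRESS = re.compile(
    r"\[(?P<hash>[0-9a-f]{2}/[0-9a-f]{6})\]\s+process\s+>\s+(?P<name>\S+)\s+"
    r"\[\s*(?P<pct>\d+)%\]\s+(?P<done>\d+)\s+of\s+(?P<total>\d+)"
)


def last_process_status(log_text):
    """Return {process name: (done, total)} using the last entry seen per process."""
    status = {}
    for match in PROGRESS.finditer(log_text):
        # Later entries overwrite earlier ones, so the final state wins.
        status[match.group("name")] = (
            int(match.group("done")),
            int(match.group("total")),
        )
    return status


def unfinished(status):
    """List processes whose last entry shows fewer completed tasks than total."""
    return [name for name, (done, total) in status.items() if done < total]


def has_marker(log_text, marker="FINAL ITERATION DEAD"):
    """Report whether the marker string occurs anywhere in the log."""
    return marker in log_text
```

Reading the file with `open("nohup.out", encoding="utf-8").read()` and passing the text to `last_process_status` gives one entry per process. On a log like the one above, all ten processes would be reported as complete while `has_marker` still returns `True`, which is consistent with Nextflow counting the run as succeeded even though the subgenome FASTA files are empty.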