jlevy44 / PolyCRACKER-Unofficial-Mirror

A robust method for the unsupervised partitioning of polyploid subgenomes by signatures of repetitive DNA evolution https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-019-5828-5

Interpreting/modifying results #8

Open jcerca opened 3 years ago

jcerca commented 3 years ago

Dear Joshua,

Thank you in advance for your time. We managed to get polycracker running on a cluster with more suitable memory and disk space. We obtained the following results:

The SpectralClustering result is available here (please download it to view it): https://github.com/jcerca/jcerca.github.io/bl0ob/master/files/SpectralClusteringmain_tsne_2_n3ClusterTest.html

However:

  1. The "final results" folder is empty.
  2. Running ls -lah on the extracted subgenomes folder gives:

         -rw-r--r-- 1 josepc posixgroup 3.0G Dec 20 12:40 ambiguousScaffolds.fasta
         -rw-r--r-- 1 josepc posixgroup 3.1G Dec 20 12:41 ambiguousScaffolds_wrapped.fasta
         -rw-r--r-- 1 josepc posixgroup 0 Dec 20 12:40 scalesia_atractyloides.chrOnly_split.subgenomeA.fasta
         -rw-r--r-- 1 josepc posixgroup 0 Dec 20 12:40 scalesia_atractyloides.chrOnly_split.subgenomeA_wrapped.fasta
         -rw-r--r-- 1 josepc posixgroup 0 Dec 20 12:40 scalesia_atractyloides.chrOnly_split.subgenomeB.fasta
         -rw-r--r-- 1 josepc posixgroup 0 Dec 20 12:40 scalesia_atractyloides.chrOnly_split.subgenomeB_wrapped.fasta

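Both subgenome FASTA files (and their wrapped versions) are empty (0 bytes), while ambiguousScaffolds.fasta holds 3.0G. This suggests that the program was not able to tease apart the subgenomes.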
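To confirm this without relying on ls file sizes alone, a small script can count the records and bases in each FASTA file of the output folder. A minimal sketch using only the Python standard library; the function names and the folder path you pass in are illustrative, not part of polycracker:

```python
from pathlib import Path


def summarize_fasta(path):
    """Return (number_of_records, total_bases) for one FASTA file.

    Streams the file line by line, so multi-gigabyte files are not
    loaded into memory.
    """
    records = 0
    bases = 0
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                records += 1  # header line starts a new record
            else:
                bases += len(line.strip())  # sequence line, ignore newline
    return records, bases


def summarize_directory(directory, pattern="*.fasta"):
    """Print and return {file_name: (records, bases)} for matching files."""
    results = {}
    for fasta in sorted(Path(directory).glob(pattern)):
        records, bases = summarize_fasta(fasta)
        results[fasta.name] = (records, bases)
        flag = "  <-- empty" if records == 0 else ""
        print(f"{fasta.name}\t{records} records\t{bases} bp{flag}")
    return results
```

For example, summarize_directory("path/to/extracted_subgenomes"), where the path is a placeholder for the folder listed above. Zero records in both subgenome files, with all sequence in ambiguousScaffolds.fasta, would match the listing.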

Is there any option or flag you would recommend to help disentangle the two subgenomes? This is a 3.2 Gb genome in which repeats make up more than 70% of the sequence. It belongs to the Asteraceae family, which is known for extensive rearrangement across chromosomes: previous Asteraceae genomes have revealed a very high number of chromosome fusions and fissions, and repeats make up 60-80% of those genomes.

Thanks in advance. Below is the "nohup.out" file; I think it shows that the analysis finished properly.

    Picked up _JAVA_OPTIONS: -Xms3G -Xmx5G
    Picked up _JAVA_OPTIONS: -Xms3G -Xmx5G
    N E X T F L O W ~ version 19.04.1
    Launching polycracker.nf [furious_volta] - revision: 81c19416ea
    ./blast_files/ ./kmercount_files/ ./fasta_files/ ./bed_files/ 1300 40 scalesia_atractyloides.chrOnly.fasta 1 2 3 50000 0 26 13 linear 20 0 cosine 30 20 10,2 50000 1 0 0 0 1 0 0 2000000 1 0 1 1 3 tsne SpectralClustering 1 1 1 1 0 1 0 1 1 1 1 1
    [warm up] executor > local
    executor > local (10)
    [61/d4591f] process > splitFastaProcess [100%] 1 of 1 ✔
    [92/0ea226] process > writeKmerCount [100%] 1 of 1 ✔
    [7b/4a876c] process > kmer2Fasta [100%] 1 of 1 ✔
    [d9/9e2641] process > createOrigDB [100%] 1 of 1 ✔
    [f2/c8edcd] process > BlastOff [100%] 1 of 1 ✔
    [13/73f77c] process > blast2bed [100%] 1 of 1 ✔
    [13/7bd691] process > genClusterMatrix_kmerPrevalence [100%] 1 of 1 ✔
    [8e/7834db] process > transform [100%] 1 of 1 ✔
    [d9/cc3e8b] process > cluster [100%] 1 of 1 ✔
    [3b/1d9484] process > subgenomeExtraction [100%] 1 of 1 ✔
    /workdir/polycracker
    polycracker writeKmerCount --fasta_path=./fasta_files/ --kmercount_path=./kmercount_files/ --kmer_length=26 --blast_mem=1300
    export _JAVA_OPTIONS='-Xmx1300G'
    scalesia_atractyloides.chrOnly_split.fa
    /workdir/polycracker/work/f2/c8edcd72d88b6a66dd07de25de050e
    scalesia_atractyloides.chrOnly_split.kcount.fa
    /workdir/polycracker/work/8e/7834dbf4fda15ed112a19d66de0213
    SpectralClustering
    /workdir/polycracker/work/d9/cc3e8b25a9474d9420e3d04f1d2800
    model_subgenome_1.fa
    model_subgenome_0.fa
    my input to kcount to dict is: <open file './analysisOutputs/SpectralClusteringmain_tsne_2_n3/bootstrap_0/kmercount_files/model_subgenome_0.kcount', mode 'r' at 0x7ffff5a649c0>
    my input to kcount to dict is: <open file './analysisOutputs/SpectralClusteringmain_tsne_2_n3/bootstrap_0/kmercount_files/model_subgenome_1.kcount', mode 'r' at 0x7ffff5a64660>
    creating ./analysisOutputs/SpectralClusteringmain_tsne_2_n3/bootstrap_0/kmercount_files//model_subgenome_1.higher.kmers.fa
    creating ./analysisOutputs/SpectralClusteringmain_tsne_2_n3/bootstrap_0/kmercount_files//model_subgenome_0.higher.kmers.fa
    blast files contains 0: model_subgenome_0.higher.kmers.sam 1: model_subgenome_1.higher.kmers.sam 2: ref
    FINAL ITERATION DEAD
    /workdir/polycracker/work/3b/1d9484cb126a8be27783c407491c3d
    WARN: To render the execution DAG in the required format it is required to install Graphviz -- See http://www.graphviz.org for more info.
    Completed at: 20-Dec-2020 11:41:25
    Duration    : 12h 3m 53s
    CPU hours   : 479.9
    Succeeded   : 10

jlevy44 commented 3 years ago

Hi! You may want to inspect the intermediate output directories produced directly after the clustering step, to see whether the clustering picked up a signal in the first place. Sometimes the final iterations that use differential k-mers can fail. You may also want to experiment with the dimensionality reduction and clustering options; for example, try Gaussian mixture models. I suspect the issue lies in your settings or data rather than in the program: at minimum, the clustering algorithm should be able to pick up a signal.

I was not able to access the uploaded file.
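The suggestions above, checking whether the clustering step found any structure and trying Gaussian mixture models, can also be explored outside the pipeline. A minimal sketch with scikit-learn, assuming the reduced coordinates (for example the 2D t-SNE embedding) can be loaded into a NumPy array X; this is not polycracker's interface, and the silhouette check is a generic heuristic chosen for illustration rather than something the maintainer prescribed:

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture


def compare_clusterings(X, n_clusters, random_state=0):
    """Cluster the rows of X two ways and report cluster sizes and silhouette.

    X: array of shape (n_samples, n_features), e.g. reduced coordinates.
    Returns {method_name: (cluster_sizes, silhouette_score)}.
    """
    models = {
        # Default RBF affinity (gamma=1.0) may need tuning for your data's scale.
        "SpectralClustering": SpectralClustering(
            n_clusters=n_clusters, random_state=random_state
        ),
        # Gaussian mixture: probabilistic clusters, each with its own covariance.
        "GaussianMixture": GaussianMixture(
            n_components=n_clusters, covariance_type="full", random_state=random_state
        ),
    }
    report = {}
    for name, model in models.items():
        labels = model.fit_predict(X)
        sizes = np.bincount(labels, minlength=n_clusters)
        # Silhouette is only defined when at least two clusters are non-empty.
        if len(np.unique(labels)) > 1:
            score = silhouette_score(X, labels)
        else:
            score = float("nan")
        report[name] = (sizes, score)
    return report
```

A silhouette near 0, or one cluster absorbing almost all points, would be consistent with the clustering not having picked up a clear signal; a clearly positive score for either method would point instead toward the later steps.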