jlevy44 / PolyCRACKER-Unofficial-Mirror

A robust method for the unsupervised partitioning of polyploid subgenomes by signatures of repetitive DNA evolution https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-019-5828-5

cp: cannot stat 'analysisOutputs/SpectralClusteringmain_tsne_2_n3/bootstrap_0/model_subgenome_*.fa': No such file or directory #11

Open kashiff007 opened 2 years ago

kashiff007 commented 2 years ago

Hi @jlevy44, I ran polycracker test_pipeline on my own genome (saved under the same name, algae.fa) in the test_data/test_fasta_files folder. All the sub-programs executed, but afterwards it showed the following error:

[nawazk@login509-02-l polycracker]$ polycracker test_pipeline
Picked up _JAVA_OPTIONS: -Xms3G -Xmx5G
N E X T F L O W  ~  version 19.10.0
Launching `polycracker.nf` [jolly_joliot] - revision: 164e856ff9
./blast_files/
./kmercount_files/
./test_data/test_fasta_files/
./bed_files/
./sortedbed_files/
100
4
algae.fa
1
2
3
100000
1
26
13
cosine
30
0
cosine
60
20
10,2
50000
1
0
100
0
1
0
0
2000000
5
0
1
6
3
kpca
SpectralClustering
1
1
1
1
0
1
0
1
1
1
1
1
executor >  local (10)
[99/9ebdfa] process > splitFastaProcess (1)               [100%] 1 of 1 ✔
[18/35ebdc] process > writeKmerCount                      [100%] 1 of 1 ✔
[00/d86681] process > kmer2Fasta (1)                      [100%] 1 of 1 ✔
[3f/e9b983] process > createOrigDB (1)                    [100%] 1 of 1 ✔
[a1/ca937d] process > BlastOff (1)                        [100%] 1 of 1 ✔
[11/e81654] process > blast2bed (1)                       [100%] 1 of 1 ✔
[d3/f78446] process > genClusterMatrix_kmerPrevalence (1) [100%] 1 of 1 ✔
[76/08cda3] process > transform (1)                       [100%] 1 of 1 ✔
[8e/46dca8] process > cluster (1)                         [100%] 1 of 1 ✔
[f9/ec604d] process > subgenomeExtraction (1)             [100%] 1 of 1 ✔
algae_split.kcount.fa

/ibex/scratch/projects/c2141/User_kashif_nawaz/LTR_RT_cluster_analysis_for_subgenome_separation/PolyCracker/polycracker/work/76/08cda337b17a07d0b7c3cd64743caa

SpectralClustering
/ibex/scratch/projects/c2141/User_kashif_nawaz/LTR_RT_cluster_analysis_for_subgenome_separation/PolyCracker/polycracker/work/8e/46dca8a2027633dd58a621eccdb5b8

/ibex/scratch/projects/c2141/User_kashif_nawaz/LTR_RT_cluster_analysis_for_subgenome_separation/PolyCracker/polycracker/work/f9/ec604d5e77c2b15e8b18e43e872876

WARN: To render the execution DAG in the required format it is required to install Graphviz -- See http://www.graphviz.org for more info.
Completed at: 30-May-2022 00:59:40
Duration    : 12m 21s
CPU hours   : 0.5
Succeeded   : 10

Traceback (most recent call last):
  File "/ibex/sw/csi/polycracker/1.0.3/el7.6_python2.7/polyCRACKER/bin/polycracker", line 8, in <module>
    sys.exit(polycracker())
  File "/ibex/sw/csi/polycracker/1.0.3/el7.6_python2.7/polyCRACKER/lib/python2.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/ibex/sw/csi/polycracker/1.0.3/el7.6_python2.7/polyCRACKER/lib/python2.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/ibex/sw/csi/polycracker/1.0.3/el7.6_python2.7/polyCRACKER/lib/python2.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/ibex/sw/csi/polycracker/1.0.3/el7.6_python2.7/polyCRACKER/lib/python2.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/ibex/sw/csi/polycracker/1.0.3/el7.6_python2.7/polyCRACKER/lib/python2.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/ibex/sw/csi/polycracker/1.0.3/el7.6_python2.7/polyCRACKER/lib/python2.7/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/ibex/sw/csi/polycracker/1.0.3/el7.6_python2.7/polyCRACKER/lib/python2.7/site-packages/polycracker/polycracker.py", line 4547, in final_stats
    ctx.invoke(convert_subgenome_output_to_pickle,input_dir=polycracker_bed, scaffolds_pickle='scaffolds_stats.p', output_pickle='scaffolds_stats.poly.labels.p')
  File "/ibex/sw/csi/polycracker/1.0.3/el7.6_python2.7/polyCRACKER/lib/python2.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/ibex/sw/csi/polycracker/1.0.3/el7.6_python2.7/polyCRACKER/lib/python2.7/site-packages/polycracker/polycracker.py", line 1908, in convert_subgenome_output_to_pickle
    for file in os.listdir(input_dir):
OSError: [Errno 2] No such file or directory: 'analysisOutputs/SpectralClusteringmain_tsne_2_n3/clusterResults/'
Traceback (most recent call last):
  File "/ibex/sw/csi/polycracker/1.0.3/el7.6_python2.7/polyCRACKER/bin/polycracker", line 8, in <module>
    sys.exit(polycracker())
  File "/ibex/sw/csi/polycracker/1.0.3/el7.6_python2.7/polyCRACKER/lib/python2.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/ibex/sw/csi/polycracker/1.0.3/el7.6_python2.7/polyCRACKER/lib/python2.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/ibex/sw/csi/polycracker/1.0.3/el7.6_python2.7/polyCRACKER/lib/python2.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/ibex/sw/csi/polycracker/1.0.3/el7.6_python2.7/polyCRACKER/lib/python2.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/ibex/sw/csi/polycracker/1.0.3/el7.6_python2.7/polyCRACKER/lib/python2.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/ibex/sw/csi/polycracker/1.0.3/el7.6_python2.7/polyCRACKER/lib/python2.7/site-packages/polycracker/polycracker.py", line 1898, in convert_subgenome_output_to_pickle
    scaffolds = pickle.load(open(scaffolds_pickle,'rb'))
IOError: [Errno 2] No such file or directory: 'analysisOutputs/SpectralClusteringmain_tsne_2_n3/scaffolds_connect.p'
cp: cannot stat 'analysisOutputs/SpectralClusteringmain_tsne_2_n3/bootstrap_0/model_subgenome_*.fa': No such file or directory
cp: cannot stat 'polycracker.stats.analysis.csv': No such file or directory
cp: cannot stat 'SpectralClusteringmain_tsne_2_n3ClusterTest.html': No such file or directory
awk: fatal: cannot open file `blasted_merged.bed' for reading (No such file or directory)
/ibex/sw/csi/polycracker/1.0.3/el7.6_python2.7/polyCRACKER/lib/python2.7/site-packages/polycracker/polycracker.py:1372: UserWarning:

genfromtxt: Empty input file: "<open file 'awk \'{print gsub(/,/,"")+1}\' blasted_merged.bed', mode 'r' at 0x7f848d88b810>"

/ibex/sw/csi/polycracker/1.0.3/el7.6_python2.7/polyCRACKER/lib/python2.7/site-packages/seaborn/distributions.py:198: RuntimeWarning:

Mean of empty slice.

/ibex/sw/csi/polycracker/1.0.3/el7.6_python2.7/polyCRACKER/lib/python2.7/site-packages/numpy/core/_methods.py:85: RuntimeWarning:

invalid value encountered in double_scalars

Traceback (most recent call last):
  File "/ibex/sw/csi/polycracker/1.0.3/el7.6_python2.7/polyCRACKER/bin/polycracker", line 8, in <module>
    sys.exit(polycracker())
  File "/ibex/sw/csi/polycracker/1.0.3/el7.6_python2.7/polyCRACKER/lib/python2.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/ibex/sw/csi/polycracker/1.0.3/el7.6_python2.7/polyCRACKER/lib/python2.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/ibex/sw/csi/polycracker/1.0.3/el7.6_python2.7/polyCRACKER/lib/python2.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/ibex/sw/csi/polycracker/1.0.3/el7.6_python2.7/polyCRACKER/lib/python2.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/ibex/sw/csi/polycracker/1.0.3/el7.6_python2.7/polyCRACKER/lib/python2.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/ibex/sw/csi/polycracker/1.0.3/el7.6_python2.7/polyCRACKER/lib/python2.7/site-packages/polycracker/polycracker.py", line 1266, in plotPositions
    labels = pickle.load(open(labels_pickle,'rb'))
IOError: [Errno 2] No such file or directory: 'analysisOutputs/SpectralClusteringmain_tsne_2_n3/scaffolds_connect.p'
Please see results in ./test_results.

Original genome in ./test_data/test_fasta_files .

My genome size is ~500 Mb and my config file is:

# polyCRACKER configuration file

# file paths
blastPath = ./blast_files/
kmercountPath = ./kmercount_files/
fastaPath = ./test_data/test_fasta_files/
bedPath = ./bed_files/
sortPath = ./sortedbed_files/
reclusterPath = ./recluster_files/
kmer500BestPath = ./kmercount_500Best_files/

# genome
genome = algae.fa

# scheduler system, do not change at the moment
slurm = 0
local = 1

# blast or bbtools // deprecated
BB = 1

# recommended practice, number of dimensions > number of subgenomes
n_subgenomes = 2
n_dimensions = 3

# split fasta into chunks
splitFasta = 1
preFilter = 0
splitFastaLineLength = 100000

# write kmer counts and convert to fasta file to be blasted
writeKmer = 1
kmerLength = 26
kmer2Fasta = 1
kmer_low_count = 60
use_high_count = 0
kmer_high_count = 2000000
sampling_sensitivity = 5

# use original genome for final analysis output?
original = 0

# blast and generate bed files, turning bed files into clustering matrix, specified memory usage options, and remove chunks
writeBlast = 1
k_search_length = 13
runBlastParallel = 0
blastMemory = 100
threads = 4
blast2bed = 1
generateClusteringMatrix = 1
lowMemory = 100
minChunkSize = 50000
removeNonChunk = 1
minChunkThreshold = 0
tfidf = 1
perfect_mode = 0

# transform and cluster the data
transformData = 1
reduction_techniques = kpca
transformMetric = cosine
ClusterAll = 1
clusterMethods = SpectralClustering
grabAllClusters = 1
n_neighbors = 30
metric = cosine
weighted_nn = 0
mst = 0

# extract the subgenomes
extract = 1
diff_kmer_threshold = 20
default_kmercount_value = 3
diff_sample_rate = 6
unionbed_threshold = 10,2
bootstrap = 1
kashiff007 commented 2 years ago

I figured it out. In my config file I had set reduction_techniques = spca, but the output folder generated by default has "tsne" in its name. So either replace tsne with spca in polycracker.py or change the config file.
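
The mismatch can be checked before rerunning. Below is a minimal Python sketch, assuming the output folder name follows the pattern visible in the logs (cluster method, then "main_", the reduction technique, n_subgenomes, and "_n" plus n_dimensions). That pattern is inferred only from the folder name SpectralClusteringmain_tsne_2_n3 and has not been confirmed against polycracker.py. The helper names parse_config and expected_output_dir are hypothetical.

```python
import os

def parse_config(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        config[key.strip()] = value.strip()
    return config

def expected_output_dir(config, base="analysisOutputs"):
    """Build the folder name the config would imply.

    ASSUMPTION: the naming pattern is guessed from the log output
    ('SpectralClusteringmain_tsne_2_n3'), not taken from polycracker.py.
    """
    name = "{}main_{}_{}_n{}".format(
        config["clusterMethods"],
        config["reduction_techniques"],
        config["n_subgenomes"],
        config["n_dimensions"],
    )
    return os.path.join(base, name)

# Values copied from the config posted above.
example = """
# transform and cluster the data
n_subgenomes = 2
n_dimensions = 3
reduction_techniques = kpca
clusterMethods = SpectralClustering
"""

cfg = parse_config(example)
path = expected_output_dir(cfg)
status = "exists" if os.path.isdir(path) else "missing"
print("{} -> {}".format(path, status))
```

If the printed folder differs from the one named in the error messages, the config and the hard-coded folder name disagree, which would match the behaviour described above.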

Although all the major errors are now resolved, I ran into one again: cp: cannot stat 'analysisOutputs/SpectralClusteringmain_tsne_2_n3/bootstrap_0/model_subgenome_*.fa': No such file or directory
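
To narrow this down, it may help to look at what was actually written under the run folder before the cp step. This is a small diagnostic sketch in Python, not part of polyCRACKER. The function names are hypothetical, and the paths are copied from the error message.

```python
import glob
import os

def find_subgenome_fastas(run_dir, bootstrap=0):
    """Return the model_subgenome_*.fa files for one bootstrap, sorted."""
    pattern = os.path.join(
        run_dir, "bootstrap_{}".format(bootstrap), "model_subgenome_*.fa"
    )
    return sorted(glob.glob(pattern))

def list_bootstrap_dirs(run_dir):
    """List whichever bootstrap_* entries exist under the run folder."""
    return sorted(glob.glob(os.path.join(run_dir, "bootstrap_*")))

# Folder name taken from the cp error above.
run_dir = "analysisOutputs/SpectralClusteringmain_tsne_2_n3"

fastas = find_subgenome_fastas(run_dir)
if fastas:
    print("found: {}".format(", ".join(fastas)))
else:
    # An empty list here means the glob that cp relies on matches nothing.
    print("no model_subgenome_*.fa in bootstrap_0")
    print("bootstrap folders present: {}".format(list_bootstrap_dirs(run_dir)))
```

An empty list of bootstrap folders would suggest that the extraction step wrote its output somewhere else. A bootstrap_0 folder without .fa files would instead suggest that the files are named differently or were never produced.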

This error also occurs with the default algae genome, and it leads to further problems with the plots. Despite trying different parameters, I am unable to figure this out. The whole run looks like:

[nawazk@login509-02-l polycracker]$ polycracker test_pipeline
Picked up _JAVA_OPTIONS: -Xms3G -Xmx5G
N E X T F L O W  ~  version 19.10.0
Launching `polycracker.nf` [kickass_pesquet] - revision: 164e856ff9
./blast_files/
./kmercount_files/
./test_data/test_fasta_files/
./bed_files/
5
4
algae.fa
1
2
3
50000
0
26
13
linear
30
0
cosine
30
20
10,2
50000
1
0
0
0
1
0
0
2000000
1
0
1
1
3
tsne
SpectralClustering
1
1
1
1
0
1
0
1
1
1
1
1
executor >  local (10)
[51/84ffe2] process > splitFastaProcess (1)               [100%] 1 of 1 ✔
[a2/15d6ac] process > writeKmerCount                      [100%] 1 of 1 ✔
[7d/7df219] process > kmer2Fasta (1)                      [100%] 1 of 1 ✔
[d9/f8807b] process > createOrigDB (1)                    [100%] 1 of 1 ✔
[7b/be6bda] process > BlastOff (1)                        [100%] 1 of 1 ✔
[c6/e5b84b] process > blast2bed (1)                       [100%] 1 of 1 ✔
[de/800657] process > genClusterMatrix_kmerPrevalence (1) [100%] 1 of 1 ✔
[78/eddd66] process > transform (1)                       [100%] 1 of 1 ✔
[a6/31cadf] process > cluster (1)                         [100%] 1 of 1 ✔
[22/aadda8] process > subgenomeExtraction (1)             [100%] 1 of 1 ✔
algae_split.kcount.fa

/ibex/scratch/projects/c2141/User_kashif_nawaz/LTR_RT_cluster_analysis_for_subgenome_separation/PolyCracker/polycracker_sample/polycracker/work/78/eddd66f3b3fe3a81598123bca12f8d

SpectralClustering
/ibex/scratch/projects/c2141/User_kashif_nawaz/LTR_RT_cluster_analysis_for_subgenome_separation/PolyCracker/polycracker_sample/polycracker/work/a6/31cadf4e674e6e508828a71c34b511

/ibex/scratch/projects/c2141/User_kashif_nawaz/LTR_RT_cluster_analysis_for_subgenome_separation/PolyCracker/polycracker_sample/polycracker/work/22/aadda80d25aa9cd82bd7559d1177fe

WARN: To render the execution DAG in the required format it is required to install Graphviz -- See http://www.graphviz.org for more info.
Completed at: 30-May-2022 12:42:36
Duration    : 3m 25s
CPU hours   : 0.1
Succeeded   : 10

['subgenome_0' 'subgenome_1']
{'subgenome_1': 'Csubellipsoidea', 'subgenome_0': 'Creinhardtii'}
['subgenome_1' 'subgenome_1' 'subgenome_1' ... 'subgenome_0' 'subgenome_0'
 'ambiguous']
/ibex/sw/csi/polycracker/1.0.3/el7.6_python2.7/polyCRACKER/lib/python2.7/site-packages/sklearn/metrics/classification.py:1145: UndefinedMetricWarning:

Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples.

OrderedDict([('Length: Creinhardtii Original', [111100715]), ('Length: Creinhardtii Poly', [60150000]), ('Length: Creinhardtii Poly Correct', [13750000]), ('Length: Csubellipsoidea Original', [48952548]), ('Length: Csubellipsoidea Poly', [89800000]), ('Length: Csubellipsoidea Poly Correct', [1700000]), ('Length: Total Genome', [160053263]), ('Length: Total Poly', [149950000]), ('Length: Total Poly Correct', [15450000]), ('Metric: Classification Report Summary Avgs', [{u'ambiguous': {'recall': 0.0, 'f1-score': 0.0, 'support': 0.0, 'precision': 0.0}, u'Csubellipsoidea': {'recall': 0.03472750795321289, 'f1-score': 0.024504054512931108, 'support': 0.30585160891095947, 'precision': 0.018930957683742207}, 'micro avg': {'recall': 0.09653036564459475, 'f1-score': 0.09653036564459465, 'support': 0.9999999999999714, 'precision': 0.09653036564459454}, u'Creinhardtii': {'recall': 0.12376157975221386, 'f1-score': 0.16058327114138415, 'support': 0.6941483910890119, 'precision': 0.2285951787198736}, 'weighted avg': {'recall': 0.09653036564459475, 'f1-score': 0.11896322379622762, 'support': 0.9999999999999714, 'precision': 0.16446903938490795}, 'macro avg': {'recall': 0.052829695901808915, 'f1-score': 0.06169577521810509, 'support': 0.9999999999999714, 'precision': 0.0825087121345386}}]), ('Metric: FN', [46400000]), ('Metric: FP', [88100000]), ('Metric: Jaccard Similarity', [0.09475620975160993]), ('Metric: TN', [13750000]), ('Metric: TP', [1700000]), ('Ratio: [Creinhardtii Poly Correct]/[Creinhardtii Original]', [0.12376157975220951]), ('Ratio: [Creinhardtii Poly]/[Creinhardtii Original]', [0.5414006561523929]), ('Ratio: [Csubellipsoidea Poly Correct]/[Csubellipsoidea Original]', [0.03472750795321216]), ('Ratio: [Csubellipsoidea Poly]/[Csubellipsoidea Original]', [1.8344295377637954]), ('Ratio: [Total Poly Correct]/[Total Genomes]', [0.0965303656445917]), ('Ratio: [Total Poly]/[Total Genome]', [0.9368756199615874])])
cp: cannot stat 'analysisOutputs/SpectralClusteringmain_tsne_2_n3/bootstrap_0/model_subgenome_*.fa': No such file or directory
['subgenome_1' 'subgenome_1' 'subgenome_1' ... 'subgenome_0' 'subgenome_0'
 'subgenome_1']
{'subgenome_1': 'hsl(120.0,50%,50%)', 'subgenome_0': 'hsl(0.0,50%,50%)'}
[('subgenome_1', 'hsl(120.0,50%,50%)'), ('subgenome_0', 'hsl(0.0,50%,50%)')]
/ibex/sw/csi/polycracker/1.0.3/el7.6_python2.7/polyCRACKER/lib/python2.7/site-packages/plotly/graph_objs/_deprecations.py:385: DeprecationWarning:

plotly.graph_objs.Line is deprecated.
Please replace it with one of the following more specific types
  - plotly.graph_objs.scatter.Line
  - plotly.graph_objs.layout.shape.Line
  - etc.

Please see results in ./test_results.

Original genome in ./test_data/test_fasta_files .
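
For reference, the ratios in the stats output above can be recomputed directly from the reported lengths. The short Python sketch below only re-derives numbers already printed in the log. The dictionary layout and the function name ratios are illustrative and not part of polyCRACKER.

```python
# Lengths copied from the OrderedDict printed in the run above.
lengths = {
    "Creinhardtii": {"original": 111100715, "poly": 60150000, "correct": 13750000},
    "Csubellipsoidea": {"original": 48952548, "poly": 89800000, "correct": 1700000},
}

def ratios(entry):
    """Fraction of the original length that was assigned, and assigned correctly."""
    original = float(entry["original"])
    return {
        "assigned": entry["poly"] / original,
        "correct": entry["correct"] / original,
    }

for name in sorted(lengths):
    r = ratios(lengths[name])
    print("{}: assigned {:.3f}, correct {:.3f}".format(name, r["assigned"], r["correct"]))

total_original = sum(e["original"] for e in lengths.values())
total_correct = sum(e["correct"] for e in lengths.values())
print("total correct / total genome: {:.3f}".format(total_correct / float(total_original)))
```

These values agree with the "Ratio:" entries in the log: although the pipeline finishes, less than 10% of the test genome is assigned to the correct subgenome in this run.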