Closed aclum closed 2 weeks ago
Has this bug only been noticed for this specific run? It doesn't seem to be generating the input files correctly
Array[File] hqmq_bin_tarfiles = flatten([glob("*_HQ.tar.gz"), glob("*_MQ.tar.gz")])
is how the variable is defined, but none of the tar.gz files exist; it does not seem to be zipping the folders correctly. Has there been a recent update to the image or script that stops create_tarfiles.py from running?
Also, this error is present in the package task:
Traceback (most recent call last):
  File "/opt/conda/envs/mags_vis/bin/ko_mapper.py", line 623, in <module>
    main()
  File "/opt/conda/envs/mags_vis/bin/ko_mapper.py", line 618, in main
    metabolism_matrix_dropped_relabel, module_colors = create_output_files(metabolic_annotation, metabolism_matrix, module_information, cluster, prefix)
  File "/opt/conda/envs/mags_vis/bin/ko_mapper.py", line 568, in create_output_files
    cbar_kws= {'orientation': 'horizontal', 'label': 'Module Completeness (%)'}, dendrogram_ratio=0.1)
  File "/opt/conda/envs/mags_vis/lib/python3.7/site-packages/seaborn/matrix.py", line 1262, in clustermap
    tree_kws=tree_kws, **kwargs)
  File "/opt/conda/envs/mags_vis/lib/python3.7/site-packages/seaborn/matrix.py", line 1142, in plot
    self.plot_matrix(colorbar_kws, xind, yind, **kws)
  File "/opt/conda/envs/mags_vis/lib/python3.7/site-packages/seaborn/matrix.py", line 1095, in plot_matrix
    xticklabels=xtl, yticklabels=ytl, annot=annot, **kws)
  File "/opt/conda/envs/mags_vis/lib/python3.7/site-packages/seaborn/matrix.py", line 448, in heatmap
    yticklabels, mask)
  File "/opt/conda/envs/mags_vis/lib/python3.7/site-packages/seaborn/matrix.py", line 164, in __init__
    cmap, center, robust)
  File "/opt/conda/envs/mags_vis/lib/python3.7/site-packages/seaborn/matrix.py", line 202, in _determine_cmap_params
    vmin = np.nanmin(calc_data)
  File "<__array_function__ internals>", line 6, in nanmin
  File "/opt/conda/envs/mags_vis/lib/python3.7/site-packages/numpy/lib/nanfunctions.py", line 319, in nanmin
    res = np.fmin.reduce(a, axis=axis, out=out, **kwargs)
ValueError: zero-size array to reduction operation fmin which has no identity
@chienchi's tests from April have valid tars in /global/cfs/cdirs/m3408/aim2/metagenome/MAGs/output2
Appears to be active. Moving to new sprint. @aclum @chienchi FYI
This commit on April 1st changed the image that the package script uses: https://github.com/microbiomedata/metaMAGs/commit/0d756dce59269f423821ab2aec61ba34131b6ca2 The test run is from April 15, but it used GOLD-style identifiers. I believe the issue is that the nmdc identifiers aren't being parsed correctly by create_tarfiles.py. If you look at an example directory, none of the annotation files are properly subset to the data belonging to that bin:
/pscratch/sd/n/nmdcda/cromwell-executions/nmdc_mags/29628c3e-8850-4210-927a-1d4258fa35d1/call-package/execution/nmdc_wfmag-12-fxwdrv82.1_bins.9_LQ> ls -ltr
total 360
-rw-r--r-- 1 nmdcda nmdcda 0 Jul 1 12:50 nmdc_wfmag-12-fxwdrv82.1_bins.9.gff
-rw-r--r-- 1 nmdcda nmdcda 368065 Jul 1 12:50 nmdc_wfmag-12-fxwdrv82.1_bins.9.fna
-rw-r--r-- 1 nmdcda nmdcda 0 Jul 1 12:50 nmdc_wfmag-12-fxwdrv82.1_bins.9.faa
-rw-r--r-- 1 nmdcda nmdcda 0 Jul 1 12:50 nmdc_wfmag-12-fxwdrv82.1_bins.9.ec.txt
-rw-r--r-- 1 nmdcda nmdcda 0 Jul 1 12:50 nmdc_wfmag-12-fxwdrv82.1_bins.9.ko.txt
-rw-r--r-- 1 nmdcda nmdcda 0 Jul 1 12:51 nmdc_wfmag-12-fxwdrv82.1_bins.9.gene_product.txt
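A quick way to spot this symptom across bin directories might look like the sketch below. It is not part of create_tarfiles.py: the function name is hypothetical, and treating every suffix from the listing above as "must be non-empty" is an assumption made for illustration.

```python
from pathlib import Path

# Suffixes taken from the directory listing above. Requiring all of them
# to be non-empty is an assumption, not documented workflow behaviour.
ANNOTATION_SUFFIXES = (".gff", ".faa", ".ec.txt", ".ko.txt", ".gene_product.txt")


def find_empty_annotation_files(bin_dir):
    """Return the zero-byte annotation files in one bin directory, sorted by path."""
    bin_dir = Path(bin_dir)
    return sorted(
        p for p in bin_dir.iterdir()
        if p.is_file()
        and p.name.endswith(ANNOTATION_SUFFIXES)
        and p.stat().st_size == 0
    )
```

Run against the directory above, this would be expected to report every annotation file except the .fna, which is the only one with content.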
The protein IDs in the annotation results have been updated, so the metaMAGs workflow cannot match contig IDs to the annotation results. We will need a mapping file from the annotation workflow as an input, and should use the renamed contig FASTA from the annotation workflow as the input FASTA instead of the one from the assembly workflow.
Is there a way to have create_tarfiles.py fail if the mapping isn't correct? Shane caught this because it was a test; otherwise the Cromwell job completed successfully, which is risky to run in production.
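One possible shape for such a check is sketched below. It assumes the per-bin subsetting filters GFF lines by the seqid in column 1, which may not be how create_tarfiles.py actually works; both function names are hypothetical.

```python
def subset_gff_lines(gff_lines, bin_contig_ids):
    """Keep GFF feature lines whose first column (seqid) is one of the bin's contigs."""
    wanted = set(bin_contig_ids)
    kept = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comment, directive, and blank lines
        seqid = line.split("\t", 1)[0].strip()
        if seqid in wanted:
            kept.append(line)
    return kept


def subset_or_fail(gff_lines, bin_contig_ids, bin_name):
    """Subset annotations for one bin; raise if the bin has contigs but nothing matched."""
    kept = subset_gff_lines(gff_lines, bin_contig_ids)
    if bin_contig_ids and not kept:
        raise ValueError(
            f"{bin_name}: none of {len(bin_contig_ids)} contig IDs matched the annotation file"
        )
    return kept
```

An uncaught exception makes the Python process exit non-zero, so if the task's command returns the script's exit code, the Cromwell task should fail instead of completing with empty files. A stricter threshold than "zero matches" may be appropriate, since a partial mismatch would still pass this check.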
@chienchi please see my comment from last week.
create_tarfiles.py runs after binning. Ideally, the mapping between input contig IDs and annotation IDs should be checked right after the files are staged.
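A pre-flight check along those lines could be sketched as follows. It assumes the contig FASTA and a GFF are both available at staging time and that GFF seqids are meant to equal FASTA header IDs; all function names are hypothetical.

```python
def fasta_ids(lines):
    """Collect sequence IDs: the first whitespace-delimited token after '>'."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def gff_seqids(lines):
    """Collect the seqid (column 1) of every GFF feature line."""
    ids = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comment, directive, and blank lines
        ids.add(line.split("\t", 1)[0].strip())
    return ids


def check_id_overlap(contig_lines, gff_lines):
    """Raise if any annotated seqid is missing from the contig FASTA.

    Returns the number of distinct annotated seqids when all are found.
    """
    contigs = fasta_ids(contig_lines)
    annotated = gff_seqids(gff_lines)
    unknown = annotated - contigs
    if unknown:
        example = sorted(unknown)[0]
        raise ValueError(
            f"{len(unknown)} annotated seqid(s) not found in contig FASTA, e.g. {example!r}"
        )
    return len(annotated)
```

Running this before binning would surface an identifier mismatch at the start of the workflow rather than in call-package.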
Appears to have a PR. Will move to next sprint for review.
Shane noticed that there were no bins from a run where we expected them. In debugging the run, /pscratch/sd/n/nmdcda/cromwell-executions/nmdc_mags/29628c3e-8850-4210-927a-1d4258fa35d1/, it appears the root cause is in call-package. A test with GOLD identifiers worked on April 15, after the latest change to the image version on April 1st, so I suspect the issue is with nmdc-style headers.