phbradley / conga

Clonotype Neighbor Graph Analysis
MIT License

make_tcr_clumping_plot failing when trying to save figure #41

Open AvivBenchorin opened 2 years ago

AvivBenchorin commented 2 years ago

When running the reanalyze step described in the README (running run_conga.py with --restart and --all), the run fails with the error ValueError: Image size of 4124x92909 pixels is too large. It must be less than 2^16 in each direction. while inside make_tcr_clumping_plots.

The error is raised when plt.savefig(logo_pngfile, dpi=300) is called in make_logo_plots (line 1450 in conga/plotting.py), which is called by make_cluster_logo_plots_figure (line 3130), which in turn is called by make_tcr_clumping_plots (line 3189). It seems that the SVG-to-PNG conversion using imagemagick succeeded for the individual logo plots, and that the error happens when merging the images together. Some quick Googling suggests that the issue might be related to the plt.text calls in make_logo_plots, but at the moment I am unsure of the specific cause.

The commands I ran before the reanalyze step were:

python /path/to/conga/scripts/setup_10x_for_conga.py --filtered_contig_annotations_csvfile filtered_contig_annotations.csv --organism human --no_kpca
python /path/to/conga/scripts/run_conga.py --graph_vs_graph --no_kpca --gex_data filtered_feature_bc_matrix.h5  --gex_data_type 10x_h5 --clones_file filtered_contig_annotations_tcrdist_clones.tsv --organism human --outfile_prefix CoNGA_out

The specific command I ran for the reanalyze step was: python /path/to/conga/scripts/run_conga.py --restart CoNGA_out_final.h5ad --all --no_kpca --outfile_prefix CoNGA_out_restarted

This CoNGA run uses the gene expression and TCR data of 300k cells, taken from 10X cellranger outputs (the very large size of the data could be a contributing factor to the issue). I am using the most recent version of CoNGA (as of March 4, 2022, the most recent commit was December 10, 2021), and running CoNGA inside a conda python environment that was created using the instructions provided in the README.

My conda environment is the following:

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
_openmp_mutex             4.5                       1_gnu  
argon2-cffi               20.1.0           py36h8f6f2f9_2    conda-forge
arpack                    3.7.0                hc6cf775_2    conda-forge
async_generator           1.10                       py_0    conda-forge
atk-1.0                   2.36.0               h3371d22_4    conda-forge
attrs                     21.4.0             pyhd8ed1ab_0    conda-forge
backcall                  0.2.0              pyhd3eb1b0_0  
blas                      1.0                         mkl  
bleach                    4.1.0              pyhd8ed1ab_0    conda-forge
blosc                     1.21.0               h8c45485_0  
bzip2                     1.0.8                h7b6447c_0  
ca-certificates           2021.10.8            ha878542_0    conda-forge
cairo                     1.16.0            h18b612c_1001    conda-forge
certifi                   2021.5.30        py36h5fab9bb_0    conda-forge
cffi                      1.14.6           py36hc120d54_0    conda-forge
cycler                    0.11.0             pyhd3eb1b0_0  
dbus                      1.13.18              hb2f20db_0  
decorator                 4.4.2                    pypi_0    pypi
defusedxml                0.7.1              pyhd8ed1ab_0    conda-forge
entrypoints               0.4                pyhd8ed1ab_0    conda-forge
expat                     2.4.4                h295c915_0  
fastcluster               1.2.4                    pypi_0    pypi
fftw                      3.3.9                h27cfd23_1  
font-ttf-dejavu-sans-mono 2.37                 hab24e00_0    conda-forge
font-ttf-inconsolata      3.000                h77eed37_0    conda-forge
font-ttf-source-code-pro  2.038                h77eed37_0    conda-forge
font-ttf-ubuntu           0.83                 hab24e00_0    conda-forge
fontconfig                2.13.1               h6c09931_0  
fonts-conda-ecosystem     1                             0    conda-forge
fonts-conda-forge         1                             0    conda-forge
freetype                  2.11.0               h70c0345_0  
fribidi                   1.0.10               h36c2ea0_0    conda-forge
gdk-pixbuf                2.42.6               h04a7f16_0    conda-forge
get-version               2.1                      pypi_0    pypi
gettext                   0.19.8.1          h0b5b191_1005    conda-forge
ghostscript               9.54.0               h9c3ff4c_1    conda-forge
giflib                    5.2.1                h36c2ea0_2    conda-forge
glib                      2.68.3               h9c3ff4c_0    conda-forge
glib-tools                2.68.3               h9c3ff4c_0    conda-forge
glpk                      4.65              h9202a9a_1004    conda-forge
gmp                       6.2.1                h58526e2_0    conda-forge
graphite2                 1.3.14               h23475e2_0  
graphviz                  2.48.0               h85b4f2f_0    conda-forge
gst-plugins-base          1.14.0               h8213a91_2  
gstreamer                 1.14.0               h28cd5cc_2  
gtk2                      2.24.33              h539f30e_1    conda-forge
gts                       0.7.6                h64030ff_2    conda-forge
harfbuzz                  2.8.1                h6f93f22_0  
hdf5                      1.10.4               hb1b8bf9_0  
icu                       58.2                 he6710b0_3  
igraph                    0.9.4                ha184e22_0    conda-forge
imagemagick               7.0.11_13       pl5320hb118871_0    conda-forge
intel-openmp              2022.0.1          h06a4308_3633  
ipykernel                 5.5.5            py36hcb3619a_0    conda-forge
ipython                   7.16.1           py36h5ca1d4c_0  
ipython_genutils          0.2.0              pyhd3eb1b0_1  
jbig                      2.1               h7f98852_2003    conda-forge
jedi                      0.17.0                   py36_0  
jinja2                    3.0.3              pyhd8ed1ab_0    conda-forge
joblib                    1.0.1              pyhd3eb1b0_0  
jpeg                      9d                   h7f8727e_0  
jsonschema                3.0.2                    py36_0    conda-forge
jupyter_client            7.1.2              pyhd8ed1ab_0    conda-forge
jupyter_core              4.8.1            py36h5fab9bb_0    conda-forge
jupyterlab_pygments       0.1.2              pyh9f0ad1d_0    conda-forge
kiwisolver                1.3.1            py36h2531618_0  
lcms2                     2.12                 h3be6417_0  
ld_impl_linux-64          2.35.1               h7274673_9  
legacy-api-wrap           1.2                      pypi_0    pypi
leidenalg                 0.8.7            py36hc4f0c31_0    conda-forge
libblas                   3.9.0           1_h6e990d7_netlib    conda-forge
libcblas                  3.9.0           3_h893e4fe_netlib    conda-forge
libffi                    3.3                  he6710b0_2  
libgcc-ng                 9.3.0               h5101ec6_17  
libgd                     2.3.3                h695aa2c_0  
libgfortran-ng            7.5.0               ha8ba4b0_17  
libgfortran4              7.5.0               ha8ba4b0_17  
libglib                   2.68.3               h3e27bee_0    conda-forge
libgomp                   9.3.0               h5101ec6_17  
libiconv                  1.16                 h516909a_0    conda-forge
liblapack                 3.9.0           3_h893e4fe_netlib    conda-forge
libllvm10                 10.0.1               hbcb73fb_5  
libpng                    1.6.37               hbc83047_0  
librsvg                   2.50.7               hc3c00ef_0    conda-forge
libsodium                 1.0.18               h36c2ea0_1    conda-forge
libstdcxx-ng              9.3.0               hd4cf53a_17  
libtiff                   4.2.0                h85742a9_0  
libtool                   2.4.6             h58526e2_1007    conda-forge
libuuid                   1.0.3                h7f8727e_2  
libwebp                   1.2.2                h55f646e_0  
libwebp-base              1.2.2                h7f8727e_0  
libxcb                    1.14                 h7b6447c_0  
libxml2                   2.9.12               h03d6c58_0  
llvmlite                  0.36.0           py36h612dafd_4  
louvain                   0.7.0            py36hc4f0c31_0    conda-forge
lz4-c                     1.9.3                h295c915_1  
lzo                       2.10                 h7b6447c_2  
markupsafe                2.0.1            py36h8f6f2f9_0    conda-forge
matplotlib                3.3.4            py36h06a4308_0  
matplotlib-base           3.3.4            py36h62a2d02_0  
metis                     5.1.0             h58526e2_1006    conda-forge
mistune                   0.8.4           py36h8f6f2f9_1004    conda-forge
mkl                       2020.2                      256  
mkl-service               2.3.0            py36he8ac12f_0  
mkl_fft                   1.3.0            py36h54f3939_0  
mkl_random                1.1.1            py36h0573a6f_0  
mock                      4.0.3              pyhd3eb1b0_0  
mpfr                      4.1.0                h9202a9a_1    conda-forge
nbclient                  0.5.9              pyhd8ed1ab_0    conda-forge
nbconvert                 6.0.7            py36h5fab9bb_3    conda-forge
nbformat                  5.1.3              pyhd8ed1ab_0    conda-forge
ncurses                   6.3                  h7f8727e_2  
nest-asyncio              1.5.4              pyhd8ed1ab_0    conda-forge
networkx                  2.5.1                    pypi_0    pypi
notebook                  6.3.0            py36h5fab9bb_0    conda-forge
numba                     0.53.1           py36ha9443f7_0  
numexpr                   2.7.3            py36hb2eb853_0  
numpy                     1.19.2           py36h54aff64_0  
numpy-base                1.19.2           py36hfa32c7d_0  
olefile                   0.46                     py36_0  
openjpeg                  2.4.0                h3ad879b_0  
openssl                   1.1.1k               h7f98852_0    conda-forge
packaging                 21.3               pyhd8ed1ab_0    conda-forge
pandas                    1.1.5            py36ha9443f7_0  
pandoc                    2.17.1.1             ha770c72_0    conda-forge
pandocfilters             1.5.0              pyhd8ed1ab_0    conda-forge
pango                     1.48.7               hb8ff022_0    conda-forge
parso                     0.8.3              pyhd3eb1b0_0  
patsy                     0.5.1                    py36_0  
pcre                      8.45                 h295c915_0  
perl                      5.32.1          0_h7f98852_perl5    conda-forge
pexpect                   4.8.0              pyhd3eb1b0_3  
pickleshare               0.7.5           pyhd3eb1b0_1003  
pillow                    8.3.1            py36h2c7a002_0  
pip                       21.2.2           py36h06a4308_0  
pixman                    0.38.0            h516909a_1003    conda-forge
pkg-config                0.29.2            h36c2ea0_1008    conda-forge
prometheus_client         0.13.1             pyhd8ed1ab_0    conda-forge
prompt-toolkit            3.0.20             pyhd3eb1b0_0  
ptyprocess                0.7.0              pyhd3eb1b0_2  
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pygments                  2.11.2             pyhd3eb1b0_0  
pynndescent               0.5.6                    pypi_0    pypi
pyparsing                 3.0.4              pyhd3eb1b0_0  
pyqt                      5.9.2            py36h05f1152_2  
pyrsistent                0.17.3           py36h8f6f2f9_2    conda-forge
pytables                  3.6.1            py36h71ec239_0  
python                    3.6.13               h12debd9_1  
python-dateutil           2.8.2              pyhd3eb1b0_0  
python-igraph             0.9.6            py36h644ed5e_0    conda-forge
python_abi                3.6                     2_cp36m    conda-forge
pytz                      2021.3             pyhd3eb1b0_0  
pyyaml                    5.4.1            py36h27cfd23_1  
pyzmq                     22.1.0           py36h7068817_0    conda-forge
qt                        5.9.7                h5867ecd_1  
readline                  8.1.2                h7f8727e_1  
scanpy                    1.7.2                    pypi_0    pypi
scikit-learn              0.24.2           py36ha9443f7_0  
scipy                     1.5.2            py36h0b6359f_0  
seaborn                   0.11.2             pyhd3eb1b0_0  
send2trash                1.8.0              pyhd8ed1ab_0    conda-forge
setuptools                58.0.4           py36h06a4308_0  
sinfo                     0.3.4                    pypi_0    pypi
sip                       4.19.8           py36hf484d3e_0  
six                       1.16.0             pyhd3eb1b0_1  
sqlite                    3.37.2               hc218d9a_0  
statsmodels               0.12.2           py36h27cfd23_0  
stdlib-list               0.8.0                    pypi_0    pypi
suitesparse               5.10.1               hd8046ac_0    conda-forge
tbb                       2020.3                intel_304    intel
terminado                 0.12.1           py36h5fab9bb_0    conda-forge
testpath                  0.5.0              pyhd8ed1ab_0    conda-forge
texttable                 1.6.4              pyhd8ed1ab_0    conda-forge
threadpoolctl             2.2.0              pyh0d69192_0  
tk                        8.6.11               h1ccaba5_0  
tornado                   6.1              py36h27cfd23_0  
tqdm                      4.62.3                   pypi_0    pypi
traitlets                 4.3.3            py36h06a4308_0  
umap-learn                0.5.2                    pypi_0    pypi
wcwidth                   0.2.5              pyhd3eb1b0_0  
webencodings              0.5.1                      py_1    conda-forge
wheel                     0.37.1             pyhd3eb1b0_0  
xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
xorg-libice               1.0.10               h7f98852_0    conda-forge
xorg-libsm                1.2.2                h470a237_5    conda-forge
xorg-libx11               1.7.2                h7f98852_0    conda-forge
xorg-libxext              1.3.4                h7f98852_1    conda-forge
xorg-libxrender           0.9.10            h7f98852_1003    conda-forge
xorg-libxt                1.2.1                h7f98852_2    conda-forge
xorg-renderproto          0.11.1            h7f98852_1002    conda-forge
xorg-xextproto            7.3.0             h7f98852_1002    conda-forge
xorg-xproto               7.0.31            h7f98852_1007    conda-forge
xz                        5.2.5                h7b6447c_0  
yaml                      0.2.5                h7b6447c_0  
zeromq                    4.3.4                h9c3ff4c_0    conda-forge
zlib                      1.2.11               h7f8727e_4  
zstd                      1.4.9                haebb681_0

phbradley commented 2 years ago

Hi Aviv, thanks for trying conga! I've run some really big datasets and I haven't run into this yet, very cool! As a quick check, if you manually reduce the dpi in the plt.savefig command, say from 300 to 200 or 100, does that "fix" it (i.e., make the error go away)? I'm also curious whether you have any log output from before the error... Take care, Phil

AvivBenchorin commented 2 years ago

Hi Phil,

Here are the last couple of lines of log output leading up to the error.

Stdout:

making cluster logos: 603 610 CoNGA_out_restarted_tcr_clumping_logos.png
making cluster logos: 604 610 CoNGA_out_restarted_tcr_clumping_logos.png
making cluster logos: 605 610 CoNGA_out_restarted_tcr_clumping_logos.png
making cluster logos: 606 610 CoNGA_out_restarted_tcr_clumping_logos.png
making cluster logos: 607 610 CoNGA_out_restarted_tcr_clumping_logos.png
making cluster logos: 608 610 CoNGA_out_restarted_tcr_clumping_logos.png
making cluster logos: 609 CoNGA_out_restarted_tcr_clumping_logos.png
making: CoNGA_out_restarted_tcr_clumping_logos.png

Stderr:

.................................................. 80000
.................................................. 85000
.................................................. 90000
.................................................. 95000
.................................................. 100000
.................................................. 105000
.................................................. 110000
...................................
... storing 'test' as categorical
Traceback (most recent call last):
  File "/path/to/conga/scripts/run_conga.py", line 831, in <module>
    pvalue_threshold_for_logos=args.pvalue_threshold_for_tcr_clumping,
  File "/path/to/conga/conga/plotting.py", line 3193, in make_tcr_clumping_plots
    **logo_plot_args)
  File "/path/to/conga/conga/plotting.py", line 3138, in make_cluster_logo_plots_figure
    **kwargs)
  File "/path/to/conga/conga/plotting.py", line 1450, in make_logo_plots
    plt.savefig(logo_pngfile, dpi=300)
  File "/path/to/miniconda3/envs/conga_new_env/lib/python3.6/site-packages/matplotlib/pyplot.py", line 859, in savefig
    res = fig.savefig(*args, **kwargs)
  File "/path/to/miniconda3/envs/conga_new_env/lib/python3.6/site-packages/matplotlib/figure.py", line 2311, in savefig
    self.canvas.print_figure(fname, **kwargs)
  File "/path/to/miniconda3/envs/conga_new_env/lib/python3.6/site-packages/matplotlib/backend_bases.py", line 2217, in print_figure
    **kwargs)
  File "/path/to/miniconda3/envs/conga_new_env/lib/python3.6/site-packages/matplotlib/backend_bases.py", line 1639, in wrapper
    return func(*args, **kwargs)
  File "/path/to/miniconda3/envs/conga_new_env/lib/python3.6/site-packages/matplotlib/backends/backend_agg.py", line 509, in print_png
    FigureCanvasAgg.draw(self)
  File "/path/to/miniconda3/envs/conga_new_env/lib/python3.6/site-packages/matplotlib/backends/backend_agg.py", line 402, in draw
    self.renderer = self.get_renderer(cleared=True)
  File "/path/to/miniconda3/envs/conga_new_env/lib/python3.6/site-packages/matplotlib/backends/backend_agg.py", line 418, in get_renderer
    self.renderer = RendererAgg(w, h, self.figure.dpi)
  File "/path/to/miniconda3/envs/conga_new_env/lib/python3.6/site-packages/matplotlib/backends/backend_agg.py", line 96, in __init__
    self._renderer = _RendererAgg(int(width), int(height), dpi)
ValueError: Image size of 4124x92909 pixels is too large. It must be less than 2^16 in each direction.

I will try to run the reanalyze step again with the DPI reduced to 100. Do you have any suggestions for "fast-forwarding" to this specific step when running the run_conga.py command? With the size of the data, running the entire reanalyze step up to the point where it previously crashed would take a few hours.

Best, Aviv

phbradley commented 2 years ago

Thanks! You could try replacing " --all " with " --tcr_clumping " but it will still take a little while, I'm afraid.

That is a lot of convergent TCR clusters! Another thing you could do to focus on the most interesting ones would be to add " --min_cluster_size_for_tcr_clumping_logos 10 "

The current default min size is 3, which may be too low for big datasets.
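Putting the two suggestions together, the restart command would look roughly like the one assembled below. This is only a sketch that recombines flags already quoted in this thread; the script path, input file, and output prefix are the same placeholders used above, and the snippet just builds and prints the command rather than running it.

```python
import shlex

# Every flag here is taken from commands quoted in this thread; the script
# path and file names are the same placeholders used there.
args = [
    "python", "/path/to/conga/scripts/run_conga.py",
    "--restart", "CoNGA_out_final.h5ad",
    "--tcr_clumping",  # used in place of --all
    "--min_cluster_size_for_tcr_clumping_logos", "10",  # default is 3
    "--no_kpca",
    "--outfile_prefix", "CoNGA_out_restarted",
]

# Quote each argument so the printed line can be pasted into a shell.
command = " ".join(shlex.quote(a) for a in args)
print(command)
```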

Are you expecting that degree of TCR sequence convergence? It makes me a tiny bit worried that somehow clonotypes might be getting "split" during the preprocessing, leading to apparent high sequence convergence (identical sequences shared between different clonotypes, but they are actually the same clonotype). Just a thought.
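One rough way to check for this kind of splitting is to look for identical TCR sequences that appear under more than one clonotype ID. The sketch below does that on rows read from a tab-separated clones file. The column names (clone_id, va_gene, cdr3a_nucseq, vb_gene, cdr3b_nucseq) are assumptions about that file and should be checked against its header, find_shared_tcrs is a hypothetical helper rather than part of conga, and the three rows are made-up toy data.

```python
import csv
import io
from collections import defaultdict

# Assumed column names; verify them against the header of the clones TSV.
KEY_COLUMNS = ("va_gene", "cdr3a_nucseq", "vb_gene", "cdr3b_nucseq")


def find_shared_tcrs(rows, key_columns=KEY_COLUMNS, id_column="clone_id"):
    """Return {tcr_key: sorted clone ids} for TCRs seen under >1 clone id."""
    ids_by_tcr = defaultdict(set)
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        ids_by_tcr[key].add(row[id_column])
    return {k: sorted(v) for k, v in ids_by_tcr.items() if len(v) > 1}


# Tiny made-up example standing in for the real clones file.
example_tsv = (
    "clone_id\tva_gene\tcdr3a_nucseq\tvb_gene\tcdr3b_nucseq\n"
    "c1\tTRAV1\taaa\tTRBV1\tccc\n"
    "c2\tTRAV1\taaa\tTRBV1\tccc\n"
    "c3\tTRAV2\tggg\tTRBV2\tttt\n"
)
rows = list(csv.DictReader(io.StringIO(example_tsv), delimiter="\t"))
shared = find_shared_tcrs(rows)
print(len(shared), "TCR(s) shared between different clonotype ids")
```

For a real run, the rows would come from csv.DictReader over the opened clones file instead of the in-memory string. A large number of shared entries would point toward split clonotypes rather than genuine convergence.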

AvivBenchorin commented 2 years ago

Hi Phil,

Rerunning the reanalysis step after changing the DPI value to 100 in the make_logo_plots function made the error go away. As expected, the logos in the generated TCR clumping and graph_vs_graph images are now lower resolution and slightly more difficult to read. As a workaround to get that part of the code running, however, it worked, and I can experiment with raising the DPI to find the highest value that improves the logo resolution without breaking the savefig call. Thank you!
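The DPI ceiling can also be estimated from the error message instead of by trial and error. A rough sketch, assuming the pixel size is simply the figure size in inches times the dpi and that each dimension must stay below 2^16, as the error states. Here max_safe_dpi is a hypothetical helper, and the inch values are back-computed from the 4124x92909 pixel size reported at dpi=300.

```python
import math

# The error message says each dimension must be less than 2^16 pixels.
PIXEL_LIMIT = 2 ** 16


def max_safe_dpi(width_in, height_in, limit=PIXEL_LIMIT):
    """Largest integer dpi keeping both pixel dimensions below `limit`."""
    longest = max(width_in, height_in)
    return math.floor((limit - 1) / longest)


# Figure size in inches implied by the traceback (pixels / dpi).
width_in = 4124 / 300
height_in = 92909 / 300

dpi = max_safe_dpi(width_in, height_in)
print("figure is about %.1f x %.1f inches" % (width_in, height_in))
print("largest dpi under the limit:", dpi)
```

Under these assumptions the ceiling comes out at 211, which is consistent with dpi=300 failing and dpi=100 working. With a live figure, fig.get_size_inches() would supply the width and height directly.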

The expected amount of TCR sequence convergence within the datasets is unknown. Are there any easily modifiable parameters in the preprocessing steps that could be tweaked to help reduce any potential "splitting" of clonotypes?

I ran into other issues that prevented me from successfully running the rest of the CoNGA workflow, but I will open a new issue with the details of those problems.

Best, Aviv