Open RoganGrant opened 2 years ago
And for clarity, this is what happens if I import annoy without a gcc module loaded, even though annoy is installed in my virtual environment:
import annoy
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "path/venv/lib/python3.9/site-packages/annoy/__init__.py", line 16, in <module>
    from .annoylib import Annoy as AnnoyIndex
ImportError: /lib64/libstdc++.so.6: version `CXXABI_1.3.9' not found (required by path/venv/lib/python3.9/site-packages/annoy/annoylib.cpython-39-x86_64-linux-gnu.so)
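To confirm that the system libstdc++ really lacks the required tag, one option is to list the CXXABI version strings embedded in the library file, roughly what piping strings through grep CXXABI would show. This is only a sketch: the helper name cxxabi_versions is made up for illustration, and the path is the one reported in the traceback above.

```python
import re
from pathlib import Path


def cxxabi_versions(blob: bytes) -> list[str]:
    """Return the de-duplicated CXXABI_x.y.z tags found in raw library bytes,
    sorted numerically (so 1.3.10 comes after 1.3.9)."""
    tags = {t.decode("ascii") for t in re.findall(rb"CXXABI_\d+(?:\.\d+)*", blob)}
    return sorted(tags, key=lambda t: [int(p) for p in t.split("_")[1].split(".")])


if __name__ == "__main__":
    # Path taken from the ImportError message; adjust for your system.
    lib = Path("/lib64/libstdc++.so.6")
    if lib.exists():
        versions = cxxabi_versions(lib.read_bytes())
        print("\n".join(versions))
        print("CXXABI_1.3.9 present:", "CXXABI_1.3.9" in versions)
```

If CXXABI_1.3.9 is not in the printed list, the loader is picking up a libstdc++ that is older than the one annoy was built against.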
You can run scrublet in your own conda environment and have the dynamic linker use that environment's lib directory rather than the HPC's (replace with the appropriate directory):
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/miniconda3/lib
First of all, thanks for scrublet! I have been using it for a while now, and much prefer it over the alternatives.
As for the issue, it's a bit niche but can potentially cause serious silent problems on an HPC. Even if annoy is installed, importing it can fail if the libstdc++ from a semi-recent version of gcc is not currently on the user's library path. For HPC users, this would generally require loading a GCC module. In my case, module load gcc/11.2.0 restores the missing library and solves the issue.
Minimal code to reproduce the issue:
And the behavior:
In this case, the doublet rate is still estimated, but apparently without finding nearest neighbors for the simulated doublets. Or perhaps another method is used? Either way, it would be worth raising a stronger warning of some sort, or even failing outright, in this case. If the analysis is automated, these sorts of messages may be missed entirely.
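In the meantime, an automated pipeline could check up front that annoy actually loads and stop if it does not. The following is a minimal sketch of that idea, not anything scrublet provides: require_module is a hypothetical helper name, and the hint about GCC modules is just my guess at a useful message.

```python
import importlib


def require_module(name: str):
    """Import `name`, or raise RuntimeError that carries the original loader error.

    Hypothetical helper for pipeline scripts; not part of scrublet or annoy.
    """
    try:
        return importlib.import_module(name)
    except ImportError as err:
        raise RuntimeError(
            f"Could not import {name!r}: {err}. "
            "On an HPC this may mean a GCC module needs to be loaded first."
        ) from err


# Example: call this before running scrublet in an automated job.
# require_module("annoy")
```

Because the ImportError is chained with "from err", the original libstdc++ message still appears in the traceback, so the job log shows both the failure and its cause.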