zktuong / dandelion

dandelion - A single cell BCR/TCR V(D)J-seq analysis package for 10X Chromium 5' data
https://sc-dandelion.readthedocs.io/
GNU Affero General Public License v3.0

IGDATA vs. IGBLAST_DB Confusion + Other Installation kerfuffles #382

Open indianewok opened 5 months ago

indianewok commented 5 months ago

Description of the bug

Hi! This is a wonderful tool--I've been playing with it since it was a bioRxiv preprint, so I'm glad to see it published and out in the wild! I was updating my installation from an older version and noticed a few quality-of-life things. I'm running dandelion in a Jupyter notebook on a remote server through VSCode, so there are a lot of nested elements here, but the points below were the same whether I ran things there or from the command line/terminal.

  1. In your vignette, you mention setting the path variables for IGDATA, GERMLINE, and BLASTDB. GERMLINE and BLASTDB work, but IGDATA would, for whatever reason, not work. I realized that I had set IGDATA to point at the ./igblast/bin directory, not the container/igblast/database directory. Looking into why I had done that, it seems to be a pretty common workaround for when IgBLAST is acting up (link here: https://github.com/xinyu-dev/igblast, and here: https://github.com/Teichlab/tracer/issues/48). Maybe there's another path variable that can be exported for the IgBLAST database, or you could just suggest that people manually set igblast_db to the path where they have it saved? This one did cost me a good amount of time, so I just wanted to note it somewhere in case someone else runs into something similar!

  2. In terms of having these databases--when I install sc-dandelion with pip, after setting up a fresh conda environment to run this all in, is there any way that you can package the databases with the installation? Since a lot of the database querying is hard-coded (totally an IgBLAST thing, not a dandelion thing), it would save a lot of time and effort to add those to the package (the elements in ./container/database/ specifically). My workaround was to clone the repo from GitHub and use the paths from there, but that might not be feasible for everyone, and it might be easier to bundle them in with the tool itself? I'm not as familiar with Python, so if any of what I'm saying sounds ridiculous, please feel free to ignore it.

  3. I noticed you're working on a version of this that works with Python 3.12, so feel free to disregard this: the version that I got to work was Python 3.9.18. Trying to install it under 3.12 gave me an error regarding tuples in airr, using versioneer. Running it on 3.8 gave me issues as well. I don't know if this is helpful, but the error was consistently in airr, using versioneer, for a tuple syntax that is (no longer) supported? If this is a useful issue, please let me know how I can document it to be most helpful.
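
For anyone hitting the IGDATA problem in point 1, here is a minimal sketch of a sanity check that could be run before starting an analysis. It is not part of dandelion: `check_env_path` is a hypothetical helper, and the test for an `internal_data` subdirectory is an assumption based on the usual layout of an IgBLAST data directory, so adjust it to match your own database folder.

```python
import os
from pathlib import Path

# Variable names taken from the issue text above.
ENV_VARS = ("IGDATA", "GERMLINE", "BLASTDB")


def check_env_path(name, env=None):
    """Return a list of human-readable problems for one path variable.

    Hypothetical helper for illustration; not a dandelion function.
    """
    env = os.environ if env is None else env
    value = env.get(name)
    if not value:
        return [f"{name} is not set"]
    path = Path(value).expanduser()
    if not path.is_dir():
        return [f"{name}={value} is not an existing directory"]
    problems = []
    # Assumption: an IgBLAST data directory contains 'internal_data',
    # whereas the IgBLAST bin directory normally does not.
    if name == "IGDATA" and not (path / "internal_data").is_dir():
        problems.append(
            f"{name}={value} has no 'internal_data' subdirectory; "
            "it may point at the IgBLAST bin directory instead of the database"
        )
    return problems


if __name__ == "__main__":
    # Print every problem found; print nothing when all three look fine.
    for var in ENV_VARS:
        for problem in check_env_path(var):
            print(problem)
```

Running this inside the notebook kernel (rather than only in a terminal) also shows whether the variables exported in the shell are actually visible to the remote Jupyter session.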

Thanks so much for a wonderful tool!!

Minimal reproducible example

No response

The error message produced by the code above

No response

OS information

RHEL Fedora/Linux

Version information

dandelion==0.3.5 pandas==2.2.0 numpy==1.26.4 matplotlib==3.8.4 networkx==3.2.1 scipy==1.13.0
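
A version list like the one above can be collected programmatically, which may help when documenting the Python 3.8/3.12 installation errors. This is a rough sketch using only the standard library; `collect_versions` is a hypothetical helper, and the distribution names in `PACKAGES` are assumptions (the pip name `sc-dandelion` comes from the issue text, and `airr` is assumed to be the distribution name of the package that raised the error).

```python
import sys
from importlib import metadata

# Assumed distribution names; edit to match your environment.
PACKAGES = ("sc-dandelion", "airr", "pandas", "numpy", "scipy")


def collect_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # The interpreter version matters here because the failures were version-specific.
    print("python", sys.version.split()[0])
    for name, version in collect_versions(PACKAGES).items():
        print(name, version if version is not None else "not installed")
```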

Additional context

No response

zktuong commented 5 months ago

Hi @indianewok, thank you for your interest and support!