scverse / scirpy

A scanpy extension to analyse single-cell TCR and BCR data.
https://scirpy.scverse.org/en/latest/
BSD 3-Clause "New" or "Revised" License

Bio.Alphabet error & error when loading TCR data from TraCeR #299

Closed willblev closed 10 months ago

willblev commented 3 years ago

Hi! First off, thanks for developing such a useful package. I have been running into some issues while trying to import data from TraCeR into anndata objects using scirpy following the Scirpy tutorial.

As a quick comment: after creating a fresh Scirpy conda env, I initially got an error about Bio.Alphabet, because more recent versions of Biopython no longer include it:

 File "/path/redacted/anaconda3/envs/scirpy_newest/lib/python3.8/site-packages/scirpy/io/_io.py", line 303, in read_tracer
    tracer_obj = pickle.load(f)
  File "/path/redacted/site-packages/Bio/Alphabet/__init__.py", line 20, in <module>
    raise ImportError(
ImportError: Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the ``molecule_type`` as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information.

Rolling back to Biopython version 1.72 seemed to solve this. However, when I tried again to import the TCR data from TraCeR using the ir.io.read_tracer function, I got the following error:

import anndata
anndata.logging.anndata_logger.addFilter(
    lambda r: not r.getMessage().startswith("storing")
    and r.getMessage().endswith("as categorical.")
)
import scirpy as ir
import scanpy as sc
from glob import glob
import pandas as pd
import tarfile
import warnings

path_to_tracer_output="path/redacted"
adata_tcr = ir.io.read_tracer(path_to_tracer_output)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/path/redacted/anaconda3/envs/scirpy/lib/python3.7/site-packages/scirpy/io/_io.py", line 308, in read_tracer
    chains[chain_id], f"TR{chain_id}"
  File "/path/redacted/anaconda3/envs/scirpy/lib/python3.7/site-packages/scirpy/io/_io.py", line 245, in _process_chains
    if tmp_chain.cdr3 == "N/A" or tmp_chain.cdr3nt == "N/A":
AttributeError: 'Recombinant' object has no attribute 'cdr3nt'

My Scirpy env is as follows:

Thank you in advance for your time and if there is anything else you need to know about my setup please ask away!

grst commented 3 years ago

Hi @willblev,

thanks for reporting this issue! While I think I can easily find a workaround for the Biopython issue, the second part looks a little more sinister. Essentially, the object structure of your TraCeR output seems to differ from what scirpy expects.

What version of tracer are you using? And is there any chance you could send me the Tracer output folder of a single cell that fails? That would be extremely helpful for investigating.

Best, Gregor

willblev commented 3 years ago

Thanks for the speedy reply @grst!

> Essentially, the object structure of your Tracer output seems different to what scirpy expects.

Indeed, this is what I also believed to be the problem.

> What version of tracer are you using?

I downloaded and installed TraCeR a few weeks ago, following the instructions in their GitHub repo. The repo says the most recent version is v0.6, but what actually got installed appears to be TraCeR v0.5, or at least that is how it shows up in my conda env list. I will look into this further.

> And is there any chance you could send me the Tracer output folder of a single cell that fails?

I have attached the output of one cell which results in the error. Thanks for looking into this!

TraCeR_output_example.zip

grst commented 3 years ago

I started looking into this and it seems it is becoming increasingly difficult to keep the read_tracer function working.

I'm not sure how to proceed... Maybe it would be worth packaging a simple script that converts TraCeR output to AIRR, together with the appropriate Biopython version, into a Docker/Singularity container, and dropping direct support for TraCeR from scirpy itself. For now I could just try to patch the issues and print a message telling the user to manually install Biopython 1.72 if they want to use read_tracer.
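As a rough illustration of that "print a message" idea, a pre-flight check could look something like the sketch below. This is not what scirpy does; the helper names (`parse_major_minor`, `supports_bio_alphabet`, `check_biopython_for_tracer`) are made up for this example. It relies on the fact that Bio.Alphabet was removed in Biopython 1.78, and it assumes Python 3.8+ for `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version

# Bio.Alphabet was removed in Biopython 1.78; older releases still ship it.
FIRST_VERSION_WITHOUT_ALPHABET = (1, 78)


def parse_major_minor(version_string):
    """Turn a version string such as '1.72' or '1.78.1' into (major, minor)."""
    parts = version_string.split(".")
    return int(parts[0]), int(parts[1])


def supports_bio_alphabet(version_string):
    """Return True if this Biopython version still includes Bio.Alphabet."""
    return parse_major_minor(version_string) < FIRST_VERSION_WITHOUT_ALPHABET


def check_biopython_for_tracer():
    """Hypothetical check to run before unpickling TraCeR output.

    Raises ImportError with an actionable message instead of letting
    pickle.load fail deep inside Biopython.
    """
    try:
        installed = version("biopython")
    except PackageNotFoundError:
        raise ImportError(
            "Biopython is not installed; reading TraCeR pickles needs "
            "an older release, e.g. `pip install biopython==1.72`."
        ) from None
    if not supports_bio_alphabet(installed):
        raise ImportError(
            f"Biopython {installed} no longer provides Bio.Alphabet, which "
            "TraCeR pickles reference. Try `pip install biopython==1.72`."
        )
    return installed
```

Calling `check_biopython_for_tracer()` right before the `pickle.load` step would turn the long Biopython traceback into a one-line hint about which version to install.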

Have you considered any other tools for TCR reconstruction? TRUST4 looks quite nice and is actively maintained, but I didn't yet have a chance to try it myself.

willblev commented 3 years ago

Thanks again for your time and for looking into this!

I inherited this project from a teammate so in my case, TraCeR had already been run (hence I did not consider other TCR reconstruction tools).

I understand the challenge of maintaining your package every time one of these functions breaks due to a deprecated function or a change in file structure... In my case, I may be able to get away with re-running the last step of the TraCeR pipeline (tracer summarize) using an older version of TraCeR so that the .pkl files it generates will be in the format which Scirpy expects.

Which was the last supported version of TraCeR?

grst commented 3 years ago

> Which was the last supported version of TraCeR?

The version I have been using successfully was built with this Dockerfile, based on the teichlab/tracer:latest image as it was two years ago: https://github.com/icbi-lab/smartseq2_pipeline/tree/master/Docker/tracer

However, I'm afraid the base image has been updated since then, and Docker purged our version of the container as part of their cost-saving measures.

I don't know how familiar you are with Python, but the easiest solution could be to patch the read_tracer function yourself to ignore missing cdr3nt and whatever else pops up: https://github.com/icbi-lab/scirpy/blob/113ee731bf39c508a6cf049fd87e27fb93685811/scirpy/io/_io.py#L317-L319
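A minimal sketch of what such a patch might look like, assuming the only problem is the missing attribute: read the fields with `getattr` and a default, so that an absent `cdr3nt` is treated the same way as "N/A". `FakeRecombinant` and `get_cdr3_fields` are hypothetical names used only for this example; they are not part of scirpy or TraCeR, and how the resulting `None` values should be handled downstream is left open.

```python
class FakeRecombinant:
    """Stand-in for a TraCeR Recombinant object that lacks `cdr3nt`.

    Only used to make this example self-contained; the real class
    comes from the unpickled TraCeR output.
    """

    def __init__(self, cdr3):
        self.cdr3 = cdr3


def get_cdr3_fields(chain):
    """Return (cdr3, cdr3nt), mapping missing attributes and "N/A" to None."""

    def _clean(value):
        # Treat TraCeR's "N/A" placeholder and absent values alike.
        return None if value in (None, "N/A") else value

    # getattr with a default avoids the AttributeError from the traceback.
    cdr3 = _clean(getattr(chain, "cdr3", None))
    cdr3nt = _clean(getattr(chain, "cdr3nt", None))
    return cdr3, cdr3nt


# An object without `cdr3nt` no longer raises:
print(get_cdr3_fields(FakeRecombinant("CASSL")))
```

In a local copy of `_io.py`, the check `tmp_chain.cdr3 == "N/A" or tmp_chain.cdr3nt == "N/A"` could then be replaced by a test on the returned values. Whether a chain with a missing nucleotide sequence should be kept or dropped is a judgment call for the person patching.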

grst commented 10 months ago

This issue is getting quite old... if someone is still using tracer and has issues, please open a new one.