Open MiThoSan opened 1 year ago
Dear @MiThoSan,
thank you very much for the positive feedback! This project involves more than just the Theislab, even though it's hosted in our GitHub organization.
What have you tried so far and where do things go wrong? This might also be a question for the developers of tcrdist3 and not us...
Many thanks for your fast response!
I followed your instructions on a subset of my TCR data and everything works as expected. However, if I exceed 10'000 TCRs in my data, I receive the following message when running TCRdist. Input:
tr = TCRrep(cell_df=df_tcrdist, organism="human", chains=["alpha", "beta"])
Resulting error message:
When TCRrep.
warnings.warn(f"\n\nWhen TCRrep.
Following the link https://tcrdist3.readthedocs.io/en/latest/sparsity.html allows me to perform the analysis as indicated:
tr = TCRrep(cell_df=df_tcrdist, organism="human", chains=["alpha", "beta"], compute_distances=False)
tr.cpus = 2
tr.compute_sparse_rect_distances(radius=50, chunk_size=100)
However, I think my main problem is handling the resulting sparse representation correctly. The following step of your script (with the small adjustment pw_alpha -> rw_alpha, pw_beta -> rw_beta) raised an error on the last line of code. Input:
dist_total = tr.rw_alpha + tr.rw_beta
columns = tr.clone_df["index"].astype(float).astype(int)
df_dist = pd.DataFrame(dist_total, columns=columns, index=columns)
Error message: ValueError: Shape of passed values is (27136, 1), indices imply (27136, 27136)
Your help is highly appreciated.
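The ValueError above is consistent with pandas receiving a SciPy sparse matrix instead of a 2-D array. Below is a minimal sketch of converting the summed matrix to a dense array before building the DataFrame. It assumes that tr.rw_alpha and tr.rw_beta are SciPy CSR matrices of the same square shape; the toy matrices and clone ids are made up for illustration and do not come from tcrdist3.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Toy stand-ins for tr.rw_alpha and tr.rw_beta (assumption: both are
# SciPy CSR matrices with identical square shape).
rw_alpha = sparse.csr_matrix(np.array([[0, 12, 0], [12, 0, 30], [0, 30, 0]]))
rw_beta = sparse.csr_matrix(np.array([[0, 24, 0], [24, 0, 0], [0, 0, 0]]))

# Adding two sparse matrices yields another sparse matrix.
dist_total = rw_alpha + rw_beta

# Hypothetical clone ids, standing in for tr.clone_df["index"].
clone_ids = pd.Index([101, 102, 103], name="index")

# Densify explicitly so pandas sees an n x n array.
df_dist = pd.DataFrame(dist_total.toarray(), index=clone_ids, columns=clone_ids)
print(df_dist)
```

Two caveats apply. Entries that are not stored in the sparse matrix come out as 0 in the dense array, so they are indistinguishable from a true distance of zero unless they are handled separately, and a pair stored for only one chain contributes only that chain's distance to the sum. For large inputs the dense array may not fit in memory; pd.DataFrame.sparse.from_spmatrix(dist_total, index=clone_ids, columns=clone_ids) keeps the data sparse, although downstream steps may still require a dense matrix.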
We can't do anything about this when their return is inconsistent. I am afraid that you'll have to open an issue over there and ask for guidance.
@MiThoSan If this is still relevant: "compute_sparse_rect_distances" calculates the distances between one TCR and all TCRs of a reference set, which can be used quite efficiently for database queries (resulting in a distance matrix of shape len_atlas x len_query). However, it does not compute the pairwise distances used here.
I faced the scaling issue once myself, and had the following snippet lying around to solve this:
from tcrdist.repertoire import TCRrep

tr = TCRrep(cell_df=df,
            organism='human',
            chains=['alpha', 'beta'],
            compute_distances=False,
            deduplicate=False,
            db_file='alphabeta_gammadelta_db.tsv')
tr.compute_distances()  # compute the full pairwise distances explicitly
I will test this and add it to the notebook, with a warning not to use it on datasets that are too large (10k-15k clones still worked on my laptop). As this is the more general case and many people might have more than 10k clones, it should be handled in the book. Thanks for pointing this out.
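As a rough guide to where "too large" begins, the memory for dense pairwise matrices grows with the square of the number of clones. The sketch below estimates it under two assumptions that are not stated in this thread: one dense n x n matrix is kept per chain, and each distance is stored as a 16-bit integer. The helper name dense_pairwise_bytes is hypothetical.

```python
import numpy as np


def dense_pairwise_bytes(n_clones: int, dtype=np.int16, n_chains: int = 2) -> int:
    """Bytes needed for one dense n x n distance matrix per chain.

    Assumptions (not taken from tcrdist3): int16 storage and exactly one
    matrix per chain. Intermediate copies are not counted.
    """
    return n_clones * n_clones * np.dtype(dtype).itemsize * n_chains


for n in (10_000, 15_000, 27_136):
    gib = dense_pairwise_bytes(n) / 2**30
    print(f"{n:>6} clones: ~{gib:.1f} GiB")
```

Actual peak usage is likely higher, since intermediate arrays, additional per-CDR matrices, and a summed matrix with a wider dtype would all add to this figure.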
Dear Theis Lab,
I am following your excellent repository for using TCRdist. The instructions work perfectly fine on a subset of my TCRs. However, when exceeding 10'000 clones, it is suggested to use a sparse representation (https://tcrdist3.readthedocs.io/en/0.2.0/sparsity.html). How can I handle the resulting compressed sparse matrix so that I also get a clonotype_network representation of my dataset?