s175573 / GIANA

Ultrafast TCR clustering algorithm based on geometric isometry

deciphering query results #12

Open CSree opened 1 month ago

CSree commented 1 month ago

Hi, after running the query against the reference database, I get the file with the results. However, the antigen column reads "COVID19:ADIRP0000356_TCRB.tsv", and so on. How do I access this TSV file? Thank you, Chai

s175573 commented 1 month ago

Hello Chai,

I've uploaded this file to the data folder. Please check.

Thanks, Bo



CSree commented 1 month ago

Hi Bo, thanks for the prompt response; I was able to access the file. I have a few questions that I should have asked before the previous one. I queried a file against the Ref_Cancer_Covid_MS database. I put the clustered files for both in the same directory as the main file. I got 3 files as output:

  1. tmp_query--RotationEncodingBL62.txt: in the GIANA directory. It has fewer sequences than the input file, is clustered by GIANA, and its ref column has the entries ref and query.

  2. tmp_query.txt: in the GIANA directory. It has several more sequences than the input file, has no GIANA clusters, and its query column has the entries ref and query.

  3. _query_Ref_Cancer_Covid_MS.txt: in the output folder I specified. It has few sequences, and the final column contains only query.

How am I to interpret each file? I also notice that these files contain sequences that are not present in the input file. I understand the ones that are merged with the reference; however, even among the sequences labeled query in the final column, many are not in the input file.

Please help me understand this, and tell me which file is the final one I can use for further analysis. Thanks in advance. Chai

s175573 commented 1 month ago

Only the last file is for final processing. The tmp_* files are intermediate and should be ignored.

Thanks, Bo



CSree commented 1 month ago

Oh, OK. Since it has no sequences with ref in the last column, does that mean that none of the sequences matched any in the reference? It has >125,000 TCR sequences from one patient with COVID; I thought a good percentage of them should have matched. Chai

s175573 commented 1 month ago

How many sequences are there in your reference data?

Thanks, Bo



CSree commented 1 month ago

I used the first in the list: HC100_COVID300_HNSC_MMR_PDA_OV_Page_BrainMet_LungMDA_melanoma_CRC_DC_RCC_Bladder_Cervical_Riaz_MS--RotationEncodingBL62.txt

found here: https://zenodo.org/records/4698929

It has 10,285,868 sequences pertaining to multiple diseases, including COVID, cancer, multiple sclerosis, etc.

s175573 commented 1 month ago

Can you paste the first 100 lines of your query TCR file? It is strange that none clustered with the ref.

Thanks, Bo



CSree commented 1 month ago

Here it is: first 150

peptide | tcrbv
AGGDEQFF | TCRBV04-01
AKGDRAGRGNTIYF | TCRBV06-X
APAAPGHITEAFF | TCRBV05-01
AYNEQFF | TCRBV12-X
CAAAENTEAFF | TCRBV10-03
CAAAGDNTEAFF | TCRBV10-02
CAAAGTYGEQYF | TCRBV28-01
CAAAIEGSSPLHF | TCRBV19-01
CAAAQGASYSPLHF | TCRBV15-01
CAAASAYEQYF | TCRBV28-01
CAAAVRSSTEAFF | TCRBV12-03/12-04
CAAAYSGANVLTF | TCRBV06-05
CAADRGHSDTQYF | TCRBV06-06
CAADRQTANYGYTF | TCRBV06-05
CAAEAGGVDGQPQHF | TCRBV27-01
CAAEGNTEAFF | TCRBV28-01
CAAEGTGPNEQFF | TCRBV02-01
CAAESGLAGGPREQYF | TCRBV27-01
CAAEVTVLGYGYTF | TCRBV30-01
CAAFLPGEGTEAFF | TCRBV10-02
CAAGAGQSNQPQHF | TCRBV28-01
CAAGATAARLRYEQYF | TCRBV30-01
CAAGATSARLRYEQYF | TCRBV30-01
CAAGEDGYTF | TCRBV09-01
CAAGEQMNTEAFF | TCRBV02-01
CAAGGAGNTIYF | TCRBV19-01
CAAGGGGKETQYF | TCRBV07-07
CAAGGGRFLSDTQYF | TCRBV19-01
CAAGGIELAGWRETQYF | TCRBV19-01
CAAGGNFGETQYF | TCRBV06-05
CAAGGNQPQHF | TCRBV02-01
CAAGGPDTGELFF | TCRBV06-05
CAAGGQGPYEQYF | TCRBV06-05
CAAGGSVDEQFF | TCRBV10-03
CAAGGSYEQYF | TCRBV06-X
CAAGLAGEETQYF | TCRBV06-05
CAAGLTNSPLHF | TCRBV27-01
CAAGNYGYTF | TCRBV19-01
CAAGPFTDTQYF | TCRBV27-01
CAAGQSGSNQPQHF | TCRBV19-01
CAAGQSWSQPQHF | TCRBV06-06
CAAGQSYNEQFF | TCRBV19-01
CAAGREAWEQYF | TCRBV02-01
CAAGRETQYF | TCRBV19-01
CAAGRNQGGETQYF | TCRBV06-05
CAAGRQGGGYEQYF | TCRBV27-01
CAAGRQSGYTF | TCRBV06-05
CAAGRSDTQYF | TCRBV03-01/03-02
CAAGRYSYEQYF | TCRBV03-01/03-02
CAAGSANEQYF | TCRBV27-01
CAAGSGGSGEKLFF | TCRBV19-01
CAAGSGQGRNHGYTF | TCRBV10-03
CAAGSRYEQYF | TCRBV03-01/03-02
CAAGSYNSPLHF | TCRBV06-05
CAAGTANNSPLHF | TCRBV10-03
CAAGTGLHEQYF | TCRBV05-04
CAAGTGSNTGELFF | TCRBV10-03
CAAGTGTYEQYF | TCRBV10-03
CAAGTTSGNTIYF | TCRBV19-01
CAAGVAGTSTDTQYF | TCRBV10-03
CAAGVTDYKYF | TCRBV10-03
CAAGVYEQYF | TCRBV30-01
CAAHPGPGQGTGELFF | TCRBV05-04
CAAKGPNTGELFF | TCRBV19-01
CAAKGQGNTIYF | TCRBV19-01
CAAKGQGVSYEQYF | TCRBV30-01
CAAKGRSDEQFF | TCRBV05-06
CAAKGSGTADTQYF | TCRBV06-05
CAAKGVMDEKLFF | TCRBV19-01
CAAKIGNYGYTF | TCRBV12-03/12-04
CAAKRIDNEKLFF | TCRBV15-01
CAAKSGQNEKLFF | TCRBV19-01
CAALGGPRNTEAFF | TCRBV25-01
CAALGNTEAFF | TCRBV10-03
CAALRSLAGGSYEQYF | TCRBV10-01
CAALSKRGVSYNEQFF | TCRBV06-06
CAANHQQGVAYGYTF | TCRBV12-05
CAANKRFGNEQFF | TCRBV15-01
CAANLRRWYGYTF | TCRBV30-01
CAANRGGNEQFF | TCRBV10-03
CAANSDSHSNQPQHF | TCRBV19-01
CAANSGAKNIQYF | TCRBV19-01
CAAPGDSSGNTIYF | TCRBV30-01
CAAPGGDGLGNQPQHF | TCRBV19-01
CAAPGPGISGNTIYF | TCRBV24-01
CAAPPGCSYNSPLHF | TCRBV06-04
CAAQASNTGELFF | TCRBV06-05
CAAQDIWGIGETQYF | TCRBV07-06
CAAQGARSNYGYTF | TCRBV30-01
CAAQGDYGYTF | TCRBV02-01
CAAQGGNQPQHF | TCRBV10-03
CAAQGMRPNYGYTF | TCRBV06-X
CAAQGQVTDTQYF | TCRBV24-01
CAAQGRDLAKNIQYF | TCRBV06-X
CAAQGRQGTEAFF | TCRBV06-05
CAAQGRSNQPQHF | TCRBV19-01
CAAQGVSNQPQHF | TCRBV10-03
CAAQTGPHYGYTF | TCRBV06-05
CAARADSYYEQYF | TCRBV10-03
CAARAGGASTDTQYF | TCRBV19-01
CAARAGTRLPTDTQYF | TCRBV27-01
CAARAVNYGYTF | TCRBV19-01
CAARDPGAYNSPLHF | TCRBV19-01
CAARDSINYGYTF | TCRBV06-05
CAARDTSNSPLHF | TCRBV19-01
CAARETDTDTQYF | TCRBV18-01
CAARGGLYEQYF | TCRBV28-01
CAARGGSYNEQFF | TCRBV04-01
CAARGGTYNEQFF | TCRBV02-01
CAARGPNTGELFF | TCRBV28-01
CAARGRGGRQFF | TCRBV06-05
CAARGTEAFF | TCRBV04-02
CAARLGGANTEAFF | TCRBV21-01
CAARLGGNSPLHF | TCRBV10-03
CAARPGQSSGANVLTF | TCRBV10-03
CAARQGANEKLFF | TCRBV06-05
CAARQGANEKLFF | TCRBV06-06
CAARQGTNSPLHF | TCRBV27-01
CAARQMNTEAFF | TCRBV10-03
CAARQVSGNTIYF | TCRBV13-01
CAARRGQGASTGELFF | TCRBV27-01
CAARSGGAQNSPLHF | TCRBV06-X
CAARTGGYPKTQYF | TCRBV23-01
CAARTPGQKNEKLFF | TCRBV06-05
CAARTTGDGNTIYF | TCRBV24-01
CAASADRDSYKTQYF | TCRBV02-01
CAASESGTGGMAFF | TCRBV24-01
CAASGEGQPQHF | TCRBV21-01
CAASGERHQETQYF | TCRBV06-05
CAASGGATEAFF | TCRBV11-02
CAASGGGYGYTF | TCRBV15-01
CAASGGRVSGANVLTF | TCRBV04-03
CAASGGTDRIYEQYF | TCRBV10-03
CAASGRGGARTQYF | TCRBV06-X
CAASGVGPEQFF | TCRBV10-03
CAASLGGRTEAFF | TCRBV15-01
CAASREGRTNEKLFF | TCRBV15-01
CAASRGTLLYGYTF | TCRBV09-01
CAASSGSSDTQYF | TCRBV04-02
CAASSGTGANVLTF | TCRBV10-03
CAASSRGADTDTQYF | TCRBV06-05
CAASTGSNQPQHF | TCRBV06-05
CAASTGTGAQEQYF | TCRBV28-01
CAATEEYEQYF | TCRBV21-01
CAATGGYGYTF | TCRBV06-05
CAATGNTEAFF | TCRBV19-01
CAATGQGGYEQYF | TCRBV06-X
CAATGRVGQPQHF | TCRBV19-01
CAATGSGGTQYF | TCRBV10-03

s175573 commented 1 month ago

Your TRBV gene name format doesn't match the reference data. Please fix this according to the criteria in this code:

https://github.com/s175573/GIANA/blob/master/ProcessAdaptiveFile.R

Thanks, Bo
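
For readers hitting the same mismatch: the query file above uses Adaptive-style V gene names (for example TCRBV04-01), and the actual conversion rules are in ProcessAdaptiveFile.R. The Python sketch below only illustrates the kind of renaming involved; it is not a port of that script. It assumes the reference uses IMGT-style names with an allele suffix (such as TRBV4-1*01) and that some families are written without a gene number. Both assumptions, along with the hypothetical names `adaptive_to_imgt_v` and `SINGLE_GENE_FAMILIES`, should be checked against the reference file. Ambiguous calls such as TCRBV06-X or TCRBV12-03/12-04 are returned as None rather than guessed.

```python
import re
from typing import Optional

# Matches fully resolved Adaptive-style names such as "TCRBV04-01".
_ADAPTIVE_V = re.compile(r"^TCRBV(\d+)-(\d+)$")

# ASSUMPTION: families that IMGT-style naming writes without a gene number
# (e.g. TRBV19 rather than TRBV19-1). Verify against your reference file.
SINGLE_GENE_FAMILIES = frozenset({1, 2, 9, 13, 14, 15, 16, 17, 18, 19, 26, 27, 28, 30})


def adaptive_to_imgt_v(name: str, allele: str = "*01") -> Optional[str]:
    """Rename e.g. 'TCRBV04-01' to 'TRBV4-1*01' (hypothetical helper).

    Returns None for calls that do not resolve to a single gene,
    such as 'TCRBV06-X' or 'TCRBV12-03/12-04'.
    """
    match = _ADAPTIVE_V.match(name.strip())
    if match is None:
        return None
    # int() drops the zero padding: "04" -> 4, "01" -> 1.
    family, gene = int(match.group(1)), int(match.group(2))
    if family in SINGLE_GENE_FAMILIES and gene == 1:
        return f"TRBV{family}{allele}"
    return f"TRBV{family}-{gene}{allele}"


if __name__ == "__main__":
    for raw in ["TCRBV04-01", "TCRBV19-01", "TCRBV06-X", "TCRBV12-03/12-04"]:
        print(raw, "->", adaptive_to_imgt_v(raw))
```

Returning None for unresolved calls keeps the decision about them (drop the row, or pick the first listed gene) explicit in the calling code.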



CSree commented 3 weeks ago

Hi Bo, thanks for that pointer. I ran the function, then the query, and got results! However, my question now is how to interpret them. For example, for each TCR sequence that matched the reference, I have something like this in the results column:

COVID19:INCOV085-BL-3_TCRB.tsv or Autoimmune:MS_subject_11_CD4.tsv

I notice that the same query sequence has mapped to more than one reference sequence from different diseases. What does that mean? Also, the result for each matched sequence appears to be a TSV file. How do I go from there? I am interested in finding which epitope it matches.

Thanks, Chai

s175573 commented 3 weeks ago

Hi Chai,

GIANA won't give you epitope information. For most TCRs, the epitope is unknown. Disease relevance is the closest you can get at this point.

Best, Bo
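
Since the labels quoted in this thread appear to have the form disease:sample file (for example COVID19:INCOV085-BL-3_TCRB.tsv), one way to summarize disease relevance is to split each label and collect the diseases seen for every query sequence. The following is a minimal sketch under that assumption. The function names are hypothetical, the input is a plain list of (CDR3, label) pairs rather than GIANA's actual output layout, and the pairing in the usage lines is made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def split_label(label: str) -> Tuple[str, str]:
    """Split 'COVID19:INCOV085-BL-3_TCRB.tsv' into (disease, sample_file)."""
    # partition() splits on the first colon only, so later colons stay in the sample name.
    disease, _, sample = label.partition(":")
    return disease, sample


def diseases_per_query(matches: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each query CDR3 to the sorted list of diseases among its matched labels."""
    seen = defaultdict(set)
    for cdr3, label in matches:
        disease, _ = split_label(label)
        seen[cdr3].add(disease)
    return {cdr3: sorted(found) for cdr3, found in seen.items()}


if __name__ == "__main__":
    # Illustrative pairing only; not taken from real query results.
    example = [
        ("CAARQGANEKLFF", "COVID19:INCOV085-BL-3_TCRB.tsv"),
        ("CAARQGANEKLFF", "Autoimmune:MS_subject_11_CD4.tsv"),
    ]
    print(diseases_per_query(example))
```

A query sequence listed under several diseases then simply means it clustered with reference TCRs drawn from samples of each of those diseases.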



CSree commented 3 weeks ago

Ah, OK. Regarding the sequences in the query output: I recall the documentation saying that the query clusters would be merged with the reference clusters, which is why I see many more sequences in the query result than in the input. Many of those sequences are not in the input at all. Is there a way to retrieve which input sequences they corresponded to before being merged? I have to compare different input files to see how sequences disappeared or appeared, and I think I need to compare based only on the sequences that are in the input, not on those in the query results. Thanks, Chai

s175573 commented 3 weeks ago

Any sequence labeled 'query' belongs to the input file.

Take care, Bo
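
Given that rows labeled 'query' come from the input file, the comparison asked about above can be sketched as a filter followed by set operations. This is a minimal sketch, not GIANA code. It assumes the result file is tab-delimited, the CDR3 is in the first column, the ref/query label is in the last column, and lines starting with '#' are header comments. Those layout details and the function names are assumptions to check against an actual output file.

```python
from typing import Dict, Iterable, Set


def query_sequences(result_lines: Iterable[str]) -> Set[str]:
    """Collect CDR3s from rows whose last column is 'query'."""
    kept = set()
    for line in result_lines:
        line = line.rstrip("\n")
        # ASSUMPTION: blank lines and '#'-prefixed lines carry no data rows.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if fields[-1].strip() == "query":
            kept.add(fields[0])
    return kept


def compare_with_input(result_lines: Iterable[str], input_cdr3s: Iterable[str]) -> Dict[str, Set[str]]:
    """Compare 'query' rows of a result file with the CDR3s of the input file."""
    in_result = query_sequences(result_lines)
    input_set = set(input_cdr3s)
    return {
        "shared": in_result & input_set,
        "only_in_result": in_result - input_set,
        "only_in_input": input_set - in_result,
    }


if __name__ == "__main__":
    # Made-up rows for illustration; the real column layout may differ.
    rows = [
        "##header comment\n",
        "CAAAENTEAFF\tTRBV10-3*01\tquery\n",
        "CASSLGQETQYF\tTRBV7-9*01\tref\n",
    ]
    print(compare_with_input(rows, ["CAAAENTEAFF", "CAAAGDNTEAFF"]))
```

Running the same comparison for each input file gives per-sample sets that can then be intersected or subtracted to see which sequences appeared or disappeared.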

