liulab-dfci / TRUST4

TCR and BCR assembly from RNA-seq data
MIT License

how the VDJs are assigned to contigs #169

Open saramoein372 opened 1 year ago

saramoein372 commented 1 year ago

Hello,

I have a question about how the VDJs are assigned to the contigs. I have a trust4_cdr3.out file, and I can see that, for example, the malignant cells in my data have multiple VDJs. How is this possible?

I think knowing how the VDJs are assigned would help me figure out the reason for this issue in my data.

Thanks, Sara

mourisl commented 1 year ago

If you just want to get the SHM rate, you can infer it from the v_identity column in the barcode_airr.tsv file. This file also contains the alignment of the contig to the germline sequence, which can be used to identify base-level SHMs. There are packages like Platypus for single-cell SHM analysis.
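If you go the v_identity route, a minimal Python sketch of per-contig SHM rates (1 minus V identity) grouped by cell might look like the following. It assumes barcode_airr.tsv is tab-separated with `v_identity` and `cell_id` columns named as in the AIRR Rearrangement schema. The thread does not say whether v_identity is a fraction or a percentage, so the helper accepts either. All function names are hypothetical, and base-level SHM calls from the alignment columns are not covered here.

```python
import csv
from typing import Dict, Iterable, List, Optional


def shm_rate(v_identity: Optional[str]) -> Optional[float]:
    """Convert one v_identity value into a mutation rate (1 - identity).

    Assumption: the value is a fraction in [0, 1] or a percentage; values
    above 1 are treated as percentages. Empty cells return None.
    """
    text = (v_identity or "").strip()
    if not text:
        return None
    identity = float(text)
    if identity > 1.0:
        identity /= 100.0
    return 1.0 - identity


def shm_rates_by_cell(rows: Iterable[Dict[str, str]]) -> Dict[str, List[float]]:
    """Group per-contig SHM rates by the AIRR cell_id column."""
    rates: Dict[str, List[float]] = {}
    for row in rows:
        rate = shm_rate(row.get("v_identity"))
        if rate is not None:
            rates.setdefault(row.get("cell_id") or "", []).append(rate)
    return rates


def load_airr(path: str) -> List[Dict[str, str]]:
    """Read an AIRR TSV (e.g. *_barcode_airr.tsv) into a list of dicts."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))
```

For example, `shm_rates_by_cell(load_airr("FR2_s1_barcode_airr.tsv"))` returns a list of rates per cell, which can then be summarized with `statistics.median` or similar.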

saramoein372 commented 1 year ago

Thank you Li.

For one of my samples, I am getting this error when running TRUST4, but I am not sure where it is coming from. Do you have any comments?

sh: line 1: 72365 Aborted /athena/namlab/scratch/sam4032/trust/bin/annotator -f /athena/namlab/scratch/sam4032/HL6/human_IMGT+C.fa -a out_s1s2/FR2_s1s2_final.out -t 8 -o out_s1s2/FR2_s1s2 --barcode --UMI -r out_s1s2/FR2_s1s2_assembled_reads.fa > out_s1s2/FR2_s1s2_annot.fa

system /athena/namlab/scratch/sam4032/trust/bin/annotator -f /athena/namlab/scratch/sam4032/HL6/human_IMGT+C.fa -a out_s1s2/FR2_s1s2_final.out -t 8 -o out_s1s2/FR2s1s2 --barcode --UMI -r out_s1s2/FR2_s1s2_assembled_reads.fa > out_s1s2/FR2_s1s2_annot.fa failed: 34304 at /athena/namlab/scratch/sam4032/trust/bin/run-trust4 line 48.

Thank you,

Sara


saramoein372 commented 1 year ago

Hi Li,

For one of my samples, I am getting the error below. Do you know what the issue is?

Error in `/athena/namlab/scratch/sam4032/trust/bin/annotator': free(): invalid pointer: 0x000055913b51d7a0

======= Backtrace: =========

/lib64/libc.so.6(+0x81329)[0x7faafb2a1329]

/athena/namlab/scratch/sam4032/trust/bin/annotator(+0x6f05)[0x558f51cb7f05]

/athena/namlab/scratch/sam4032/trust/bin/annotator(+0xc2d9)[0x558f51cbd2d9]

/athena/namlab/scratch/sam4032/trust/bin/annotator(+0x4a8e)[0x558f51cb5a8e]

/lib64/libc.so.6(__libc_start_main+0xf5)[0x7faafb242555]

/athena/namlab/scratch/sam4032/trust/bin/annotator(+0x5b15)[0x558f51cb6b15]

======= Memory map: ========

558f51cb1000-558f51cb3000 r--p 00000000 58b:914c8 144130915658908237 /athena/namlab/scratch/sam4032/trust/bin/annotator

558f51cb3000-558f51cd3000 r-xp 00002000 58b:914c8 144130915658908237 /athena/namlab/scratch/sam4032/trust/bin/annotator

558f51cd3000-558f51cd6000 r--p 00022000 58b:914c8 144130915658908237 /athena/namlab/scratch/sam4032/trust/bin/annotator

558f51cd7000-558f51cd8000 r--p 00025000 58b:914c8 144130915658908237 /athena/namlab/scratch/sam4032/trust/bin/annotator

558f51cd8000-558f51cd9000 rw-p 00026000 58b:914c8 144130915658908237 /athena/namlab/scratch/sam4032/trust/bin/annotator

558f51cd9000-558f51cf9000 rw-p 00000000 00:00 0

558f53a0e000-55913bdb9000 rw-p 00000000 00:00 0 [heap]

7fa7c5ff9000-7fa945ffa000 rw-p 00000000 00:00 0

7faa65ffc000-7faa95ffd000 rw-p 00000000 00:00 0

7faac0000000-7faac02cd000 rw-p 00000000 00:00 0

7faac02cd000-7faac4000000 ---p 00000000 00:00 0

7faac4000000-7faac435d000 rw-p 00000000 00:00 0

7faac435d000-7faac8000000 ---p 00000000 00:00 0

7faac8000000-7faac8402000 rw-p 00000000 00:00 0

7faac8402000-7faacc000000 ---p 00000000 00:00 0

7faacc000000-7faacc569000 rw-p 00000000 00:00 0

7faacc569000-7faad0000000 ---p 00000000 00:00 0

7faad4000000-7faad4932000 rw-p 00000000 00:00 0

7faad4932000-7faad8000000 ---p 00000000 00:00 0

7faad8000000-7faad8655000 rw-p 00000000 00:00 0

7faad8655000-7faadc000000 ---p 00000000 00:00 0

7faadc000000-7faadc6ff000 rw-p 00000000 00:00 0

7faadc6ff000-7faae0000000 ---p 00000000 00:00 0

7faae4000000-7faae4b28000 rw-p 00000000 00:00 0

7faae4b28000-7faae8000000 ---p 00000000 00:00 0

7faae8b20000-7faae8b21000 ---p 00000000 00:00 0

7faae8b21000-7faae9321000 rw-p 00000000 00:00 0

7faae9321000-7faae9322000 ---p 00000000 00:00 0

7faae9322000-7faae9b22000 rw-p 00000000 00:00 0

7faae9b22000-7faae9b23000 ---p 00000000 00:00 0

7faae9b23000-7faaea323000 rw-p 00000000 00:00 0

7faaea323000-7faaea324000 ---p 00000000 00:00 0

7faaea324000-7faafaf1e000 rw-p 00000000 00:00 0

7faafaf1e000-7faafb01f000 r-xp 00000000 fd:00 35463 /usr/lib64/libm-2.17.so

7faafb01f000-7faafb21e000 ---p 00101000 fd:00 35463 /usr/lib64/libm-2.17.so

7faafb21e000-7faafb21f000 r--p 00100000 fd:00 35463 /usr/lib64/libm-2.17.so

7faafb21f000-7faafb220000 rw-p 00101000 fd:00 35463 /usr/lib64/libm-2.17.so

7faafb220000-7faafb3e4000 r-xp 00000000 fd:00 22713 /usr/lib64/libc-2.17.so

7faafb3e4000-7faafb5e3000 ---p 001c4000 fd:00 22713 /usr/lib64/libc-2.17.so

7faafb5e3000-7faafb5e7000 r--p 001c3000 fd:00 22713 /usr/lib64/libc-2.17.so

7faafb5e7000-7faafb5e9000 rw-p 001c7000 fd:00 22713 /usr/lib64/libc-2.17.so

7faafb5e9000-7faafb5ee000 rw-p 00000000 00:00 0

7faafb5ee000-7faafb605000 r-xp 00000000 fd:00 35487 /usr/lib64/libpthread-2.17.so

7faafb605000-7faafb804000 ---p 00017000 fd:00 35487 /usr/lib64/libpthread-2.17.so

7faafb804000-7faafb805000 r--p 00016000 fd:00 35487 /usr/lib64/libpthread-2.17.so

7faafb805000-7faafb806000 rw-p 00017000 fd:00 35487 /usr/lib64/libpthread-2.17.so

7faafb806000-7faafb80a000 rw-p 00000000 00:00 0

7faafb80a000-7faafb82c000 r-xp 00000000 fd:00 22706 /usr/lib64/ld-2.17.so

7faafb82d000-7faafb833000 rw-p 00000000 00:00 0

7faafb833000-7faafb837000 r--p 00000000 58b:914c8 144130915658905510 /athena/namlab/scratch/sam4032/trust/lib/libgcc_s.so.1

7faafb837000-7faafb847000 r-xp 00004000 58b:914c8 144130915658905510 /athena/namlab/scratch/sam4032/trust/lib/libgcc_s.so.1

7faafb847000-7faafb84a000 r--p 00014000 58b:914c8 144130915658905510 /athena/namlab/scratch/sam4032/trust/lib/libgcc_s.so.1

7faafb84a000-7faafb84b000 r--p 00016000 58b:914c8 144130915658905510 /athena/namlab/scratch/sam4032/trust/lib/libgcc_s.so.1

7faafb84b000-7faafb84c000 rw-p 00017000 58b:914c8 144130915658905510 /athena/namlab/scratch/sam4032/trust/lib/libgcc_s.so.1

7faafb84c000-7faafb8f2000 r--p 00000000 58b:914c8 144130915658905495 /athena/namlab/scratch/sam4032/trust/lib/libstdc++.so.6.0.30

7faafb8f2000-7faafb986000 r-xp 000a6000 58b:914c8 144130915658905495 /athena/namlab/scratch/sam4032/trust/lib/libstdc++.so.6.0.30

7faafb986000-7faafb9ee000 r--p 0013a000 58b:914c8 144130915658905495 /athena/namlab/scratch/sam4032/trust/lib/libstdc++.so.6.0.30

7faafb9ee000-7faafb9f9000 r--p 001a1000 58b:914c8 144130915658905495 /athena/namlab/scratch/sam4032/trust/lib/libstdc++.so.6.0.30

7faafb9f9000-7faafb9fd000 rw-p 001ac000 58b:914c8 144130915658905495 /athena/namlab/scratch/sam4032/trust/lib/libstdc++.so.6.0.30

7faafb9fd000-7faafba00000 rw-p 00000000 00:00 0

7faafba00000-7faafba03000 r--p 00000000 58b:914c8 144130915658905536 /athena/namlab/scratch/sam4032/trust/lib/libz.so.1.2.13

7faafba03000-7faafba11000 r-xp 00003000 58b:914c8 144130915658905536 /athena/namlab/scratch/sam4032/trust/lib/libz.so.1.2.13

7faafba11000-7faafba18000 r--p 00011000 58b:914c8 144130915658905536 /athena/namlab/scratch/sam4032/trust/lib/libz.so.1.2.13

7faafba18000-7faafba19000 r--p 00017000 58b:914c8 144130915658905536 /athena/namlab/scratch/sam4032/trust/lib/libz.so.1.2.13

7faafba19000-7faafba1a000 rw-p 00018000 58b:914c8 144130915658905536 /athena/namlab/scratch/sam4032/trust/lib/libz.so.1.2.13

7faafba1a000-7faafba1b000 rw-p 00000000 00:00 0

7faafba23000-7faafba2b000 rw-p 00000000 00:00 0

7faafba2b000-7faafba2c000 r--p 00021000 fd:00 22706 /usr/lib64/ld-2.17.so

7faafba2c000-7faafba2d000 rw-p 00022000 fd:00 22706 /usr/lib64/ld-2.17.so

7faafba2d000-7faafba2e000 rw-p 00000000 00:00 0

7fff87883000-7fff878a5000 rw-p 00000000 00:00 0 [stack]

7fff879ca000-7fff879cc000 r-xp 00000000 00:00 0 [vdso]

ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]

sh: line 1: 237973 Aborted /athena/namlab/scratch/sam4032/trust/bin/annotator -f /athena/namlab/scratch/sam4032/HL6/human_IMGT+C.fa -a out_s2/FR2_s2_final.out -t 8 -o out_s2/FR2_s2 --barcode --UMI -r out_s2/FR2_s2_assembled_reads.fa > out_s2/FR2_s2_annot.fa

system /athena/namlab/scratch/sam4032/trust/bin/annotator -f /athena/namlab/scratch/sam4032/HL6/human_IMGT+C.fa -a out_s2/FR2_s2_final.out -t 8 -o out_s2/FR2_s2 --barcode --UMI -r out_s2/FR2_s2_assembled_reads.fa > out_s2/FR2_s2_annot.fa failed: 34304 at /athena/namlab/scratch/sam4032/trust/bin/run-trust4 line 48.
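The "failed: 34304" in the run-trust4 message is the raw wait status returned by Perl's `system`. Its high byte is the exit code of the shell that ran the command: 34304 >> 8 = 134. By shell convention, 134 = 128 + 6, which means the child process died from signal 6 (SIGABRT on Linux). This matches both the "Aborted" line and glibc's `free(): invalid pointer` report above. A small decoding sketch follows (the helper name is hypothetical). It deliberately ignores the case where the shell itself was killed, which would show up in the low bits of the status instead.

```python
import signal
from typing import Optional, Tuple


def decode_system_status(status: int) -> Tuple[int, Optional[signal.Signals]]:
    """Decode a Perl system() status such as the 34304 in 'failed: 34304'.

    The high byte is the exit code of the shell that ran the command; a
    shell exit code above 128 conventionally means the child died from
    signal (exit_code - 128). The shell's own signal bits are ignored.
    """
    exit_code = status >> 8
    sig = None
    if exit_code > 128:
        sig = signal.Signals(exit_code - 128)
    return exit_code, sig
```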


mourisl commented 1 year ago

This could be a bug in the annotator program. Which version of TRUST4 are you using?

saramoein372 commented 1 year ago

How can I check the version?


mourisl commented 1 year ago

You can run run-trust4 directly to print the help message; the version should be on the first line.

saramoein372 commented 1 year ago

The version is:

TRUST4 v1.0.6


mourisl commented 1 year ago

Several bugs have been fixed since then. Could you please try the newest version (v1.0.8)?
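To check programmatically whether the run-trust4 on your PATH is older than v1.0.8, you could use a minimal sketch like the one below. It follows the earlier hint that the version appears on the first line of the help message, in the form shown in this thread ("TRUST4 v1.0.6"). It is not known here whether the help text goes to stdout or stderr, or what exit code accompanies it, so both streams are checked and the return code is ignored. The function names are hypothetical.

```python
import re
import subprocess
from typing import Optional, Tuple


def parse_trust4_version(help_text: str) -> Optional[Tuple[int, ...]]:
    """Extract a version like 'v1.0.6' from the first line of the help text."""
    lines = help_text.strip().splitlines()
    if not lines:
        return None
    match = re.search(r"v(\d+(?:\.\d+)*)", lines[0])
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def installed_trust4_version(executable: str = "run-trust4") -> Optional[Tuple[int, ...]]:
    """Run the wrapper without arguments and parse the version it prints."""
    result = subprocess.run([executable], capture_output=True, text=True)
    return parse_trust4_version(result.stdout) or parse_trust4_version(result.stderr)
```

Version tuples compare numerically, so `installed_trust4_version() < (1, 0, 8)` is True for v1.0.6.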

saramoein372 commented 1 year ago

How can I upgrade?


mourisl commented 1 year ago

How did you install it at the beginning?

saramoein372 commented 1 year ago

I used conda. I just ran "conda update trust4" and it upgraded.

Thanks


saramoein372 commented 1 year ago

I tried again after updating TRUST4 and I am still getting the same error.

Should I specify the version of run-trust4 in my command?

I am not sure what the issue is.


saramoein372 commented 1 year ago

I had multiple successful runs. What could be the reason that this sample generates an error?

Thanks!


mourisl commented 1 year ago

Could you please share the out_s2/FR2_s2_final.out with me? This could be a bug in TRUST4, and I can look into it.

saramoein372 commented 1 year ago

Hi Li. Please see the attached FR2_s2_cdr3.out: https://drive.google.com/file/d/110WHnzonboBgIaTAa0fURc_xJ0XQplDB/view?usp=drive_web


mourisl commented 1 year ago

Thank you for sharing the file. Could you please share the _final.out file, which is the input to the annotator program?

saramoein372 commented 1 year ago

FR2_s2_final.out https://drive.google.com/file/d/1KD4nre_3fNPcehBe3mtAWiw4fQWOYSZj/view?usp=drive_web


mourisl commented 1 year ago

It seems the contig file works fine on our server. Just to confirm, could you please test by running "/athena/namlab/scratch/sam4032/trust/bin/annotator -f /athena/namlab/scratch/sam4032/HL6/human_IMGT+C.fa -a out_s1s2/FR2_s1s2_final.out -t 8", that is, without the -r, --barcode, and --UMI options? If it works fine on your system, could you please share out_s2/FR2_s2_assembled_read with me?

saramoein372 commented 1 year ago

Thanks. Sure. Just to make sure I understand:

1- Are you saying something is wrong with the infrastructure I am running the code on? What could be the reasons I cannot run this on our server? A memory issue? Any comments?

2- Are you suggesting that I run the command below? If it is not correct, would you please fix it?

run-trust4 -f /athena/namlab/scratch/sam4032/HL6/hg38_bcrtcr.fa -t 8 --ref /athena/namlab/scratch/sam4032/HL6/human_IMGT+C.fa -u /athena/namlab/scratch/sam4032/HL6/HL_6_S2_Tcells_Bcells_unsorted_BCR_S1_L001_R2_001.fastq.gz --barcode HL_6_S2_Tcells_Bcells_unsorted_BCR_S1_L001_R1_001.fastq.gz -o FR2_s2 --od out_s2 --repseq


mourisl commented 1 year ago
  1. I just want to make sure the error happens in the annotation step and not in the abundance estimation (which needs the reads from the -r option).

  2. Since the crash happens on the "annotator" program, you can just run that step and manually change the parameters.

If it is more convenient, you can directly share the out_s2/FR2_s2_assembled_read.fa with me. Thanks.
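If it helps to script that test, here is a minimal Python sketch that takes the failing command as printed in the error message and derives the reduced command suggested earlier in this thread. It removes -r and -o (each together with its value), the --barcode and --UMI flags, and everything from a shell `>` redirection onward. The helper names and the output file name in the usage note are hypothetical.

```python
import shlex
import subprocess
from typing import List

# Options removed to match the reduced command suggested in this thread.
VALUE_OPTIONS_TO_DROP = {"-r", "-o"}  # these take one argument each
FLAG_OPTIONS_TO_DROP = {"--barcode", "--UMI"}


def annotation_only_command(command: str) -> List[str]:
    """Return the annotator argv without the read-dependent options."""
    kept: List[str] = []
    skip_next = False
    for token in shlex.split(command):
        if skip_next:  # value belonging to a dropped option
            skip_next = False
            continue
        if token == ">":  # shell redirection, not part of the argv
            break
        if token in VALUE_OPTIONS_TO_DROP:
            skip_next = True
            continue
        if token in FLAG_OPTIONS_TO_DROP:
            continue
        kept.append(token)
    return kept


def run_to_file(argv: List[str], output_path: str) -> int:
    """Run argv with stdout redirected to output_path; return the exit code."""
    with open(output_path, "w") as out:
        return subprocess.run(argv, stdout=out).returncode
```

For example, `run_to_file(annotation_only_command(cmd), "annot_test.fa")`. On POSIX systems, `subprocess` reports death by a signal as a negative return code, so -6 would mean the annotator still aborts even without the reads.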

saramoein372 commented 1 year ago

Sure, I will send you out_s2/FR2_s2_assembled_read.fa soon.

Could you please help me generate FR2_s1_barcode_airr.tsv, FR2_s1_barcode_report.tsv, and FR2_s1_report.tsv? Currently, only the files below are generated; the others are missing:

-rw-r--r-- 1 sam4032 namlab 593135 Dec 23 18:14 FR2_s1_airr.tsv

-rw-r--r-- 1 sam4032 namlab 123678530 Dec 28 16:14 FR2_s2_annot.fa

-rw-r--r-- 1 sam4032 namlab 6438561630 Dec 28 15:58 FR2_s2_assembled_reads.fa

-rw-r--r-- 1 sam4032 namlab 11702 Dec 28 16:14 FR2_s2_cdr3.out

-rw-r--r-- 1 sam4032 namlab 504985239 Dec 28 15:58 FR2_s2_final.out

-rw-r--r-- 1 sam4032 namlab 504985239 Dec 28 15:57 FR2_s2_raw.out

-rw-r--r-- 1 sam4032 namlab 2163394396 Dec 28 13:00 FR2_s2_toassemble_bc.fa

-rw-r--r-- 1 sam4032 namlab 8453311113 Dec 28 13:00 FR2_s2_toassemble.fq

-rw-r--r-- 1 sam4032 namlab 2013567735 Dec 28 13:00 FR2_s2_toassemble_umi.fa

FR2_s2_assembled_reads.fa https://drive.google.com/file/d/1HY27FhDje0og_J9LvmRqXrVS92lB0WP2/view?usp=drive_web
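To see quickly which outputs a run produced, you could use a minimal sketch that compares a directory listing against the file suffixes mentioned in this thread. The suffix list is assembled only from this conversation, not from TRUST4 documentation, and the `_barcode_*` files are assumed to appear only for barcoded runs. The function name is hypothetical.

```python
from typing import Iterable, List

# Suffixes named in this thread (not an authoritative list of TRUST4 outputs).
EXPECTED_SUFFIXES = [
    "_toassemble.fq", "_toassemble_bc.fa", "_toassemble_umi.fa",
    "_raw.out", "_final.out", "_assembled_reads.fa",
    "_annot.fa", "_cdr3.out", "_report.tsv", "_airr.tsv",
    "_barcode_report.tsv", "_barcode_airr.tsv",
]


def missing_outputs(
    filenames: Iterable[str],
    prefix: str,
    suffixes: Iterable[str] = EXPECTED_SUFFIXES,
) -> List[str]:
    """List expected files (prefix + suffix) that are absent from a listing."""
    present = set(filenames)
    return [prefix + suffix for suffix in suffixes if prefix + suffix not in present]
```

For example, `missing_outputs(os.listdir("out_s2"), "FR2_s2")`. Since the listing above mixes FR2_s1 and FR2_s2 prefixes, check each prefix separately.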


saramoein372 commented 1 year ago

If I remember correctly, you said you had no issue generating "FR2_s1_report.tsv". Is that correct?

If it is correct, can you share it with me?

Thank you! Sara


mourisl commented 1 year ago

Do you mean FR2_s2? Can you access this file https://www.dropbox.com/s/o97pishplcf5m92/FR2_s2.zip?dl=0 ?

saramoein372 commented 1 year ago

Thank you so much!

So, what could be the reasons that my infrastructure does not support the current code?

Is it related to the compiler?


mourisl commented 1 year ago

It could be that the crash happens at the stage where the annotator program uses the assembled reads. That's why I requested that file. When you pulled TRUST4's newest version, did you run "make clean; make" to ensure that the older executables were purged?

saramoein372 commented 1 year ago

Thanks. No, I did not. I will do it.

Another question: why is the format of report.tsv in FR2_s2 different from all the other report.tsv files I generated before? I expect the frequency and count columns, but I can't see them in the FR2_s2_report.tsv file. Is there any way I can get the frequency? I am getting multiple contigs for one cell ID, and I am going to filter based on the maximum frequency.

Any suggestions?

Thank you!
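For the filtering step described above, here is a minimal sketch that keeps only the highest-count contig per cell and chain. It assumes the per-cell AIRR output has `cell_id` and `locus` columns as in the AIRR schema. The name of the count column is passed in, and the default `consensus_count` is an assumption: check which count column your TRUST4 output actually fills. Keeping one row per (cell, locus) rather than per cell is a design choice, so that heavy and light chains of the same cell both survive. The function name is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def top_contig_per_cell(
    rows: Iterable[Dict[str, str]],
    count_column: str = "consensus_count",
) -> List[Dict[str, str]]:
    """Keep, for each (cell_id, locus) pair, the row with the largest count.

    Ties keep the first row seen; rows with an empty or non-numeric count
    are skipped.
    """
    best: Dict[Tuple[str, str], Tuple[float, Dict[str, str]]] = {}
    for row in rows:
        try:
            count = float(row.get(count_column) or "")
        except ValueError:
            continue
        key = (row.get("cell_id") or "", row.get("locus") or "")
        if key not in best or count > best[key][0]:
            best[key] = (count, row)
    return [row for _, row in best.values()]
```

The rows can come from `csv.DictReader(handle, delimiter="\t")` over the barcode AIRR file.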


mourisl commented 1 year ago

That was caused by a typo when I ran your files. It should be in the right format now: https://www.dropbox.com/s/k2p89g2tkvilh5h/FR2_s2_report.tsv?dl=0

saramoein372 commented 1 year ago

Perfect! Thank you Li!


saramoein372 commented 1 year ago

Li, would you please remove my data after you find the TRUST4 bug? I would also appreciate it if you did not share it. Thanks!

Sara


mourisl commented 1 year ago

Don't worry. I won't share the data.

saramoein372 commented 1 year ago

Thanks Li. I have two questions: 1- Were you able to solve the bug when running the out_s2 sample with TRUST4?

2- Can you recommend the best tool for generating a phylogenetic tree after generating the clones with TRUST4?

Thanks, Sara


saramoein372 commented 1 year ago

Hi Li,

One question: when running my data, I can see that some of my normal samples do not have any IGH chain data. How is this possible? This is my command:

run-trust4 -f hg38_bcrtcr.fa -t 8 --ref human_IMGT+C.fa -u HL_6_S1_HRS_Bcells_BCR_S11_L001_R2_001.fastq.gz --barcode HL_6_S1_HRS_Bcells_BCR_S11_L001_R1_001.fastq.gz --barcodeRange 0 15 + --barcodeWhitelist 737K-august-2016_barcodes.txt --UMI HL_6_S1_HRS_Bcells_BCR_S11_L001_R1_001.fastq.gz --umiRange 16 27 + -o FR2_s1 --od out_s1 --repseq

Thanks, Sara


mourisl commented 1 year ago
  1. I don't think I have received the assembled read file to reproduce the error you had. Which email did you send it to?

  2. The Immcantation package has comprehensive methods for BCR phylogeny analysis.

  3. The command looks right to me. Could you please check whether the barcodes are in the whitelist? One way is to check the XXX_toassemble_bc.fa file to see whether all of them are marked as "missing_barcode".
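To act on that suggestion, here is a minimal sketch that tallies how many reads in a `_toassemble_bc.fa` file are marked "missing_barcode" and how many carry a barcode found in the whitelist. It assumes the file is FASTA with the barcode (or the literal "missing_barcode") as each record's sequence, and that whitelist entries are bare barcodes, one per line, in the same form. Both are assumptions, not documented formats. Function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Set


def fasta_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence of each FASTA record, joining multi-line records."""
    chunks: List[str] = []
    in_record = False
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if in_record:
                yield "".join(chunks)
            chunks = []
            in_record = True
        elif line:
            chunks.append(line)
    if in_record:
        yield "".join(chunks)


def barcode_summary(bc_lines: Iterable[str], whitelist: Set[str]) -> Counter:
    """Classify each read's barcode as missing, in the whitelist, or not."""
    summary: Counter = Counter()
    for barcode in fasta_sequences(bc_lines):
        if barcode == "missing_barcode":
            summary["missing_barcode"] += 1
        elif barcode in whitelist:
            summary["in_whitelist"] += 1
        else:
            summary["not_in_whitelist"] += 1
    return summary
```

For example, open `out_s1/FR2_s1_toassemble_bc.fa` and `737K-august-2016_barcodes.txt`, build `{line.strip() for line in whitelist_file}`, and pass both to `barcode_summary`. If nearly every read is counted as missing_barcode, the barcode range or whitelist is a likely suspect.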

saramoein372 commented 1 year ago

Thanks Li. I sent that file a few days ago, but I will share it again tomorrow. One more question:

In the cdr3.out file I can see data from all 3 chains. But when I ran python-cluster.py to generate the clusters, I could see clusters only on IGK and IGL, and no clusters on IGH. Why are the IGH clusters not appearing in the output of python-cluster.py? The input I use for clustering is the cdr3.out file.
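Before digging into the clustering step, it may help to confirm how many cdr3.out rows fall on each locus. Here is a minimal sketch that does that. It assumes cdr3.out is tab-separated with the V, J and C gene calls at 0-based columns 2, 4 and 5. That column layout is an assumption, so adjust `GENE_COLUMNS` to match your file. The locus is taken from the first three characters of the first assigned gene (e.g. IGH from IGHV3-23*01). Function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Sequence

# Assumed 0-based positions of the V, J and C gene columns in cdr3.out.
GENE_COLUMNS: Sequence[int] = (2, 4, 5)


def locus_of(gene_field: str) -> str:
    """Return e.g. 'IGH' for 'IGHV3-23*01', or '' when no gene is assigned."""
    name = gene_field.strip().split(",")[0]
    if name in ("", "*", "."):
        return ""
    return name[:3]


def chain_counts(lines: Iterable[str], columns: Sequence[int] = GENE_COLUMNS) -> Counter:
    """Count rows per locus, using the first assigned gene among the columns."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        locus = ""
        for col in columns:
            if col < len(fields):
                locus = locus_of(fields[col])
                if locus:
                    break
        counts[locus or "unassigned"] += 1
    return counts
```

Running `chain_counts(open("FR2_s1_cdr3.out"))` and comparing the IGH count with the IGK/IGL counts shows whether IGH rows are rare or simply not being clustered.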


saramoein372 commented 1 year ago

Hi Li, the assembled_reads.fa file is attached: FR2_s2_assembled_reads.fa https://drive.google.com/file/d/19-QlWXWyPdnfwm9LJ1TG2F29zCdwE3Tb/view?usp=drive_web


saramoein372 commented 1 year ago

Li, I have one question: to visualize my BCR clones, I need a FASTA file that can be loaded to make all AncesTree features available. The file should contain the UCA (Unmutated Common Ancestor) sequence in IMGT format and its related CDR/FR region boundaries.

Does TRUST4 generate any file with this specification? Thanks!


mourisl commented 1 year ago

You can probably start from TRUST4's AIRR-format output. If the downstream tool requires the IMGT "." gaps to match the IMGT coordinates, you can use the "airr-imgtgap.py" script in the "script" folder.
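IMGT-gapped sequences insert "." placeholders so that every V region lines up with the fixed IMGT position numbering. The sketch below illustrates that idea only. It assumes you already have the IMGT-gapped germline V sequence and an ungapped query that matches the ungapped germline base for base, with no insertions or deletions. `apply_imgt_gaps` is a hypothetical helper. This is not a description of how airr-imgtgap.py is actually implemented.

```python
def apply_imgt_gaps(gapped_germline: str, query: str, gap_char: str = ".") -> str:
    """Copy IMGT gap positions from a gapped germline onto an ungapped query.

    Assumption: `query` is aligned base-for-base to the germline with its
    gap characters removed (same length, no indels). Real data with indels
    needs a proper alignment step first.
    """
    ungapped_len = sum(1 for c in gapped_germline if c != gap_char)
    if len(query) != ungapped_len:
        raise ValueError("query length must equal the ungapped germline length")
    bases = iter(query)
    out = []
    for c in gapped_germline:
        # Keep each IMGT gap placeholder; otherwise consume the next query base.
        out.append(gap_char if c == gap_char else next(bases))
    return "".join(out)
```

For example, `apply_imgt_gaps("CAG...GTG", "CAGCTG")` returns `"CAG...CTG"`. The query's mismatch (`C` instead of `G`) is kept, and the gap positions follow the germline.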

saramoein372 commented 1 year ago

Thanks Li. Can I ask what this script does exactly?

I can see that in the AIRR.tsv file, the sequence_alignment and germline_alignment columns are empty.

Why is that? How should I fix it?

mourisl commented 1 year ago

I guess your TRUST4 version is still the old one; the new version should fill in sequences in those two columns.
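One way to find which samples were produced by the older version is to count rows with empty alignment columns in each AIRR file. The sketch below assumes a tab-separated file with a header row that includes the standard AIRR column names `sequence_alignment` and `germline_alignment`. `count_empty_alignments` is a hypothetical helper.

```python
import csv
from typing import Iterable, Tuple


def count_empty_alignments(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (rows_with_empty_alignment, total_rows) for an AIRR TSV.

    A row counts as empty if sequence_alignment or germline_alignment is
    blank or the column is absent (assumed AIRR column names).
    """
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    empty = total = 0
    for row in reader:
        total += 1
        seq_aln = (row.get("sequence_alignment") or "").strip()
        germ_aln = (row.get("germline_alignment") or "").strip()
        if not seq_aln or not germ_aln:
            empty += 1
    return empty, total
```

Usage: `with open("path/to/sample_airr.tsv", newline="") as fh: print(count_empty_alignments(fh))`. The path is a placeholder. A sample where every row is empty was likely run with the older version.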

saramoein372 commented 1 year ago

Oh, maybe. I think I updated the version recently, and some samples were run with the older version. Thanks.

saramoein372 commented 1 year ago

Hi Li,

I have two questions:

1- I am using the cdr3.out file as the input of python-cluster.py (from the GitHub repository) to generate the BCR clusters. But some of the rows in cdr3.out are not included in the output of python-cluster.py. Why are some rows of cdr3.out filtered out by the clustering code instead of being clustered?

2- I reran TRUST4 for one of my samples to generate the sequence_alignment and germline_alignment columns in the AIRR.tsv file, and these columns now have values, which is great. But again, why are some of the rows in the output of the clustering code (python-cluster.py) missing from the AIRR.tsv file? I want to extract the germline and sequence for all of the rows in the clustered cells, but most of them are not in AIRR.tsv. Is there a reason for this, and is there any way to extract the germline for all of the cells in python-cluster.py's output?

Thank you so much!

Sara

mourisl commented 1 year ago
  1. trust-cluster.py only considers entries with both V and J genes assigned and complete CDR3s.
  2. One reason that entries in the clustering file are missing from the AIRR output is that the trust-cluster method works on bulk (or pseudo-bulk) data, while the single-cell AIRR output only contains the dominant CDR3s, so it is much cleaner. Therefore, you can ignore the entries in the cluster file that are not present in the AIRR file.
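Point 1 amounts to a row filter. The sketch below makes several assumptions. Each cdr3.out row has already been parsed into a chain, a V gene, a J gene, and a complete-CDR3 flag. An unassigned gene is written as "*" or left empty. How you derive the completeness flag from cdr3.out is up to you, because this sketch does not reproduce trust-cluster.py's actual parsing. `Cdr3Entry`, `gene_assigned`, and `eligible_for_clustering` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Cdr3Entry:
    chain: str           # e.g. "IGH", "IGK", "IGL"
    v_gene: str
    j_gene: str
    cdr3_complete: bool  # assumed to be derived by the caller


def gene_assigned(gene: str) -> bool:
    # Assumption: "*" or an empty field means no gene call.
    return gene.strip() not in ("", "*")


def eligible_for_clustering(entries: Iterable[Cdr3Entry]) -> List[Cdr3Entry]:
    """Keep entries with both V and J assigned and a complete CDR3."""
    return [
        e for e in entries
        if gene_assigned(e.v_gene) and gene_assigned(e.j_gene) and e.cdr3_complete
    ]
```

Under this criterion, a row with a V, D, and J call but a partial CDR3 is still dropped. That is consistent with the explanation later in this thread.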
saramoein372 commented 1 year ago

Hi Li, thanks. There is another file, "airr_align.tsv", and from it I could extract the germline and sequence alignments for all cells in my python-cluster.py output.

Regarding "trust-cluster.py only considers entries with both V and J genes and complete CDR3s": is it correct to say that my output has no IGH rows for the normal sample because the IGH CDR3s are incomplete? Biologically, is it plausible that we have no complete IGH CDR3? My python-cluster.py output only has rows for IGL and IGK. How should I justify this?
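Pulling the alignments for the clustered entries out of airr_align.tsv is essentially a lookup by ID. The sketch below assumes airr_align.tsv is tab-separated with a header row that includes `sequence_id`, `sequence_alignment`, and `germline_alignment`. It also assumes the IDs in the clustering output match those `sequence_id` values. Because the cluster file's layout is not described here, the IDs are passed in directly. `lookup_alignments` is a hypothetical helper.

```python
import csv
from typing import Dict, Iterable, List, Tuple


def lookup_alignments(
    airr_lines: Iterable[str], wanted_ids: Iterable[str]
) -> Tuple[Dict[str, Tuple[str, str]], List[str]]:
    """Return ({id: (sequence_alignment, germline_alignment)}, ids_not_found).

    Assumed columns: sequence_id, sequence_alignment, germline_alignment.
    If an ID appears more than once, the last row wins.
    """
    reader = csv.DictReader(airr_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    index = {
        row["sequence_id"]: (row.get("sequence_alignment") or "",
                             row.get("germline_alignment") or "")
        for row in reader
    }
    found: Dict[str, Tuple[str, str]] = {}
    missing: List[str] = []
    for seq_id in wanted_ids:
        if seq_id in index:
            found[seq_id] = index[seq_id]
        else:
            missing.append(seq_id)
    return found, missing
```

The `missing` list makes it easy to see how many clustered entries have no alignment row, rather than silently skipping them.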

mourisl commented 1 year ago

It's because the CDR3 is not complete, or we could not identify the V or the J gene.
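To see which of these reasons removes the IGH rows in a given sample, you could tally the drop reason per chain. The sketch below assumes each row has been reduced to a (chain, V gene, J gene, complete-CDR3 flag) tuple, with "*" or an empty string marking an unassigned gene. `drop_reasons` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Row = Tuple[str, str, str, bool]  # (chain, v_gene, j_gene, cdr3_complete)


def _assigned(gene: str) -> bool:
    # Assumption: "*" or empty means the gene could not be identified.
    return gene.strip() not in ("", "*")


def drop_reasons(rows: Iterable[Row]) -> Dict[str, Counter]:
    """Per chain, count the first reason a row fails the V/J/complete-CDR3 filter."""
    tally: Dict[str, Counter] = {}
    for chain, v, j, complete in rows:
        counts = tally.setdefault(chain, Counter())
        # Reasons are checked in a fixed order; only the first failure is counted.
        if not _assigned(v):
            counts["no_V"] += 1
        elif not _assigned(j):
            counts["no_J"] += 1
        elif not complete:
            counts["partial_CDR3"] += 1
        else:
            counts["kept"] += 1
    return tally
```

If nearly all IGH rows land in `partial_CDR3`, that points at the completeness criterion rather than missing gene calls.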

saramoein372 commented 1 year ago

Sorry, I am a little confused: there are many rows in cdr3.out that have V, D, and J genes assigned, but they do not appear in the clustering output. What could be the reason?

mourisl commented 1 year ago

Then they could all be partial CDR3s. You can inspect the sequences in IMGT/V-QUEST or IgBLAST to confirm whether this is the case.
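Both IMGT/V-QUEST and IgBLAST accept nucleotide sequences in FASTA format. The sketch below assumes you have already collected (contig ID, nucleotide sequence) pairs for the IGH entries you want to inspect, for example from the contigs in TRUST4's *_annot.fa output. `to_fasta` is a hypothetical helper, and the 60-column wrapping is only a convention.

```python
from typing import Iterable, Tuple


def to_fasta(records: Iterable[Tuple[str, str]], width: int = 60) -> str:
    """Format (id, sequence) pairs as FASTA text, wrapping at `width` columns."""
    lines = []
    for seq_id, seq in records:
        lines.append(f">{seq_id}")
        seq = seq.replace(" ", "").upper()
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n" if lines else ""
```

Write the returned string to a file and upload it, or paste it into the web form of either tool.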

saramoein372 commented 1 year ago

Sure, thank you. The sample I am talking about is a normal sample, and I expect to see complete CDR3s on the IGH chain. What could cause partial CDR3s? Is there any biological reason for that? I am trying to justify why there is no IGH in my final result for the normal sample.

mourisl commented 1 year ago

You should first confirm whether these are indeed partial CDR3s. If you obtain no good-quality IGH sequences at all, this could be a malfunction of the BCR-seq kit.
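Before turning to an external tool, a rough first pass is to translate each CDR3 and check its conserved anchors. A complete IMGT junction starts with the conserved cysteine (C) and ends with tryptophan (W) for IGH, or phenylalanine (F) for IGK/IGL. The sketch below assumes the CDR3 nucleotide sequence reported by TRUST4 spans the full junction, including both anchor codons, and is in frame. This is a heuristic, not a replacement for IMGT/V-QUEST or IgBLAST. `translate` and `looks_complete` are hypothetical names.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base outermost).
_BASES = "TCAG"
_AA = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
       "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AA)}

# Assumed J-region anchor residue at the end of a complete junction.
END_ANCHOR = {"IGH": "W", "IGK": "F", "IGL": "F"}


def translate(nt: str) -> str:
    """Translate complete codons; codons with N or other symbols become 'X'."""
    nt = nt.upper()
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X")
                   for i in range(0, len(nt) - len(nt) % 3, 3))


def looks_complete(cdr3_nt: str, chain: str) -> bool:
    """Heuristic: in frame, no stop codon, starts with C, ends with the chain's anchor."""
    if len(cdr3_nt) % 3 != 0 or len(cdr3_nt) < 6:
        return False
    aa = translate(cdr3_nt)
    if "*" in aa:
        return False
    return aa[0] == "C" and aa[-1] == END_ANCHOR.get(chain.upper(), "?")
```

For example, `TGTGCGAGAGATTACTGG` translates to `CARDYW` and passes for IGH. The same sequence with the first codon removed (`ARDYW`) does not pass.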

saramoein372 commented 1 year ago

How can I confirm whether they are partial or complete CDR3s?

mourisl commented 1 year ago

You can inspect the sequences in IMGT/V-QUEST or IgBLAST, as an independent source, to confirm whether this is the case.