Open liu9756 opened 8 months ago
The -f file holds the VDJ sequences from a reference genome, with their genomic coordinates in the header. The reference files for mouse can be found at https://github.com/liulab-dfci/TRUST4/tree/master/mouse , where GRCm38_bcrtcr.fa is for -f and the IMGT one is for --ref. Or do you need to use your own VDJ reference sequences?
Thanks for your reply. I actually tried GRCm38_bcrtcr.fa and the IMGT file, but it does not seem to work:
$ ./run-trust4 -b all_contig.bam -f GRCm38_bcrtcr.fa --ref mouse_IMGT+C.fa -o TRUST_all_contig_toassemble --barcode CB
[Wed Mar 13 10:00:32 2024] TRUST4 begins.
[Wed Mar 13 10:00:32 2024] SYSTEM CALL: /home/user/trust4/TRUST4/bam-extractor -b all_contig.bam -t 1 -f GRCm38_bcrtcr.fa -o TRUST_all_contig_toassemble_toassemble --barcode CB
[Wed Mar 13 10:00:32 2024] Start to extract candidate reads from bam file.
Unknown genome name: 6
system /home/user/trust4/TRUST4/bam-extractor -b all_contig.bam -t 1 -f GRCm38_bcrtcr.fa -o TRUST_all_contig_toassemble_toassemble --barcode CB failed: 256 at ./run-trust4 line 55.
Could you please show me the chromosome names of your bam file by "samtools view -H all_contig.bam"?
@HD VN:1.6 SO:coordinate
@SQ SN:AAACCTGAGACGCACA-1_contig_1 LN:496
@SQ SN:AAACCTGAGATAGTCA-1_contig_1 LN:488
@SQ SN:AAACCTGAGCGTTCCG-1_contig_1 LN:510
@SQ SN:AAACCTGAGGACAGCT-1_contig_1 LN:464
@SQ SN:AAACCTGAGGACAGCT-1_contig_2 LN:521
@SQ SN:AAACCTGCAAGCGCTC-1_contig_1 LN:499
@SQ SN:AAACCTGCAAGCGCTC-1_contig_2 LN:656
@SQ SN:AAACCTGCAAGCGCTC-1_contig_3 LN:503
@SQ SN:AAACCTGCACAACGCC-1_contig_1 LN:551
@SQ SN:AAACCTGCACGCCAGT-1_contig_1 LN:508
@SQ SN:AAACCTGCACGGCGTT-1_contig_1 LN:492
@SQ SN:AAACCTGCAGCGTCCA-1_contig_1 LN:503
@SQ SN:AAACCTGCATTACGAC-1_contig_1 LN:517
@SQ SN:AAACCTGCATTGCGGC-1_contig_1 LN:551
@SQ SN:AAACCTGGTACCATCA-1_contig_1 LN:303
@SQ SN:AAACCTGGTACCATCA-1_contig_2 LN:460
@SQ SN:AAACCTGGTATAGTAG-1_contig_1 LN:496
@SQ SN:AAACCTGTCAGTGTTG-1_contig_1 LN:503
@SQ SN:AAACCTGTCCGTCAAA-1_contig_1 LN:496
@SQ SN:AAACCTGTCCTAAGTG-1_contig_1 LN:559
@SQ SN:AAACCTGTCCTAAGTG-1_contig_2 LN:498
@SQ SN:AAACCTGTCGGCGCTA-1_contig_1 LN:538
@SQ SN:AAACCTGTCGGCGCTA-1_contig_2 LN:498
@SQ SN:AAACCTGTCTCCCTGA-1_contig_1 LN:496
@SQ SN:AAACCTGTCTCCCTGA-1_contig_2 LN:373
@SQ SN:AAACCTGTCTCTAGGA-1_contig_1 LN:497
@SQ SN:AAACCTGTCTGCTTGC-1_contig_1 LN:495
@SQ SN:AAACGGGAGCTGCAAG-1_contig_1 LN:389
@SQ SN:AAACGGGAGTGTTGAA-1_contig_1 LN:590
@SQ SN:AAACGGGAGTGTTGAA-1_contig_2 LN:506
@SQ SN:AAACGGGCAAACCCAT-1_contig_1 LN:551
@SQ SN:AAACGGGCAGCTGCAC-1_contig_1 LN:538
@SQ SN:AAACGGGCAGGTGGAT-1_contig_1 LN:495
@SQ SN:AAACGGGCATTATCTC-1_contig_1 LN:512
@SQ SN:AAACGGGTCAACCAAC-1_contig_1 LN:497
@SQ SN:AAACGGGTCACAACGT-1_contig_1 LN:342
@SQ SN:AAACGGGTCAGAGCTT-1_contig_1 LN:493
@SQ SN:AAACGGGTCCACGTTC-1_contig_1 LN:527
@SQ SN:AAACGGGTCGTGGACC-1_contig_1 LN:309
@SQ SN:AAACGGGTCTAACTCT-1_contig_1 LN:428
@SQ SN:AAACGGGTCTTGTATC-1_contig_1 LN:565
@SQ SN:AAAGATGAGTTCGATC-1_contig_1 LN:538
@SQ SN:AAAGATGCAAGAGTCG-1_contig_1 LN:467
@SQ SN:AAAGATGCAAGGTTTC-1_contig_1 LN:551
@SQ SN:AAAGATGCAGATGAGC-1_contig_1 LN:684
@SQ SN:AAAGATGCAGATGAGC-1_contig_2 LN:512
@SQ SN:AAAGATGCATGAACCT-1_contig_1 LN:512
@SQ SN:AAAGATGGTAAATGAC-1_contig_1 LN:501
@SQ SN:AAAGATGGTATCTGCA-1_contig_1 LN:620
@SQ SN:AAAGATGGTCACACGC-1_contig_1 LN:512
@SQ SN:AAAGATGTCAAACGGG-1_contig_1 LN:503
@SQ SN:AAAGATGTCCCACTTG-1_contig_1 LN:559
@SQ SN:AAAGATGTCCCACTTG-1_contig_2 LN:410
@SQ SN:AAAGATGTCGGGAGTA-1_contig_1 LN:481
@SQ SN:AAAGATGTCTTGTCAT-1_contig_1 LN:493
@SQ SN:AAAGCAAAGAAGATTC-1_contig_1 LN:495
@SQ SN:AAAGCAAAGCCACGTC-1_contig_1 LN:495
@SQ SN:AAAGCAAAGGAGTACC-1_contig_1 LN:521
@SQ SN:AAAGCAAAGTGCCAGA-1_contig_1 LN:512
@SQ SN:AAAGCAACAGCCTATA-1_contig_1 LN:521
@SQ SN:AAAGCAAGTAAGTTCC-1_contig_1 LN:505
@SQ SN:AAAGCAAGTTGTGGCC-1_contig_1 LN:627
@SQ SN:AAAGCAATCACATGCA-1_contig_1 LN:501
@SQ SN:AAAGCAATCAGGCCCA-1_contig_1 LN:504
@SQ SN:AAAGCAATCCCTAATT-1_contig_1 LN:532
@SQ SN:AAAGCAATCCTGTACC-1_contig_1 LN:512
@SQ SN:AAAGCAATCGCCTGTT-1_contig_1 LN:494
@SQ SN:AAAGCAATCTGAGTGT-1_contig_1 LN:503
@SQ SN:AAAGTAGAGACTACAA-1_contig_1 LN:559
@SQ SN:AAAGTAGAGACTAGGC-1_contig_1 LN:510
@SQ SN:AAAGTAGAGAGTGAGA-1_contig_1 LN:494
@SQ SN:AAAGTAGAGATCACGG-1_contig_1 LN:500
@SQ SN:AAAGTAGAGCGTGAAC-1_contig_1 LN:407
@SQ SN:AAAGTAGAGGCATGGT-1_contig_1 LN:684
@SQ SN:AAAGTAGAGGCATGGT-1_contig_2 LN:512
@SQ SN:AAAGTAGCAATAACGA-1_contig_1 LN:512
@SQ SN:AAAGTAGCACCAGGCT-1_contig_1 LN:567
@SQ SN:AAAGTAGCACGTCAGC-1_contig_1 LN:561
......
@PG ID:samtools PN:samtools VN:1.16.1 CL:samtools sort -l 8G -m 600M -o /home/user/referenceData/run_vdj_S5/SC_VDJ_ASSEMBLER_CS/SC_MULTI_CORE/MULTI_GEM_WELL_PROCESSOR/VDJ_B_GEM_WELL_PROCESSOR/SC_VDJ_CONTIG_ASSEMBLER/ASSEMBLE_VDJ/fork0/chnk00-uf0fee24a37/files/contig_bam_sorted.bam /home/user/referenceData/run_vdj_S5/SC_VDJ_ASSEMBLER_CS/SC_MULTI_CORE/MULTI_GEM_WELL_PROCESSOR/VDJ_B_GEM_WELL_PROCESSOR/SC_VDJ_CONTIG_ASSEMBLER/ASSEMBLE_VDJ/fork0/chnk00-uf0fee24a37/files/contig_bam.bam
@PG ID:samtools.1 PN:samtools PP:samtools VN:1.16.1 CL:samtools merge -@ 3 -c -p -s 0 -b /home/user/referenceData/run_vdj_S5/SC_VDJ_ASSEMBLER_CS/SC_MULTI_CORE/MULTI_GEM_WELL_PROCESSOR/VDJ_B_GEM_WELL_PROCESSOR/SC_VDJ_CONTIG_ASSEMBLER/ASSEMBLE_VDJ/fork0/join-uf0fee24a37/files/contig_bam.fofn /home/user/referenceData/run_vdj_S5/SC_VDJ_ASSEMBLER_CS/SC_MULTI_CORE/MULTI_GEM_WELL_PROCESSOR/VDJ_B_GEM_WELL_PROCESSOR/SC_VDJ_CONTIG_ASSEMBLER/ASSEMBLE_VDJ/fork0/join-uf0fee24a37/files/contig_bam.0.bam
@PG ID:samtools.2 PN:samtools PP:samtools.1 VN:1.13 CL:samtools view -H all_contig.bam
I think this BAM file comes from aligning the reads to each BCR contig. The BAM file for TRUST4 should be an alignment to the reference genome, so that its chromosome names match the coordinates in the -f file. Just curious: since your data already has cellranger vdj results, why do you need to run TRUST4 on it? Thank you.
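For anyone hitting the same "Unknown genome name" error, the mismatch can be checked before running TRUST4. The snippet below is only a rough sketch, not part of TRUST4: it assumes the -f FASTA headers look like `>gene chrom start end strand` (an assumption about the file layout, based on the headers carrying genomic coordinates), and the function names and the gene name `GENE1` are made up for illustration. It takes the text printed by `samtools view -H` and the text of the -f file, and lists the chromosome names from the -f file that have no @SQ entry in the BAM header.

```python
def sq_names(sam_header_text):
    """Collect the SN: values from the @SQ lines of a SAM/BAM header."""
    names = set()
    for line in sam_header_text.splitlines():
        fields = line.split()  # tolerate tabs or spaces between fields
        if fields and fields[0] == "@SQ":
            for field in fields[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def fasta_chroms(fasta_text):
    """Collect the second header token of each FASTA record.

    Assumption: headers are '>gene chrom start end strand'.
    Headers with fewer than two tokens are skipped.
    """
    chroms = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            if len(tokens) >= 2:
                chroms.add(tokens[1])
    return chroms


def missing_chroms(sam_header_text, fasta_text):
    """Chromosomes named in the -f file that the BAM header does not list."""
    return sorted(fasta_chroms(fasta_text) - sq_names(sam_header_text))


if __name__ == "__main__":
    # Two @SQ lines taken from the header posted above.
    header = (
        "@HD\tVN:1.6\tSO:coordinate\n"
        "@SQ\tSN:AAACCTGAGACGCACA-1_contig_1\tLN:496\n"
        "@SQ\tSN:AAACCTGAGATAGTCA-1_contig_1\tLN:488\n"
    )
    # Hypothetical -f record on chromosome "6"; gene name and coordinates are invented.
    fasta = ">GENE1 6 100 200 +\nACGTACGT\n"
    print(missing_chroms(header, fasta))
```

If the returned list is not empty, the BAM was not aligned to the genome that the -f coordinates refer to, which is the situation in this thread: the header lists only per-cell contigs, so chromosome 6 cannot be found.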
I am trying to do some SHM analysis with TRUST4.
The cellranger vdj output probably already contains enough information for SHM analysis in the AIRR file. If you need to run TRUST4 from the beginning, I think using the VDJ fastq files is more convenient.
Here is my command and the error:
$ ./run-trust4 -b all_contig.bam -f all_contig.fasta -o TRUST_all_contig_toassemble --ref mm39.fa --barcode CB -t 4
[Tue Mar 12 11:32:23 2024] TRUST4 begins.
[Tue Mar 12 11:32:23 2024] SYSTEM CALL: /home/user/trust4/TRUST4/bam-extractor -b all_contig.bam -t 4 -f all_contig.fasta -o TRUST_all_contig_toassemble_toassemble --barcode CB
[Tue Mar 12 11:32:23 2024] Start to extract candidate reads from bam file.
Unknown genome name: GGGGTAATTGAAGTCAAGACTCAGCCTGGACATGATGTCCTCTGCTCAGTTCCTTGGTCTCCTGTTGCTCTGTTTTCAAGGTACCAGATGTGATATCCAGATGACACAGACTACATCCTCCCTGTCTGCCTCTCTGGGAGACAGAGTCACCATCAGTTGCAGGGCAAGTCAGGACATTAGCAATTATTTAAACTGGTATCAGCAGAAACCAGATGGAACTGTTAAACTCCTGATCTACTACACATCAAGATTACACTCAGGAGTCCCATCAAGGTTCAGTGGCAGTGGGTCTGGAACAGATTATTCTCTCACCATTAGCAACCTGGAGCAAGAAGATATTGCCACTTACTTTTGCCAACAGGGTAATACGCTTCCGTACACGTTCGGAGGGGGGACCAAGCTGGAAATAAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTC
system /home/user/trust4/TRUST4/bam-extractor -b all_contig.bam -t 4 -f all_contig.fasta -o TRUST_all_contig_toassemble_toassemble --barcode CB failed: 256 at ./run-trust4 line 55.
I checked my data, and I do not think the data itself is the problem.