liulab-dfci / TRUST4

TCR and BCR assembly from RNA-seq data
MIT License

Issue with Running BuildDatabaseFa.pl Script for TCR and BCR Analysis in Vicugna pacos (Alpaca) #240

Open bigcat1001 opened 9 months ago

bigcat1001 commented 9 months ago

I am trying to use the BuildDatabaseFa.pl script from TRUST4 to analyze T-cell and B-cell receptor sequences in Vicugna pacos (alpaca). I hit an error while running the script and would appreciate help resolving it. Here is what I did:

  1. I ran BuildImgtAnnot.pl and got "bcr_tcr_gene_name.txt", which looks like: IGHA IGHD1 ... IGHV4S7 IGHV4S8 IGHV4S9

  2. I downloaded reference.fa and the GTF from Ensembl: https://useast.ensembl.org/info/data/ftp/index.html
  3. I ran BuildDatabaseFa.pl; however, it reported: No transcript_nameGeneScaffold_89 ensembl exon 536723 536788 . - . gene_id "ENSVPAG00000000584"; gene_version "1"; transcript_id "ENSVPAT00000000584"; transcript_version "1"; exon_number "1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_source "ensembl"; transcript_biotype "protein_coding"; exon_id "ENSVPAE00000006748"; exon_version "1"; tag "Ensembl_canonical";

I suppose the format or content of the GTF file is incompatible with the script's requirements? In addition, the gene names in bcr_tcr_gene_name.txt are not found in the GTF file I am using (Vicugna_pacos.vicPac1.110.gtf). Does this mean I have to create the bcrtcr.fa file manually?

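
Both suspicions can be checked by inspecting the GTF directly. The sketch below is only an illustration and is not part of TRUST4: the function names are hypothetical, and it assumes that bcr_tcr_gene_name.txt holds one gene name per line and that names would appear in GTF column 9 as transcript_name or gene_name attributes, as the error text suggests.

```shell
# Hypothetical helper (not part of TRUST4): count exon records in a GTF
# whose attribute column (column 9) has no transcript_name attribute.
# $1: path to a tab-separated GTF file
count_exons_missing_transcript_name() {
    awk -F'\t' '$3 == "exon" && $9 !~ /transcript_name "/ { n++ } END { print n + 0 }' "$1"
}

# Hypothetical helper (not part of TRUST4): print every name from a gene
# name list that never occurs as a gene_name attribute in the GTF.
# $1: gene name list, one name per line (assumed layout)
# $2: path to the GTF file
list_names_absent_from_gtf() {
    while IFS= read -r name; do
        [ -n "$name" ] || continue
        grep -qF "gene_name \"$name\"" "$2" || printf '%s\n' "$name"
    done < "$1"
}

# Example calls, using the file names mentioned in this thread:
#   count_exons_missing_transcript_name Vicugna_pacos.vicPac1.110.gtf
#   list_names_absent_from_gtf bcr_tcr_gene_name.txt Vicugna_pacos.vicPac1.110.gtf
```

If the first call reports a non-zero count and the second prints most of the list, the annotation most likely does not carry the IMGT-style gene names at all.
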
mourisl commented 9 months ago

If your input to TRUST4 is a FASTQ file, you can use the FASTA file created by the BuildImgtAnnot.pl script directly as the input for both the "-f" and "--ref" options. "BuildDatabaseFa.pl" is mainly for creating the file required for BAM input. I will clarify this in the README later.
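
As a rough sketch of that suggestion, the helper below prints (rather than runs) a run-trust4 command that passes the same FASTA to both "-f" and "--ref" for paired-end FASTQ input. The function name and the file names in the usage line are hypothetical; only the "-f", "--ref", "-1", "-2" and "-o" options of run-trust4 are assumed.

```shell
# Hypothetical wrapper (not part of TRUST4): print a run-trust4 command
# for paired-end FASTQ input, reusing one FASTA for both -f and --ref.
# $1: FASTA written by BuildImgtAnnot.pl
# $2: first-mate FASTQ file
# $3: second-mate FASTQ file
# $4: output prefix
trust4_fastq_command() {
    printf 'run-trust4 -f %s --ref %s -1 %s -2 %s -o %s\n' "$1" "$1" "$2" "$3" "$4"
}

# Example call with placeholder file names:
#   trust4_fastq_command IMGT+C.fa reads_1.fq reads_2.fq alpaca
```

Printing the command first makes it easy to confirm that both options point at the same file before launching a long run.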

bigcat1001 commented 9 months ago

Thanks, I will try that.