velocyto-team / velocyto.py

RNA velocity estimation in Python
http://velocyto.org/velocyto.py/
BSD 2-Clause "Simplified" License

WARNING - The .bam file refers to a chromosome not present in the annotation (.gtf) file #388

Open · sekekeretsu opened this issue 7 months ago

sekekeretsu commented 7 months ago

Dear velocyto experts,

I am trying to perform RNA velocity analysis of 10x single-cell data (CITE-seq), and while running velocyto I noticed warnings that the chromosome IDs in my BAM files are not found in the GTF. I know that Cell Ranger was run with the mm10-2020 GTF file provided with Cell Ranger, and I made sure to pass the same GTF file to velocyto; however, this warning is reported for all of my BAM files. Could somebody give some insight into this and, if possible, a potential solution? I am running the script on a high-performance computing cluster with 32 cores and 64 GB of memory, submitted through Slurm.

The commands in my script are `module load velocyto/0.17` followed by `velocyto run10x -m mm10_rmsk.gtf sample1 genes/genes.gtf`. Here, mm10_rmsk.gtf is the repeat-mask file and genes.gtf is the annotation GTF file.
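A quick way to confirm a naming mismatch is to compare the reference names in the BAM header with the seqnames (column 1) of the GTF given to velocyto. The following is a minimal sketch, not part of velocyto; the helper names are hypothetical and it assumes a plain or gzip-compressed GTF:

```python
import gzip
from typing import Iterable, List, Set


def gtf_seqnames(lines: Iterable[str]) -> Set[str]:
    """Collect the chromosome names (GTF column 1) used by an annotation."""
    names: Set[str] = set()
    for line in lines:
        # Skip blank lines and '#' header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def missing_from_gtf(bam_refs: Iterable[str], gtf_names: Set[str]) -> List[str]:
    """Return BAM reference names (in BAM order) with no matching GTF seqname."""
    return [ref for ref in bam_refs if ref not in gtf_names]


def read_gtf_seqnames(path: str) -> Set[str]:
    """Open a plain or .gz GTF file and collect its seqnames."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return gtf_seqnames(handle)
```

Assuming pysam is installed, the BAM names can be read with `pysam.AlignmentFile(bam_path, "rb").references` and passed to `missing_from_gtf(...)` together with `read_gtf_seqnames("genes/genes.gtf")`; a non-empty result lists BAM chromosomes that have no counterpart in the annotation.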

Could this be because these are CITE-seq data? Any help would be appreciated.

Here is an excerpt of the log from my velocyto run:

2024-02-28 01:21:25,543 - WARNING - The .bam file refers to a chromosome '8+' not present in the annotation (.gtf) file
2024-02-28 01:21:25,543 - WARNING - The .bam file refers to a chromosome '8-' not present in the annotation (.gtf) file
2024-02-28 01:22:10,685 - DEBUG - Read first 240 million reads
2024-02-28 01:22:30,758 - DEBUG - Marking up chromosome 9
2024-02-28 01:22:30,758 - WARNING - The .bam file refers to a chromosome '9+' not present in the annotation (.gtf) file
2024-02-28 01:22:30,758 - WARNING - The .bam file refers to a chromosome '9-' not present in the annotation (.gtf) file
2024-02-28 01:22:55,539 - DEBUG - Read first 250 million reads
2024-02-28 01:23:42,796 - DEBUG - Read first 260 million reads
2024-02-28 01:23:49,789 - DEBUG - Marking up chromosome MT
2024-02-28 01:23:49,789 - WARNING - The .bam file refers to a chromosome 'MT+' not present in the annotation (.gtf) file
2024-02-28 01:23:49,789 - WARNING - The .bam file refers to a chromosome 'MT-' not present in the annotation (.gtf) file
2024-02-28 01:24:11,689 - DEBUG - Marking up chromosome X
2024-02-28 01:24:11,689 - WARNING - The .bam file refers to a chromosome 'X+' not present in the annotation (.gtf) file
2024-02-28 01:24:11,689 - WARNING - The .bam file refers to a chromosome 'X-' not present in the annotation (.gtf) file
2024-02-28 01:24:16,968 - DEBUG - Read first 270 million reads
2024-02-28 01:24:50,238 - DEBUG - Marking up chromosome Y
2024-02-28 01:24:50,238 - WARNING - The .bam file refers to a chromosome 'Y+' not present in the annotation (.gtf) file
2024-02-28 01:24:50,238 - WARNING - The .bam file refers to a chromosome 'Y-' not present in the annotation (.gtf) file
2024-02-28 01:24:53,532 - DEBUG - Marking up chromosome JH584299.1
2024-02-28 01:24:53,533 - WARNING - The .bam file refers to a chromosome 'JH584299.1+' not present in the annotation (.gtf) file
2024-02-28 01:24:53,533 - WARNING - The .bam file refers to a chromosome 'JH584299.1-' not present in the annotation (.gtf) file
2024-02-28 01:24:53,533 - DEBUG - Marking up chromosome GL456233.1
2024-02-28 01:24:53,533 - WARNING - The .bam file refers to a chromosome 'GL456233.1+' not present in the annotation (.gtf) file
2024-02-28 01:24:53,533 - WARNING - The .bam file refers to a chromosome 'GL456233.1-' not present in the annotation (.gtf) file
2024-02-28 01:24:53,569 - DEBUG - Marking up chromosome JH584301.1
2024-02-28 01:24:53,569 - WARNING - The .bam file refers to a chromosome 'JH584301.1+' not present in the annotation (.gtf) file
2024-02-28 01:24:53,569 - WARNING - The .bam file refers to a chromosome 'JH584301.1-' not present in the annotation (.gtf) file
2024-02-28 01:24:53,569 - DEBUG - Marking up chromosome GL456211.1
2024-02-28 01:24:53,569 - WARNING - The .bam file refers to a chromosome 'GL456211.1+' not present in the annotation (.gtf) file
2024-02-28 01:24:53,569 - WARNING - The .bam file refers to a chromosome 'GL456211.1-' not present in the annotation (.gtf) file
2024-02-28 01:24:53,572 - DEBUG - Marking up chromosome GL456221.1
2024-02-28 01:24:53,572 - WARNING - The .bam file refers to a chromosome 'GL456221.1+' not present in the annotation (.gtf) file
2024-02-28 01:24:53,572 - WARNING - The .bam file refers to a chromosome 'GL456221.1-' not present in the annotation (.gtf) file
2024-02-28 01:24:53,577 - DEBUG - Marking up chromosome JH584297.1
2024-02-28 01:24:53,577 - WARNING - The .bam file refers to a chromosome 'JH584297.1+' not present in the annotation (.gtf) file
2024-02-28 01:24:53,577 - WARNING - The .bam file refers to a chromosome 'JH584297.1-' not present in the annotation (.gtf) file
2024-02-28 01:24:53,578 - DEBUG - Marking up chromosome JH584294.1

Thanks, Seke
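The names in the warnings above end in '+' or '-', which suggests velocyto looks up annotations per chromosome and strand; the chromosome names themselves are '8', '9', 'MT', 'X', 'Y' and unplaced contigs such as 'JH584299.1'. A hedged sketch for extracting the distinct names from a saved log, assuming the message text matches the lines shown here (the function name is hypothetical):

```python
import re
from typing import List

# Matches e.g. "...refers to a chromosome 'JH584299.1+' not present..."
# and captures the name without its trailing strand character.
_MISSING_CHROM = re.compile(
    r"The \.bam file refers to a chromosome '(.+?)[+-]' not present"
)


def unmatched_chromosomes(log_text: str) -> List[str]:
    """Distinct chromosome names from the missing-chromosome warnings,
    in order of first appearance, with the '+'/'-' suffix removed."""
    seen: List[str] = []
    for name in _MISSING_CHROM.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen
```

For example, running `unmatched_chromosomes(open("velocyto.log").read())` on a saved log (filename assumed) lists every chromosome with no annotation. If it includes all of the main chromosomes, the GTF and BAM most likely use different naming conventions rather than just missing a few contigs.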

sekekeretsu commented 7 months ago

I could not include the entire log file, but here are some other warnings I found that could be relevant.

2024-02-28 00:38:30,886 - DEBUG - Using logic: Default
2024-02-28 00:38:30,891 - INFO - Read 4396 cell barcodes from /vf/users/keretsus2/projects/iNKT_scRNAseq_TerabeLab/02_PrimaryAnalysisOutput/00_FullCellrangerOutputs/GEX/gexoutput/batch1/SCAF3244_Axillary_LNs/outs/filtered_feature_bc_matrix/barcodes.tsv.gz
2024-02-28 00:38:30,891 - DEBUG - Example of barcode: AAACCTGAGCCACGTC and cell_id: SCAF3244_Axillary_LNs:AAACCTGAGCCACGTC-1
2024-02-28 00:38:30,911 - DEBUG - Peeking into /vf/users/keretsus2/projects/iNKT_scRNAseq_TerabeLab/02_PrimaryAnalysisOutput/00_FullCellrangerOutputs/GEX/gexoutput/batch1/SCAF3244_Axillary_LNs/outs/possorted_genome_bam.bam
2024-02-28 00:38:30,939 - WARNING - Not found cell and umi barcode in entry 33 of the bam file
2024-02-28 00:38:30,939 - WARNING - Not found cell and umi barcode in entry 61 of the bam file
2024-02-28 00:38:30,939 - WARNING - Not found cell and umi barcode in entry 65 of the bam file
2024-02-28 00:38:30,939 - WARNING - Not found cell and umi barcode in entry 70 of the bam file
2024-02-28 00:38:30,939 - WARNING - Not found cell and umi barcode in entry 128 of the bam file
2024-02-28 00:38:30,939 - WARNING - Not found cell and umi barcode in entry 130 of the bam file
2024-02-28 00:38:30,939 - WARNING - Not found cell and umi barcode in entry 136 of the bam file
2024-02-28 00:38:30,939 - WARNING - Not found cell and umi barcode in entry 137 of the bam file
2024-02-28 00:38:30,939 - WARNING - Not found cell and umi barcode in entry 142 of the bam file
2024-02-28 00:38:30,939 - WARNING - Not found cell and umi barcode in entry 150 of the bam file
2024-02-28 00:38:30,939 - WARNING - Not found cell and umi barcode in entry 156 of the bam file
2024-02-28 00:38:30,939 - WARNING - Not found cell and umi barcode in entry 174 of the bam file
2024-02-28 00:38:30,939 - WARNING - Not found cell and umi barcode in entry 249 of the bam file
2024-02-28 00:38:30,939 - WARNING - Not found cell and umi barcode in entry 251 of the bam file
2024-02-28 00:38:30,939 - WARNING - Not found cell and umi barcode in entry 252 of the bam file
2024-02-28 00:38:30,939 - WARNING - Not found cell and umi barcode in entry 253 of the bam file
2024-02-28 00:38:30,939 - WARNING - Not found cell and umi barcode in entry 254 of the bam file
2024-02-28 00:38:30,939 - WARNING - Not found cell and umi barcode in entry 255 of the bam file
2024-02-28 00:38:30,939 - WARNING - Not found cell and umi barcode in entry 261 of the bam file
2024-02-28 00:38:30,939 - WARNING - Not found cell and umi barcode in entry 262 of the bam file
2024-02-28 00:38:30,940 - WARNING - Not found cell and umi barcode in entry 543 of the bam file
2024-02-28 00:38:30,940 - WARNING - Not found cell and umi barcode in entry 553 of the bam file
2024-02-28 00:38:30,940 - WARNING - Not found cell and umi barcode in entry 554 of the bam file
2024-02-28 00:38:30,940 - WARNING - Not found cell and umi barcode in entry 558 of the bam file
2024-02-28 00:38:30,940 - WARNING - Not found cell and umi barcode in entry 559 of the bam file
2024-02-28 00:38:30,940 - WARNING - Not found cell and umi barcode in entry 560 of the bam file
2024-02-28 00:38:30,940 - WARNING - Not found cell and umi barcode in entry 562 of the bam file
2024-02-28 00:38:30,940 - WARNING - Not found cell and umi barcode in entry 568 of the bam file
2024-02-28 00:38:30,940 - WARNING - Not found cell and umi barcode in entry 582 of the bam file
2024-02-28 00:38:30,940 - WARNING - Not found cell and umi barcode in entry 589 of the bam file
2024-02-28 00:38:30,940 - WARNING - Not found cell and umi barcode in entry 590 of the bam file
2024-02-28 00:38:30,940 - WARNING - Not found cell and umi barcode in entry 591 of the bam file
2024-02-28 00:38:30,940 - WARNING - Not found cell and umi barcode in entry 592 of the bam file
2024-02-28 00:38:30,940 - WARNING - Not found cell and umi barcode in entry 593 of the bam file
2024-02-28 00:38:30,940 - WARNING - Not found cell and umi barcode in entry 594 of the bam file
2024-02-28 00:38:30,940 - WARNING - Not found cell and umi barcode in entry 596 of the bam file
2024-02-28 00:38:30,940 - WARNING - Not found cell and umi barcode in entry 600 of the bam file
2024-02-28 00:38:30,940 - WARNING - Not found cell and umi barcode in entry 604 of the bam file
2024-02-28 00:38:30,940 - WARNING - Not found cell and umi barcode in entry 605 of the bam file
2024-02-28 00:38:30,940 - WARNING - Not found cell and umi barcode in entry 616 of the bam file
2024-02-28 00:38:30,940 - WARNING - Not found cell and umi barcode in entry 617 of the bam file
2024-02-28 00:38:30,941 - WARNING - Not found cell and umi barcode in entry 689 of the bam file
2024-02-28 00:38:30,941 - WARNING - Not found cell and umi barcode in entry 696 of the bam file
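These "Not found cell and umi barcode" warnings are emitted while velocyto peeks into the BAM and appear to flag reads without a cell-barcode and/or UMI tag. Cell Ranger only attaches the corrected CB and UB tags to reads whose barcode and UMI it could validate, so some untagged reads are expected. A hedged sketch to estimate how common they are, assuming the relevant tags are CB and UB and that reads expose pysam's `AlignedSegment.has_tag`; the helper name is hypothetical:

```python
from itertools import islice
from typing import Iterable, Protocol, Sequence


class TaggedRead(Protocol):
    # pysam.AlignedSegment provides has_tag(tag) -> bool.
    def has_tag(self, tag: str) -> bool: ...


def fraction_missing_tags(
    reads: Iterable[TaggedRead],
    tags: Sequence[str] = ("CB", "UB"),
    limit: int = 100_000,
) -> float:
    """Fraction of the first `limit` reads lacking at least one of `tags`."""
    total = missing = 0
    for read in islice(reads, limit):
        total += 1
        if not all(read.has_tag(tag) for tag in tags):
            missing += 1
    return missing / total if total else 0.0
```

With pysam, `fraction_missing_tags(pysam.AlignmentFile(bam_path, "rb").fetch(until_eof=True))` would scan the start of possorted_genome_bam.bam. Unlike the chromosome warnings, these only concern individual reads, so they do not by themselves explain a run in which every chromosome is unmatched.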

xiaodeng6410 commented 7 months ago

This problem may be related to the annotation file in your Cell Ranger reference. Today I rebuilt my reference using a GTF file downloaded from Ensembl, and the problem was solved. Note that this warning leads to an empty matrix in the loom file.
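Since an unmatched annotation reportedly leaves the matrix in the .loom file empty, checking the output totals is a cheap way to tell whether a run was affected. A minimal sketch, assuming loompy (a velocyto dependency) is installed and the file has the spliced and unspliced layers velocyto normally writes; both helper names are hypothetical:

```python
from typing import Dict, List, Mapping, Sequence


def empty_layers(totals: Mapping[str, float]) -> List[str]:
    """Names of layers whose total count is zero, sorted alphabetically."""
    return sorted(name for name, total in totals.items() if total == 0)


def loom_layer_totals(
    path: str, layers: Sequence[str] = ("spliced", "unspliced")
) -> Dict[str, float]:
    """Sum each requested layer of a velocyto .loom file."""
    # Imported lazily (assumed installed) so empty_layers works without it.
    import loompy

    with loompy.connect(path, mode="r") as ds:
        # [:, :] loads the whole layer into memory; fine for a one-off check.
        return {name: float(ds.layers[name][:, :].sum()) for name in layers}
```

For example, `empty_layers(loom_layer_totals("sample1/velocyto/sample1.loom"))` (output path assumed) returning both layer names would indicate the run was affected.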

qingmemng commented 4 months ago

@xiaodeng6410 Hello!

I encountered the same issue and I'm looking for guidance on how to remake the reference file by downloading the GTF file from Ensembl. Could you please share a workflow or any detailed steps you followed to resolve this problem?

Thank you!

xiaodeng6410 commented 4 months ago

Build a Custom Reference for Cell Ranger (mkref) - Official 10x Genomics Support: https://www.10xgenomics.com/support/software/cell-ranger/latest/tutorials/cr-tutorial-mr

Please follow the standard Cell Ranger procedure to prepare the reference genome, and download the genome and annotation files from Ensembl.
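The point of the rebuild is that the same Ensembl GTF (after any filtering) is used by `cellranger mkref` and later by `velocyto run10x`, so the chromosome names in the new BAM and in the annotation agree. The reads also have to be re-run through `cellranger count` against the new reference before velocyto sees a BAM with those names. A hedged sketch that only assembles the command lines; file names are placeholders, the helper names are hypothetical, and the mkgtf/mkref flags should be checked against the tutorial for your Cell Ranger version:

```python
from typing import List, Sequence


def mkgtf_command(raw_gtf: str, filtered_gtf: str, biotypes: Sequence[str]) -> List[str]:
    """argv for `cellranger mkgtf`, keeping only the listed gene biotypes."""
    return ["cellranger", "mkgtf", raw_gtf, filtered_gtf] + [
        f"--attribute=gene_biotype:{b}" for b in biotypes
    ]


def mkref_command(genome: str, fasta: str, gtf: str) -> List[str]:
    """argv for `cellranger mkref` from one Ensembl FASTA/GTF pair."""
    return ["cellranger", "mkref", f"--genome={genome}", f"--fasta={fasta}", f"--genes={gtf}"]


def velocyto_command(sample_dir: str, gtf: str, mask: str = "") -> List[str]:
    """argv for `velocyto run10x`, reusing the GTF that mkref consumed."""
    cmd = ["velocyto", "run10x"]
    if mask:
        cmd += ["-m", mask]
    return cmd + [sample_dir, gtf]
```

Each list can be executed with `subprocess.run(cmd, check=True)`. The repeat mask passed with `-m` presumably needs the same chromosome naming as well, since otherwise it cannot overlap any reads.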



From: qingmemng; Sent: 28 May 2024 14:19; To: velocyto-team/velocyto.py; Cc: xiaodeng6410, Mention; Subject: Re: [velocyto-team/velocyto.py] WARNING - The .bam file refers to a chromosome not present in the annotation (.gtf) file (Issue #388)
