Open bbimber opened 2 years ago
I am guessing lines like this break it:
MT RefSeq exon 536 607 . + . gene_id ""; transcript_id "unknown_transcript_1"; gbkey "tRNA"; product "tRNA-Phe"; exon_number "1";
note the empty string gene_id. This is just a guess at the moment. However, this GTF is verbatim from NCBI, so it might be nice if velocyto could work with (perhaps just ingore) these sort of lines.
Hi,
I also stuck in this place.
error information as:
Traceback (most recent call last):
File "/ism/zhangyanan/.conda/envs/velocyto/bin/velocyto", line 8, in
And I checked the log, it stoped when velocyto begin to parse chromosomes: 2022-04-02 20:50:10,198 - INFO - No SAMPLEID specified, the sample will be called sort_V2R2109240074_anno_decon_change_CB_LOJQD (last 5 digits are a random-id to avoid overwriting some other file by mistake) 2022-04-02 20:50:10,199 - DEBUG - Using logic: Default 2022-04-02 20:50:10,250 - INFO - Read 15986 cell barcodes from /gluster/home/chendongsheng/zhangyanan/velocyto/test_09240074/09240074_barcodes.tsv 2022-04-02 20:50:10,250 - DEBUG - Example of barcode: CELL12028_N1 and cell_id: sort_V2R2109240074_anno_decon_change_CB_LOJQD:CELL12028_N1 2022-04-02 20:50:10,412 - DEBUG - Peeking into /gluster/home/chendongsheng/zhangyanan/velocyto/test_09240074/sort_V2R2109240074_anno_decon_change_CB.bam 2022-04-02 20:50:10,585 - INFO - Starting the sorting process of /gluster/home/chendongsheng/zhangyanan/velocyto/test_09240074/sort_V2R2109240074_anno_decon_change_CB.bam the output will be at: /gluster/home/chendongsheng/zhangyanan/velocyto/test_09240074/cellsorted_sort_V2R2109240074_anno_decon_change_CB.bam 2022-04-02 20:50:10,586 - INFO - Command being run is: samtools sort -l 7 -m 2048M -t CB -O BAM -@ 24 -o /gluster/home/chendongsheng/zhangyanan/velocyto/test_09240074/cellsorted_sort_V2R2109240074_anno_decon_change_CB.bam /gluster/home/chendongsheng/zhangyanan/velocyto/test_09240074/sort_V2R2109240074_anno_decon_change_CB.bam 2022-04-02 20:50:10,586 - INFO - While the bam sorting happens do other things... 2022-04-02 20:50:10,586 - INFO - Load the annotation from /gluster/home/chendongsheng/zhangyanan/velocyto/GCF_000002285.5_Dog10K_Boxer_Tasha_genomic.gtf 2022-04-02 20:50:20,142 - DEBUG - Parsing Chromosome NC_002008.4 strand - [line 0]
####### And I also checked the gif file, all gene_id are not empty. Do you have any idea about what's going on? Thanks a lot!
For more information, my gtf looks like this:
NC_006583.4 Gnomon gene 29337 30631 . + . gene_id "LOC100684726"; transcript_id ""; db_xref "GeneID:100684726"; gbkey "Gene"; gene "LOC100684726"; gene_biotype "pseudogene"; pseudo "true"; NC_006583.4 Gnomon gene 651385 789323 . + . gene_id "LOC119870367"; transcript_id ""; db_xref "GeneID:119870367"; gbkey "Gene"; gene "LOC119870367"; gene_biotype "lncRNA"; NC_006583.4 Gnomon transcript 651385 789323 . + . gene_id "LOC119870367"; transcript_id "XR_005316961.1"; db_xref "GeneID:119870367"; gbkey "ncRNA"; gene "LOC119870367"; model_evidence "Supporting evidence includes similarity to: 1 EST, and 99% coverage of the annotated genomic feature by RNAseq alignments, including 10 samples with support for all annotated introns"; product "uncharacterized LOC119870367"; transcript_biotype "lnc_RNA"; NC_006583.4 Gnomon exon 651385 652415 . + . gene_id "LOC119870367"; transcript_id "XR_005316961.1"; db_xref "GeneID:119870367"; gene "LOC119870367"; model_evidence "Supporting evidence includes similarity to: 1 EST, and 99% coverage of the annotated genomic feature by RNAseq alignments, including 10 samples with support for all annotated introns"; product "uncharacterized LOC119870367"; transcript_biotype "lnc_RNA"; exon_number "1"; NC_006583.4 Gnomon exon 695111 695247 . + . gene_id "LOC119870367"; transcript_id "XR_005316961.1"; db_xref "GeneID:119870367"; gene "LOC119870367"; model_evidence "Supporting evidence includes similarity to: 1 EST, and 99% coverage of the annotated genomic feature by RNAseq alignments, including 10 samples with support for all annotated introns"; product "uncharacterized LOC119870367"; transcript_biotype "lnc_RNA"; exon_number "2"; NC_006583.4 Gnomon exon 696316 696387 . + . gene_id "LOC119870367"; transcript_id "XR_005316961.1"; db_xref "GeneID:119870367"; gene "LOC119870367"; model_evidence "Supporting evidence includes similarity to: 1 EST, and 99% coverage of the annotated genomic feature by RNAseq alignments, including 10 samples with support for all annotated introns"; product "uncharacterized LOC119870367"; transcript_biotype "lnc_RNA"; exon_number "3";
Hi, I have solved this problem. Maybe you want to check the chromosome names in your bam and gtf. I looked through the code of velocyto and found that velocyto can only recognize chromosome name like 1... or chr1... So I changed my chromosome name in bam and gtf from NC_006583.4... to chr1... and it works. Hope this can help you.
Hi, I have solved this problem. Maybe you want to check the chromosome names in your bam and gtf. I looked through the code of velocyto and found that velocyto can only recognize chromosome name like 1... or chr1... So I changed my chromosome name in bam and gtf from NC_006583.4... to chr1... and it works. Hope this can help you.
Maybe that's right! But for me,the first column of gtf file downloaded from NNCBI has too many value,I don't know how to map them to stand format like chr1,chr2...。
There is the first colmn of my gtf file。 `(base) [21031211625@login01_d_02 resource]$ cut -f1 GCF_000001635.26_GRCm38.p6_genomic.gtf | sort | uniq
NC_000067.6 NC_000068.7 NC_000069.6 NC_000070.6 NC_000071.6 NC_000072.6 NC_000073.6 NC_000074.6 NC_000075.6 NC_000076.6 NC_000077.6 NC_000078.6 NC_000079.6 NC_000080.6 NC_000081.6 NC_000082.6 NC_000083.6 NC_000084.6 NC_000085.6 NC_000086.7 NC_000087.7 NC_005089.1 NT_039192.2 NT_039194.5 NT_039195.1 NT_039198.1 NT_039223.3 NT_039247.2 NT_039248.1 NT_039252.1 NT_039256.1 NT_039257.4 NT_039291.1 NT_039296.1 NT_039334.1 NT_039367.1 NT_039372.2 NT_039373.1 NT_039382.4 NT_039443.1 NT_039469.1 NT_039470.1 NT_039525.1 NT_039593.1 NT_039594.3 NT_039614.1 NT_039622.1 NT_039630.5 NT_039634.4 NT_039661.4 NT_039662.3 NT_039666.2 NT_039697.1 NT_039739.2 NT_039744.2 NT_039747.4 NT_078572.1 NT_078683.1 NT_080256.1 NT_080257.1 NT_095534.1 NT_096355.1 NT_096797.1 NT_097336.1 NT_099024.1 NT_109318.2 NT_109599.1 NT_111920.1 NT_114257.4 NT_114985.3 NT_114988.1 NT_162750.1 NT_165789.2 NT_166280.1 NT_166281.1 NT_166282.1 NT_166283.1 NT_166287.2 NT_166291.1 NT_166293.1 NT_166294.1 NT_166295.1 NT_166299.2 NT_166305.2 NT_166307.1 NT_166317.2 NT_166322.1 NT_166394.1 NT_166396.1 NT_166397.1 NT_166399.1 NT_166400.1 NT_166433.1 NT_166434.1 NT_166438.1 NT_166443.1 NT_166450.1 NT_166456.1 NT_166462.1 NT_166463.1 NT_166466.1 NT_166469.1 NT_186999.1 NT_187000.1 NT_187001.1 NT_187002.1 NT_187003.1 NT_187004.1 NT_187005.1 NT_187006.1 NT_187007.1 NT_187008.1 NT_187009.1 NT_187010.1 NT_187011.1 NT_187012.1 NT_187013.1 NT_187014.1 NT_187015.1 NT_187016.1 NT_187017.1 NT_187018.1 NT_187019.1 NT_187020.1 NT_187021.1 NT_187022.1 NT_187023.1 NT_187024.1 NT_187025.1 NT_187026.1 NT_187027.1 NT_187028.1 NT_187029.1 NT_187030.1 NT_187052.1 NT_187053.1 NT_187054.1 NT_187055.1 NT_187056.1 NT_187057.1 NT_187058.1 NT_187059.1 NT_187060.1 NT_187064.1 NW_004058050.1 NW_004058051.1 NW_004058052.1 NW_004058053.1 NW_004058054.1 NW_004058055.2 NW_004058056.1 NW_004058058.1 NW_004450259.3 NW_004450260.1 NW_004450261.1 NW_004450262.2 NW_004450263.1 NW_006763128.1 NW_006763129.1 NW_006763130.1 NW_012132900.1 NW_012132901.1 NW_012132902.1 NW_012132903.1 NW_012132904.1 NW_012132905.1 NW_012132906.1 NW_012132907.1 NW_012132908.1 NW_012132909.2 NW_012132910.1 NW_012132911.1 NW_012132912.1 NW_012132913.1 NW_016097312.1 NW_016097313.1 NW_016097314.1 NW_016097315.1 NW_016097316.1 NW_016097317.1 NW_016097318.1 NW_016097319.1 NW_016097320.1 NW_016097321.1 NW_016097322.1 NW_019168503.1 NW_019168504.1 NW_019168505.1 NW_019168506.1 NW_019168507.1 NW_019168508.1 NW_019168509.1 NW_019168510.1 NW_019168511.1 NW_019168512.1 NW_019168513.1 NW_019168514.1 NW_019168515.1 NW_019168516.1 NW_019168517.1 NW_019168518.1 NW_019168519.1 NW_019168520.1 NW_019168521.1 NW_019168522.1 NW_019168523.1 NW_019168524.1 NW_019168525.1 NW_019168526.1 NW_019168527.1 NW_019168528.1 NW_019168529.1 NW_019168530.1 NW_019168531.1 NW_019168532.1 NW_019168533.1 NW_019168534.1`
Same here.
In my case, I performed 10X CellRanger
against scRNA-seq data of Hemicentrotus pulcherrimus, which is a kind of sea urchin, and then tried to apply velocyto
.
I used two GTF
files convered by gffread
from the GFF3
files of Hpbase
(https://cell-innovation.nig.ac.jp/cgi-bin/Hpul_public/Hpul_annot_download.cgi) and Echinobase
(https://www.echinobase.org/entry/static-echinobase/ftpDatafiles.jsp).
The error message was about the line below https://github.com/velocyto-team/velocyto.py/blob/0963dd2df0ac802c36404e0f434ba97f07edfe4b/velocyto/counter.py#L514
and I finally found out that after deleting the lines where gene_id are not written, velocyto
worked.
The error message was like below:
(velocyto_env) velocyto run10x $SAMPLE $GTF
2022-10-05 14:44:49,023 - DEBUG - Using logic: Default
2022-10-05 14:44:49,037 - INFO - Read 5633 cell barcodes from /data2/koki/urchin-workflow2/output/echinobase/SeaUrchin-scRNA-01/outs/filtered_feature_bc_matrix/barcodes.tsv.gz
2022-10-05 14:44:49,037 - DEBUG - Example of barcode: AAACCCACACAAATGA and cell_id: SeaUrchin-scRNA-01:AAACCCACACAAATGA-1
2022-10-05 14:44:49,047 - DEBUG - Peeking into /data2/koki/urchin-workflow2/output/echinobase/SeaUrchin-scRNA-01/outs/possorted_genome_bam.bam
2022-10-05 14:44:49,055 - WARNING - Not found cell and umi barcode in entry 34 of the bam file
2022-10-05 14:44:49,055 - WARNING - Not found cell and umi barcode in entry 35 of the bam file
2022-10-05 14:44:49,055 - WARNING - Not found cell and umi barcode in entry 121 of the bam file
2022-10-05 14:44:49,056 - WARNING - Not found cell and umi barcode in entry 159 of the bam file
2022-10-05 14:44:49,056 - WARNING - Not found cell and umi barcode in entry 174 of the bam file
2022-10-05 14:44:49,056 - WARNING - Not found cell and umi barcode in entry 189 of the bam file
2022-10-05 14:44:49,056 - WARNING - Not found cell and umi barcode in entry 214 of the bam file
2022-10-05 14:44:49,056 - WARNING - Not found cell and umi barcode in entry 234 of the bam file
2022-10-05 14:44:49,056 - WARNING - Not found cell and umi barcode in entry 238 of the bam file
2022-10-05 14:44:49,056 - WARNING - Not found cell and umi barcode in entry 258 of the bam file
2022-10-05 14:44:49,057 - WARNING - Not found cell and umi barcode in entry 282 of the bam file
2022-10-05 14:44:49,057 - WARNING - Not found cell and umi barcode in entry 307 of the bam file
2022-10-05 14:44:49,057 - WARNING - Not found cell and umi barcode in entry 347 of the bam file
2022-10-05 14:44:49,057 - WARNING - Not found cell and umi barcode in entry 368 of the bam file
2022-10-05 14:44:49,057 - WARNING - Not found cell and umi barcode in entry 401 of the bam file
2022-10-05 14:44:49,058 - WARNING - Not found cell and umi barcode in entry 403 of the bam file
2022-10-05 14:44:49,058 - WARNING - Not found cell and umi barcode in entry 404 of the bam file
2022-10-05 14:44:49,058 - WARNING - Not found cell and umi barcode in entry 406 of the bam file
2022-10-05 14:44:49,058 - WARNING - Not found cell and umi barcode in entry 407 of the bam file
2022-10-05 14:44:49,058 - WARNING - Not found cell and umi barcode in entry 408 of the bam file
2022-10-05 14:44:49,058 - WARNING - Not found cell and umi barcode in entry 410 of the bam file
2022-10-05 14:44:49,058 - WARNING - Not found cell and umi barcode in entry 411 of the bam file
2022-10-05 14:44:49,058 - WARNING - Not found cell and umi barcode in entry 414 of the bam file
2022-10-05 14:44:49,058 - WARNING - Not found cell and umi barcode in entry 415 of the bam file
2022-10-05 14:44:49,059 - WARNING - Not found cell and umi barcode in entry 463 of the bam file
2022-10-05 14:44:49,059 - WARNING - Not found cell and umi barcode in entry 499 of the bam file
2022-10-05 14:44:49,059 - WARNING - Not found cell and umi barcode in entry 562 of the bam file
2022-10-05 14:44:49,059 - WARNING - Not found cell and umi barcode in entry 605 of the bam file
2022-10-05 14:44:49,060 - WARNING - Not found cell and umi barcode in entry 639 of the bam file
2022-10-05 14:44:49,060 - WARNING - Not found cell and umi barcode in entry 646 of the bam file
2022-10-05 14:44:49,063 - WARNING - The file /data2/koki/urchin-workflow2/output/echinobase/SeaUrchin-scRNA-01/outs/cellsorted_possorted_genome_bam.bam already exists. The sorting step will be skipped and the existing file will be used.
2022-10-05 14:44:49,063 - INFO - Load the annotation from /data2/koki/urchin-workflow2/data/echinobase/sp5_0_GCF.gtf
2022-10-05 14:44:49,399 - WARNING - The entry exon_number was not present in the gtf file. It will be infferred from the position.
2022-10-05 14:44:51,801 - DEBUG - Parsing Chromosome NC_001453.1 strand - [line 0]
Traceback (most recent call last):
File "/home/koki/miniconda3/envs/velocyto_env/bin/velocyto", line 8, in <module>
sys.exit(cli())
File "/home/koki/miniconda3/envs/velocyto_env/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/home/koki/miniconda3/envs/velocyto_env/lib/python3.8/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/home/koki/miniconda3/envs/velocyto_env/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/koki/miniconda3/envs/velocyto_env/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/koki/miniconda3/envs/velocyto_env/lib/python3.8/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/home/koki/miniconda3/envs/velocyto_env/lib/python3.8/site-packages/velocyto/commands/run10x.py", line 112, in run10x
return _run(bamfile=(bamfile, ), gtffile=gtffile, bcfile=bcfile, outputfolder=outputfolder,
File "/home/koki/miniconda3/envs/velocyto_env/lib/python3.8/site-packages/velocyto/commands/_run.py", line 186, in _run
annotations_by_chrm_strand = exincounter.read_transcriptmodels(gtffile)
File "/home/koki/miniconda3/envs/velocyto_env/lib/python3.8/site-packages/velocyto/counter.py", line 514, in read_transcriptmodels
geneid = regex_geneid.search(tags).group(1)
AttributeError: 'NoneType' object has no attribute 'group'
Hello,
I'm trying to run velocyto using 10x data and GTFs for macaca mulatta from NCBI. The command is:
velocyto run -o ./outs -b barcodes.tsv.gz --samtools-threads 11 -m mm10_rmsk.rename.gtf 438-10-GEX.bam NCBI.Mmul_10.103.translated.gtf
It gives the following error, which I assume is something related to a line in the GTF from MT? Have you encountered problems like this before?