Closed: likelet closed this issue 4 years ago.
The main error is in this step:
sambamba view sample.sort.bam > sample.sam # resolved error caused by bam and htseq version conflicts
htseq-count -t exon -i gene_id -r pos -f sam sample.sam final_all.gtf > sample.htseq.count
rm sample.sam
I don't see any .command.out or .command.err files. I also checked the sample.htseq.count file and it is not empty: the first column has gene names and the second column has raw counts.
A1BG 112
A1CF 17
A2M 70
A2ML1 6
A2ML1-AS-1 286
A3GALT2 6
A4GALT 2
A4GNT 3
AAAS 0
AACS 2
Yes, this means that sample.htseq.count has been generated.
Please try rerunning the analysis with -resume and see whether the error still exists.
Yes, you are right. But the job stopped because of this error:
ERROR ~ Error executing process > 'Run_htseq_for_quantification (sample)'
Caused by:
Missing output file(s) `sample.htseq.count ` expected by process `Run_htseq_for_quantification (sample)`
This is weird; there shouldn't be an error here. Could you try using the latest Nextflow?
I resumed the job and it's running. I will check this and then try the latest one.
I got the same error again. Do you think the latest Nextflow will avoid this error? And is https://github.com/nf-core/lncpipe/tree/dev the latest version?
I mean the latest Nextflow, not lncPipe. And "https://github.com/nf-core/lncpipe/tree/dev" is not the latest version of lncPipe, just a version adjusted for the nf-core community, which provides a better code structure for conventional pipelines. The lncPipe in this repo is still the latest one and has been tested locally.
BTW, I am not sure whether switching the Nextflow binary will fix it, but the code should be fine in this case. If the problem persists, you can pull the count files out and combine the results manually using the scripts in the bin folder.
The commands for this are as follows:
perl !{baseDir}/bin/get_map_table.pl final_all.gtf > map.file
R CMD BATCH !{baseDir}/bin/get_htseq_matrix.R
Here !{baseDir} is the home path of lncPipe.
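If R is not available, the combining step can also be sketched in Python. This is a minimal sketch, not a port of get_htseq_matrix.R: it assumes each per-sample file is named <sample>.htseq.count and has two tab-separated columns (gene, count), and it skips the summary rows htseq-count appends (their names start with "__", e.g. __no_feature). The function names and the output path are hypothetical.

```python
import csv
import glob
import os

SUFFIX = ".htseq.count"  # assumed per-sample file suffix, as in sample.htseq.count


def read_htseq_counts(path):
    """Return {gene: count} from one htseq-count output file, skipping the '__' summary rows."""
    counts = {}
    with open(path) as fh:
        for line in fh:
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) != 2 or fields[0].startswith("__"):
                continue
            counts[fields[0]] = int(fields[1])
    return counts


def combine_htseq_counts(pattern, out_path):
    """Merge every file matching `pattern` into a gene-by-sample TSV; absent genes count as 0."""
    samples = {}
    for path in sorted(glob.glob(pattern)):
        name = os.path.basename(path)
        if name.endswith(SUFFIX):
            name = name[: -len(SUFFIX)]  # sample name = file name without the suffix
        samples[name] = read_htseq_counts(path)
    names = sorted(samples)
    genes = sorted(set().union(*samples.values()))  # iterating a dict yields its keys
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(["gene"] + names)
        for gene in genes:
            writer.writerow([gene] + [samples[s].get(gene, 0) for s in names])
    return names, genes
```

For example, combine_htseq_counts("*.htseq.count", "htseq_count_matrix.tsv") writes one row per gene and one column per sample, filling in 0 where a gene is absent from a sample.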
Once this step is done, you can run LncPipeReporter with:
library(LncPipeReporter)
run_reporter(input = "./",
output = 'reporter.html',
theme = 'npg',
cdf.percent = 10,
max.lncrna.len = 10000,
min.expressed.sample = 50,
ask = FALSE)
It will automatically detect the files in the current folder to generate the report.
I saw your reply on GitHub. So final_all.gtf also contains novel lncRNAs, am I right? In this command:
perl lncPipe/bin/get_map_table.pl final_all.gtf > map.file
is final_all.gtf the one in the folder where STB79.sort.bam is? Sorry, I am a bit confused. I would also like to know why I see that error. Where can I find the latest Nextflow binary?
Yes, final_all.gtf also contains novel lncRNAs; it can also be found in the result folder.
I have no idea right now why the errors occur, and the latest Nextflow binary can be found from
OK. Anyway, I see that the *.sort.bam files are in Result/hisat_alignment. So I extracted the raw counts with the featureCounts tool, using final_all.gtf and the *.sort.bam files. Please check the attachment with the counts for a few genes from one of the samples (the counts are in the last column of the file):
sample_counts.txt
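To cross-check these featureCounts numbers against sample.htseq.count, a small Python sketch can load a featureCounts table into a gene-to-count dict. It assumes featureCounts' default tab-separated layout: "#" comment lines, a header row starting with Geneid, and the count in the last column, as noted above. The function name is hypothetical.

```python
def read_featurecounts(path):
    """Return {Geneid: count} from a featureCounts table, taking the count from the last column."""
    counts = {}
    with open(path) as fh:
        for line in fh:
            if line.startswith("#") or not line.strip():
                continue  # program/command comment line or blank line
            fields = line.rstrip("\r\n").split("\t")
            if fields[0] == "Geneid":
                continue  # header row; its last field is the BAM file name
            counts[fields[0]] = int(fields[-1])
    return counts
```

Because the count is read from the last column, the semicolon-separated Chr/Start/End/Strand lists that featureCounts writes for multi-exon genes do not affect the result.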
In the first column I can see all the protein-coding genes, plus some novel lncRNAs that are named after the closest protein-coding gene.
I also see that the known lncRNAs are named after the closest protein-coding gene? How do I get their original names? As I said before, it would be very helpful if you could provide users with a table of new names and old names.
Yes, as you suggested, we have updated the code to generate a mapfile, which can be found in the result folder where final.gtf is placed.
Sorry, I don't have any map.file in the Result folder. I see only the hisat_alignment, Identified_lncRNA, and Merged_assemblies folders there.
As I don't see any map.file, I ran the following:
perl bin/get_map_table.pl --gtf_file=Result/Identified_lncRNA/final_all.gtf > map.file
In map.file I see protein-coding genes and their transcript IDs. I also see known and novel lncRNAs, but I don't see any new-name to old-name mapping for the known lncRNAs. Please check the attached file:
mapfile.txt
The command should be:
perl !{baseDir}/bin/get_map_table.pl Result/Identified_lncRNA/final_all.gtf > map.file
And I can't download mapfile.txt because of a bad network connection. Could you please paste the top 100 lines of the map file here?
In the lncpipe folder I have all the directories and files, such as bin, Combined_annotations, docs, lncpipe.image, nextflow, nextflow.config, etc.
I'm inside the lncpipe folder and ran the following command to get map.file:
perl bin/get_map_table.pl Result/Identified_lncRNA/final_all.gtf > map.file
When I opened map.file, I saw the following:
Required parameters missing
Usage: get_map_table.pl --gtf_file=final_all.gtf
So I ran the command again with --gtf_file:
perl bin/get_map_table.pl --gtf_file=Result/Identified_lncRNA/final_all.gtf > map.file
Now I can see three columns in map.file. I am sending another file with the top 100 lines of map.file:
top100_map_file.txt
And a small suggestion: maybe you could also use tools like Pfam, RNAfold, and Infernal/cmscan, which filter RNAs based on sequence and secondary structure, to identify novel lncRNAs.
Thanks for your suggestion. We are working on an updated version of lncPipe that adds the features from our to-do list, and your suggestion has been added to that list for future implementation.
And could you please explain the contents of map.file?
The first column is the name in final_all.gtf, the second column is the corresponding name in a database such as Ensembl or LNCipedia, and the third column gives the gene type: coding gene, known lncRNA, or novel lncRNA.
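A minimal Python sketch for reading map.file as described above and tallying the third column. It assumes whitespace-separated rows with exactly three fields and names without spaces; it does no validation beyond the field count, and the exact type labels (e.g. "known") are whatever get_map_table.pl writes, so none are hard-coded. The function names are hypothetical.

```python
from collections import Counter


def read_map_file(path):
    """Return (gtf_name, mapped_name, gene_type) tuples from a three-column map.file."""
    rows = []
    with open(path) as fh:
        for line in fh:
            fields = line.split()
            if len(fields) == 3:  # skip blank lines and rows with a different field count
                rows.append(tuple(fields))
    return rows


def count_by_type(rows):
    """Tally rows by the third column (gene type)."""
    return Counter(gene_type for _, _, gene_type in rows)
```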
Yes, I know that. Sorry, but I have checked LNCipedia and Ensembl, and I didn't find any known lncRNAs with those names.
For example:
LINC-EFNA5-21 LINC-EFNA5-21:1 known
LINC-FP236240.1-2 LINC-FP236240.1-2:1 known
LINC-SALL1-10 LINC-SALL1-10:1 known
I haven't seen any of these lncRNAs in the LNCipedia or Ensembl tables. Could you please check again? All the protein-coding genes are fine, but why do the known lncRNAs look like that?
May I know which version of LNCipedia you are using?
Hello,
LINC-EFNA5-21 LINC-EFNA5-21:1 known
LINC-FP236240.1-2 LINC-FP236240.1-2:1 known
LINC-SALL1-10 LINC-SALL1-10:1 known
These names are newly defined by LncPipe. We have generated a mapping file at Result/Identified_lncRNA/lncRNA.mapping.file. The first column is the name assigned after StringTie merge, and the second column is the new ID defined by LncPipe. The third and fourth columns are the corresponding names in databases such as Ensembl or LNCipedia.
MSTRG.1 LINC-PLCXD1-1 ENSG00000228572.7
MSTRG.10 LINC-PPP2R3B-5 lnc-PLCXD1-3
MSTRG.1002 LINC-ITIH6-1 ENSG00000215197.4
MSTRG.1005 LINC-MAGED2-1 lnc-ITIH6-1
MSTRG.1006 LINC-MAGED2-2 ENSG00000275387.1
MSTRG.1011 LINC-ALAS2-1 ENSG00000278283.1
MSTRG.1012 LINC-PAGE2B-1 ENSG00000234466.1 lnc-ALAS2-1
MSTRG.1015 LINC-PAGE2-1 ENSG00000278319.1
MSTRG.1016 LINC-FAM104B-1 ENSG00000276929.1
MSTRG.1018 LINC-MTRNR2L10-2 ENSG00000186678.7 lnc-PAGE5-2
MSTRG.1019 LINC-MTRNR2L10-1 ENSG00000229760.1 lnc-MTRNR2L10-3
If you use final_all.gtf to generate a mapping file, you won't find the names in databases such as Ensembl or LNCipedia, because LncPipe applies a renaming procedure.
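A minimal Python sketch for looking up the database IDs of a LncPipe name in lncRNA.mapping.file. It assumes the file is tab-separated with four columns, where the third (Ensembl/GENCODE) or fourth (LNCipedia) column may be empty, as the print statements of rename_lncRNA_2.pl quoted later in this thread suggest. The paste above collapsed those empty fields, so the columns only line up when splitting on tabs. The function name is hypothetical.

```python
def read_lncrna_mapping(path):
    """Map each LncPipe name (column 2) to (StringTie ID, Ensembl ID or None, LNCipedia ID or None)."""
    mapping = {}
    with open(path) as fh:
        for line in fh:
            fields = line.rstrip("\r\n").split("\t")  # split on tabs so empty columns survive
            if len(fields) < 2:
                continue
            fields += [""] * (4 - len(fields))  # pad rows that lack trailing columns
            mstrg, new_name, ensembl, lncipedia = fields[:4]
            mapping[new_name] = (mstrg, ensembl or None, lncipedia or None)
    return mapping
```

For example, read_lncrna_mapping("Result/Identified_lncRNA/lncRNA.mapping.file").get("LINC-EFNA5-21") would return that name's (StringTie ID, Ensembl ID, LNCipedia ID) tuple, or None if the name is not in the file.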
Thanks a lot for the answer. But in Result/Identified_lncRNA/ I don't see any lncRNA.mapping.file. I see only the following files in the Identified_lncRNA folder.
How can I get lncRNA.mapping.file now?
Have you updated LncPipe? Last year we added a function to rename_lncRNA_2.pl that generates this file.
You can re-run the command at line 1162 of LncRNAanalysisPipe.nf. Of course, you should change DIR to the work directory of this command. You can find the work directory with this command: find ./ -name lncRNA.final.v2.gtf
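If find is not convenient, the same search can be sketched in Python. This is an illustrative sketch with a hypothetical function name. Like find -name, it also reports task directories where the file is only a symlinked input, so check the .command.sh in each hit to pick the task that produced it.

```python
import os


def find_work_dirs(root, filename="lncRNA.final.v2.gtf"):
    """Return every directory under `root` that contains an entry named `filename`."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if filename in filenames:  # includes symlinks that point to files
            hits.append(dirpath)
    return sorted(hits)
```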
The LncPipe I'm using is from last September. Which one is the updated version? Is it https://github.com/nf-core/lncpipe/tree/dev?
We have updated it, and this one is the updated version. The former version has been removed; you can simply download it with git.
Yes, I see that. I will update it and rerun my analysis, and I will keep you posted. Thanks a lot!
Hi Kodayu,
I tried the updated version, and I see an error with the rename_lncRNA_2.pl script.
ERROR ~ Error executing process > 'Summary_renaming_and_classification (1)'
Caused by:
Process `Summary_renaming_and_classification (1)` terminated with an error exit status (1)
Command exit status:
1
Command output:
(empty)
Command error:
Use of uninitialized value $out_gene in concatenation (.) or string at /path/to/LncPipe/bin/rename_lncRNA_2.pl line 274, <FH> line 3471258.
Along with that, I also see an error in the hisat2 alignment. I'm running the pipeline on 6 samples: alignment worked for three of them and failed for the other three.
[d1/eee460] process > fastq_hisat2_alignment_For_discovery [100%] 6 of 6, failed: 3
Do you know what the problem could be? Thank you.
Hi Kodayu and Zhaoqi,
Could you please look at my previous comment? I made a few changes to the Perl script after I got the error, but nothing helped.
fastq_hisat2_alignment_For_discovery [100%] 6 of 6, failed: 3
I do not know why there are errors in the hisat2 alignment step. This should not happen if three of the samples completed in your case. Maybe you can check the resource limits for the analysis. I remember that you went through this step before with the current lncPipe version, but not with the dev version in nf-core. You can extract the rename_lncRNA_2.pl script from dev and run it manually on the previous results.
Sorry for the inconvenience caused by the script.
OK, as you said, I replaced the rename_lncRNA_2.pl script with the one from dev. The error did not change; I get the same error again.
ERROR ~ Error executing process > 'Summary_renaming_and_classification (1)'
Caused by:
Process `Summary_renaming_and_classification (1)` terminated with an error exit status (1)
Command exit status:
1
Command output:
(empty)
Command error:
Use of uninitialized value $out_gene in concatenation (.) or string at /path/to/LncPipe/bin/rename_lncRNA_2.pl line 274, <FH> line 3471258.
Does the analysis on the test dataset produce any error message? For now, I have no idea about your errors; I think the GTF file might be malformed in this case. Maybe a faster solution is for me to give you remote assistance via AnyDesk?
@venkan I have fixed the script; please try again.
@likelet Thank you. I will give it a try and let you know.
@likelet I see an error from the script again.
ERROR ~ Error executing process > 'Summary_renaming_and_classification (1)'
Caused by:
Process `Summary_renaming_and_classification (1)` terminated with an error exit status (1)
Command exit status:
1
Command output:
(empty)
Command error:
Use of uninitialized value $field[2] in string ne at /documents/lncpipe/new_LncPipe/bin/rename_lncRNA_2.pl line 48, <FH> line 2883588.
It seems that the input file is not tab-separated.
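One way to see which lines break a tab-splitting parser is to count the tab-separated fields on each line. A GTF record has 9 tab-separated columns, so a non-comment line that yields fewer fields would leave indexes such as $field[2] undefined, which would plausibly explain the Perl warning above. A minimal Python sketch, assuming only lines starting with "#" are comments; the function name is hypothetical.

```python
def find_malformed_gtf_lines(path):
    """Return (line_number, field_count) for non-comment lines that lack 9 tab-separated fields."""
    bad = []
    with open(path) as fh:
        for lineno, line in enumerate(fh, start=1):
            line = line.rstrip("\r\n")
            if not line or line.startswith("#"):
                continue  # blank lines and '#' comment lines are not GTF records
            n_fields = len(line.split("\t"))
            if n_fields != 9:
                bad.append((lineno, n_fields))
    return bad
```

Running it on each input of rename_lncRNA_2.pl would point to the offending line numbers without reading through millions of lines by hand.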
But which input file? The genome reference? The GENCODE annotation? The LNCipedia gene annotation file? Which one?
@kodayu Which file?
@likelet @kodayu Any help, please? Is the error caused by an input file? If so, which one?
@venkan I am sorry for not logging in to my account for a long time. I think the problem is that lncipedia_mod.gtf is malformed. In brief, it's the second input file of the script rename_lncRNA_2.pl. I suggest that you send me the top 20 and bottom 20 lines of this file, and I will look for the cause.
@kodayu Here are the top 20 lines of the script rename_lncRNA_2.pl.
head -20 rename_lncRNA_2.pl
#!/usr/bin/perl -w
use strict;
#die ("usage: <Genecode gtf file> <lncipedia gtf file>") unless @ARGV > 2;
#print "#Query file ".$ARGV[0]." with file_number ".$ARGV[1]."\n";
my %know_lnc;
open FH,"known.lncRNA.bed" or die;
while(<FH>){
chomp;
my @field=split "\t";
if ($field[7] eq "exon"){
$know_lnc{$field[0].'\t'.$field[1].'\t'.$field[5]} = $field[3];
$know_lnc{$field[0].'\t'.$field[2].'\t'.$field[5]} = $field[3];
}
}
my %genecode;my %lncpedia;
if (@ARGV == 2){
open FH,"$ARGV[0]" or die;
And here are the bottom 20 lines.
}
my %all_data;
foreach my $mstr(sort(keys %genecode)){
$all_data{$mstr} = 1;
}
foreach my $mstr(sort(keys %lncpedia)){
$all_data{$mstr} = 1;
}
open OUT3,">lncRNA.mapping.file" or die;
foreach my $mstr(sort(keys %all_data)){
if(defined($MSTRG2genename{$mstr}) && defined($lncpedia{$mstr}) && defined($genecode{$mstr})){
print OUT3 $mstr."\t".$MSTRG2genename{$mstr}."\t".$genecode{$mstr}."\t".$lncpedia{$mstr}."\n";
}elsif (defined($MSTRG2genename{$mstr}) && defined($genecode{$mstr})){
print OUT3 $mstr."\t".$MSTRG2genename{$mstr}."\t".$genecode{$mstr}."\t\n"
}elsif (defined($MSTRG2genename{$mstr}) && defined($lncpedia{$mstr})){
print OUT3 $mstr."\t".$MSTRG2genename{$mstr}."\t\t".$lncpedia{$mstr}."\n"
}else{
next;
}
}
@venkan Sorry for misleading you; I meant lncipedia_mod.gtf, the input of rename_lncRNA_2.pl.
@venkan The command is:
perl !{baseDir}/bin/rename_lncRNA_2.pl gencode_annotation_gtf_mod.gtf lncipedia_mod.gtf
I need the top and bottom lines of the last file.
@kodayu Here are the top 20 lines of lncipedia_mod.gtf.
##gtf
track name='LNCipedia' description='LNCipedia 5.0 - www.lncipedia.org' color=73,157,74 db=hg38 url='http://lncipedia.org/db/transcript/$$'
chr1 lncipedia.org exon 83801516 83803251 . - . gene_id "LINC01725" ; transcript_id "LINC01725:19" ; gene_alias_1 "ENSG00000233008" ; gene_alias_2 "RP11-475O6.1" ; gene_alias_3 "ENSG00000233008.1" ; gene_alias_4 "OTTHUMG00000009930.1" ; gene_alias_5 "ENSG00000233008.5" ; gene_alias_6 "LINC01725" ; gene_alias_7 "LOC101927560" ; transcript_alias_1 "ENST00000457273" ; transcript_alias_2 "ENST00000457273.1" ; transcript_alias_3 "RP11-475O6.1-005" ; transcript_alias_4 "OTTHUMT00000027496.1" ; transcript_alias_5 "NONHSAT004171" ; transcript_alias_6 "NR_119374" ; transcript_alias_7 "ENST00000457273.5" ; transcript_alias_8 "NR_119374.1" ;
chr1 lncipedia.org exon 83849907 83850022 . - . gene_id "LINC01725" ; transcript_id "LINC01725:19" ; gene_alias_1 "ENSG00000233008" ; gene_alias_2 "RP11-475O6.1" ; gene_alias_3 "ENSG00000233008.1" ; gene_alias_4 "OTTHUMG00000009930.1" ; gene_alias_5 "ENSG00000233008.5" ; gene_alias_6 "LINC01725" ; gene_alias_7 "LOC101927560" ; transcript_alias_1 "ENST00000457273" ; transcript_alias_2 "ENST00000457273.1" ; transcript_alias_3 "RP11-475O6.1-005" ; transcript_alias_4 "OTTHUMT00000027496.1" ; transcript_alias_5 "NONHSAT004171" ; transcript_alias_6 "NR_119374" ; transcript_alias_7 "ENST00000457273.5" ; transcript_alias_8 "NR_119374.1" ;
chr1 lncipedia.org exon 83860408 83860546 . - . gene_id "LINC01725" ; transcript_id "LINC01725:19" ; gene_alias_1 "ENSG00000233008" ; gene_alias_2 "RP11-475O6.1" ; gene_alias_3 "ENSG00000233008.1" ; gene_alias_4 "OTTHUMG00000009930.1" ; gene_alias_5 "ENSG00000233008.5" ; gene_alias_6 "LINC01725" ; gene_alias_7 "LOC101927560" ; transcript_alias_1 "ENST00000457273" ; transcript_alias_2 "ENST00000457273.1" ; transcript_alias_3 "RP11-475O6.1-005" ; transcript_alias_4 "OTTHUMT00000027496.1" ; transcript_alias_5 "NONHSAT004171" ; transcript_alias_6 "NR_119374" ; transcript_alias_7 "ENST00000457273.5" ; transcript_alias_8 "NR_119374.1" ;
chr16 lncipedia.org exon 74192392 74192726 . - . gene_id "lnc-CLEC18B-3" ; transcript_id "lnc-CLEC18B-3:5" ; gene_alias_1 "ENSG00000249447" ; gene_alias_2 "XLOC_012007" ; gene_alias_3 "linc-ZFHX3-2" ; gene_alias_4 "ENSG00000261404.1" ; gene_alias_5 "AC009120.4" ; gene_alias_6 "OTTHUMG00000176255.2" ; gene_alias_7 "ENSG00000261404.5" ; gene_alias_8 "ENSG00000261404.6" ; gene_alias_9 "AC138627.1" ; gene_alias_10 "LOC101928035" ; transcript_alias_1 "ENST00000510251" ; transcript_alias_2 "TCONS_00024274" ; transcript_alias_3 "ENST00000568137.1" ; transcript_alias_4 "AC009120.4-001" ; transcript_alias_5 "OTTHUMT00000431686.1" ; transcript_alias_6 "NONHSAT143655" ; transcript_alias_7 "NR_104657" ; transcript_alias_8 "NR_104657.1" ;
chr16 lncipedia.org exon 74205905 74206165 . - . gene_id "lnc-CLEC18B-3" ; transcript_id "lnc-CLEC18B-3:5" ; gene_alias_1 "ENSG00000249447" ; gene_alias_2 "XLOC_012007" ; gene_alias_3 "linc-ZFHX3-2" ; gene_alias_4 "ENSG00000261404.1" ; gene_alias_5 "AC009120.4" ; gene_alias_6 "OTTHUMG00000176255.2" ; gene_alias_7 "ENSG00000261404.5" ; gene_alias_8 "ENSG00000261404.6" ; gene_alias_9 "AC138627.1" ; gene_alias_10 "LOC101928035" ; transcript_alias_1 "ENST00000510251" ; transcript_alias_2 "TCONS_00024274" ; transcript_alias_3 "ENST00000568137.1" ; transcript_alias_4 "AC009120.4-001" ; transcript_alias_5 "OTTHUMT00000431686.1" ; transcript_alias_6 "NONHSAT143655" ; transcript_alias_7 "NR_104657" ; transcript_alias_8 "NR_104657.1" ;
chr16 lncipedia.org exon 74210306 74210505 . - . gene_id "lnc-CLEC18B-3" ; transcript_id "lnc-CLEC18B-3:5" ; gene_alias_1 "ENSG00000249447" ; gene_alias_2 "XLOC_012007" ; gene_alias_3 "linc-ZFHX3-2" ; gene_alias_4 "ENSG00000261404.1" ; gene_alias_5 "AC009120.4" ; gene_alias_6 "OTTHUMG00000176255.2" ; gene_alias_7 "ENSG00000261404.5" ; gene_alias_8 "ENSG00000261404.6" ; gene_alias_9 "AC138627.1" ; gene_alias_10 "LOC101928035" ; transcript_alias_1 "ENST00000510251" ; transcript_alias_2 "TCONS_00024274" ; transcript_alias_3 "ENST00000568137.1" ; transcript_alias_4 "AC009120.4-001" ; transcript_alias_5 "OTTHUMT00000431686.1" ; transcript_alias_6 "NONHSAT143655" ; transcript_alias_7 "NR_104657" ; transcript_alias_8 "NR_104657.1" ;
chr16 lncipedia.org exon 74215352 74215521 . - . gene_id "lnc-CLEC18B-3" ; transcript_id "lnc-CLEC18B-3:5" ; gene_alias_1 "ENSG00000249447" ; gene_alias_2 "XLOC_012007" ; gene_alias_3 "linc-ZFHX3-2" ; gene_alias_4 "ENSG00000261404.1" ; gene_alias_5 "AC009120.4" ; gene_alias_6 "OTTHUMG00000176255.2" ; gene_alias_7 "ENSG00000261404.5" ; gene_alias_8 "ENSG00000261404.6" ; gene_alias_9 "AC138627.1" ; gene_alias_10 "LOC101928035" ; transcript_alias_1 "ENST00000510251" ; transcript_alias_2 "TCONS_00024274" ; transcript_alias_3 "ENST00000568137.1" ; transcript_alias_4 "AC009120.4-001" ; transcript_alias_5 "OTTHUMT00000431686.1" ; transcript_alias_6 "NONHSAT143655" ; transcript_alias_7 "NR_104657" ; transcript_alias_8 "NR_104657.1" ;
chrX lncipedia.org exon 130477069 130477249 . - . gene_id "lnc-GPR119-1" ; transcript_id "lnc-GPR119-1:1" ; gene_alias_1 "ENSG00000229702" ; gene_alias_2 "RP1-274L7.1" ; gene_alias_3 "ENSG00000229702.1" ; gene_alias_4 "OTTHUMG00000022398.1" ; transcript_alias_1 "ENST00000458525" ; transcript_alias_2 "ENST00000458525.1" ; transcript_alias_3 "RP1-274L7.1-001" ; transcript_alias_4 "OTTHUMT00000058271.1" ; transcript_alias_5 "NONHSAT138520" ;
chrX lncipedia.org exon 130485571 130485631 . - . gene_id "lnc-GPR119-1" ; transcript_id "lnc-GPR119-1:1" ; gene_alias_1 "ENSG00000229702" ; gene_alias_2 "RP1-274L7.1" ; gene_alias_3 "ENSG00000229702.1" ; gene_alias_4 "OTTHUMG00000022398.1" ; transcript_alias_1 "ENST00000458525" ; transcript_alias_2 "ENST00000458525.1" ; transcript_alias_3 "RP1-274L7.1-001" ; transcript_alias_4 "OTTHUMT00000058271.1" ; transcript_alias_5 "NONHSAT138520" ;
chrX lncipedia.org exon 130487466 130487557 . - . gene_id "lnc-GPR119-1" ; transcript_id "lnc-GPR119-1:1" ; gene_alias_1 "ENSG00000229702" ; gene_alias_2 "RP1-274L7.1" ; gene_alias_3 "ENSG00000229702.1" ; gene_alias_4 "OTTHUMG00000022398.1" ; transcript_alias_1 "ENST00000458525" ; transcript_alias_2 "ENST00000458525.1" ; transcript_alias_3 "RP1-274L7.1-001" ; transcript_alias_4 "OTTHUMT00000058271.1" ; transcript_alias_5 "NONHSAT138520" ;
chrX lncipedia.org exon 130491332 130491454 . - . gene_id "lnc-GPR119-1" ; transcript_id "lnc-GPR119-1:1" ; gene_alias_1 "ENSG00000229702" ; gene_alias_2 "RP1-274L7.1" ; gene_alias_3 "ENSG00000229702.1" ; gene_alias_4 "OTTHUMG00000022398.1" ; transcript_alias_1 "ENST00000458525" ; transcript_alias_2 "ENST00000458525.1" ; transcript_alias_3 "RP1-274L7.1-001" ; transcript_alias_4 "OTTHUMT00000058271.1" ; transcript_alias_5 "NONHSAT138520" ;
chrX lncipedia.org exon 130493735 130493830 . - . gene_id "lnc-GPR119-1" ; transcript_id "lnc-GPR119-1:1" ; gene_alias_1 "ENSG00000229702" ; gene_alias_2 "RP1-274L7.1" ; gene_alias_3 "ENSG00000229702.1" ; gene_alias_4 "OTTHUMG00000022398.1" ; transcript_alias_1 "ENST00000458525" ; transcript_alias_2 "ENST00000458525.1" ; transcript_alias_3 "RP1-274L7.1-001" ; transcript_alias_4 "OTTHUMT00000058271.1" ; transcript_alias_5 "NONHSAT138520" ;
chrX lncipedia.org exon 130495129 130495206 . - . gene_id "lnc-GPR119-1" ; transcript_id "lnc-GPR119-1:1" ; gene_alias_1 "ENSG00000229702" ; gene_alias_2 "RP1-274L7.1" ; gene_alias_3 "ENSG00000229702.1" ; gene_alias_4 "OTTHUMG00000022398.1" ; transcript_alias_1 "ENST00000458525" ; transcript_alias_2 "ENST00000458525.1" ; transcript_alias_3 "RP1-274L7.1-001" ; transcript_alias_4 "OTTHUMT00000058271.1" ; transcript_alias_5 "NONHSAT138520" ;
chrX lncipedia.org exon 130510415 130510841 . - . gene_id "lnc-GPR119-1" ; transcript_id "lnc-GPR119-1:1" ; gene_alias_1 "ENSG00000229702" ; gene_alias_2 "RP1-274L7.1" ; gene_alias_3 "ENSG00000229702.1" ; gene_alias_4 "OTTHUMG00000022398.1" ; transcript_alias_1 "ENST00000458525" ; transcript_alias_2 "ENST00000458525.1" ; transcript_alias_3 "RP1-274L7.1-001" ; transcript_alias_4 "OTTHUMT00000058271.1" ; transcript_alias_5 "NONHSAT138520" ;
chrX lncipedia.org exon 130511289 130511388 . - . gene_id "lnc-GPR119-1" ; transcript_id "lnc-GPR119-1:1" ; gene_alias_1 "ENSG00000229702" ; gene_alias_2 "RP1-274L7.1" ; gene_alias_3 "ENSG00000229702.1" ; gene_alias_4 "OTTHUMG00000022398.1" ; transcript_alias_1 "ENST00000458525" ; transcript_alias_2 "ENST00000458525.1" ; transcript_alias_3 "RP1-274L7.1-001" ; transcript_alias_4 "OTTHUMT00000058271.1" ; transcript_alias_5 "NONHSAT138520" ;
chrX lncipedia.org exon 130512366 130512479 . - . gene_id "lnc-GPR119-1" ; transcript_id "lnc-GPR119-1:1" ; gene_alias_1 "ENSG00000229702" ; gene_alias_2 "RP1-274L7.1" ; gene_alias_3 "ENSG00000229702.1" ; gene_alias_4 "OTTHUMG00000022398.1" ; transcript_alias_1 "ENST00000458525" ; transcript_alias_2 "ENST00000458525.1" ; transcript_alias_3 "RP1-274L7.1-001" ; transcript_alias_4 "OTTHUMT00000058271.1" ; transcript_alias_5 "NONHSAT138520" ;
chrX lncipedia.org exon 130513726 130513802 . - . gene_id "lnc-GPR119-1" ; transcript_id "lnc-GPR119-1:1" ; gene_alias_1 "ENSG00000229702" ; gene_alias_2 "RP1-274L7.1" ; gene_alias_3 "ENSG00000229702.1" ; gene_alias_4 "OTTHUMG00000022398.1" ; transcript_alias_1 "ENST00000458525" ; transcript_alias_2 "ENST00000458525.1" ; transcript_alias_3 "RP1-274L7.1-001" ; transcript_alias_4 "OTTHUMT00000058271.1" ; transcript_alias_5 "NONHSAT138520" ;
chrX lncipedia.org exon 130515760 130515849 . - . gene_id "lnc-GPR119-1" ; transcript_id "lnc-GPR119-1:1" ; gene_alias_1 "ENSG00000229702" ; gene_alias_2 "RP1-274L7.1" ; gene_alias_3 "ENSG00000229702.1" ; gene_alias_4 "OTTHUMG00000022398.1" ; transcript_alias_1 "ENST00000458525" ; transcript_alias_2 "ENST00000458525.1" ; transcript_alias_3 "RP1-274L7.1-001" ; transcript_alias_4 "OTTHUMT00000058271.1" ; transcript_alias_5 "NONHSAT138520" ;
And here are the bottom 20 lines.
chr19 lncipedia.org exon 37566031 37566089 . + . gene_id "ZNF571-AS1" ; transcript_id "ZNF571-AS1:20" ; gene_alias_1 "ZNF571-AS1" ; transcript_alias_1 "NR_038247.1" ;
chr19 lncipedia.org exon 37569850 37569996 . + . gene_id "ZNF571-AS1" ; transcript_id "ZNF571-AS1:20" ; gene_alias_1 "ZNF571-AS1" ; transcript_alias_1 "NR_038247.1" ;
chr19 lncipedia.org exon 37583483 37583834 . + . gene_id "ZNF571-AS1" ; transcript_id "ZNF571-AS1:20" ; gene_alias_1 "ZNF571-AS1" ; transcript_alias_1 "NR_038247.1" ;
chr19 lncipedia.org exon 37585339 37587348 . + . gene_id "ZNF571-AS1" ; transcript_id "ZNF571-AS1:20" ; gene_alias_1 "ZNF571-AS1" ; transcript_alias_1 "NR_038247.1" ;
chr9 lncipedia.org exon 21994791 21995161 . + . gene_id "CDKN2B-AS1" ; transcript_id "CDKN2B-AS1:74" ; gene_alias_1 "CDKN2B-AS1" ; transcript_alias_1 "NR_120536.1" ;
chr9 lncipedia.org exon 22046751 22046900 . + . gene_id "CDKN2B-AS1" ; transcript_id "CDKN2B-AS1:74" ; gene_alias_1 "CDKN2B-AS1" ; transcript_alias_1 "NR_120536.1" ;
chr9 lncipedia.org exon 22049106 22049228 . + . gene_id "CDKN2B-AS1" ; transcript_id "CDKN2B-AS1:74" ; gene_alias_1 "CDKN2B-AS1" ; transcript_alias_1 "NR_120536.1" ;
chr9 lncipedia.org exon 22120504 22121097 . + . gene_id "CDKN2B-AS1" ; transcript_id "CDKN2B-AS1:74" ; gene_alias_1 "CDKN2B-AS1" ; transcript_alias_1 "NR_120536.1" ;
chr7 lncipedia.org exon 134324 134688 . + . gene_id "lnc-FAM20C-1" ; transcript_id "lnc-FAM20C-1:6" ; gene_alias_1 "LOC105375115" ; transcript_alias_1 "NR_134324.1" ;
chr7 lncipedia.org exon 135068 135176 . + . gene_id "lnc-FAM20C-1" ; transcript_id "lnc-FAM20C-1:6" ; gene_alias_1 "LOC105375115" ; transcript_alias_1 "NR_134324.1" ;
chr7 lncipedia.org exon 135324 135417 . + . gene_id "lnc-FAM20C-1" ; transcript_id "lnc-FAM20C-1:6" ; gene_alias_1 "LOC105375115" ; transcript_alias_1 "NR_134324.1" ;
chr7 lncipedia.org exon 167241 167605 . + . gene_id "lnc-FAM20C-1" ; transcript_id "lnc-FAM20C-1:6" ; gene_alias_1 "LOC105375115" ; transcript_alias_1 "NR_134324.1" ;
chr7 lncipedia.org exon 167985 168093 . + . gene_id "lnc-FAM20C-1" ; transcript_id "lnc-FAM20C-1:6" ; gene_alias_1 "LOC105375115" ; transcript_alias_1 "NR_134324.1" ;
chr7 lncipedia.org exon 168241 168334 . + . gene_id "lnc-FAM20C-1" ; transcript_id "lnc-FAM20C-1:6" ; gene_alias_1 "LOC105375115" ; transcript_alias_1 "NR_134324.1" ;
chr7 lncipedia.org exon 174920 175284 . + . gene_id "lnc-FAM20C-1" ; transcript_id "lnc-FAM20C-1:6" ; gene_alias_1 "LOC105375115" ; transcript_alias_1 "NR_134324.1" ;
chr7 lncipedia.org exon 175664 175772 . + . gene_id "lnc-FAM20C-1" ; transcript_id "lnc-FAM20C-1:6" ; gene_alias_1 "LOC105375115" ; transcript_alias_1 "NR_134324.1" ;
chr7 lncipedia.org exon 175920 176013 . + . gene_id "lnc-FAM20C-1" ; transcript_id "lnc-FAM20C-1:6" ; gene_alias_1 "LOC105375115" ; transcript_alias_1 "NR_134324.1" ;
chr3 lncipedia.org exon 150096443 150096597 . + . gene_id "lnc-RNF13-6" ; transcript_id "lnc-RNF13-6:6" ; gene_alias_1 "LOC105374313" ; transcript_alias_1 "NR_136187.1" ;
chr3 lncipedia.org exon 150124938 150125030 . + . gene_id "lnc-RNF13-6" ; transcript_id "lnc-RNF13-6:6" ; gene_alias_1 "LOC105374313" ; transcript_alias_1 "NR_136187.1" ;
chr3 lncipedia.org exon 150150511 150151001 . + . gene_id "lnc-RNF13-6" ; transcript_id "lnc-RNF13-6:6" ; gene_alias_1 "LOC105374313" ; transcript_alias_1 "NR_136187.1" ;
@venkan The second line is not in a valid format. If it is a comment, it should start with '#'; otherwise the script can't tell whether it is a comment line or a normal line, so errors will occur.
@venkan All in all, it's not a well-formed GTF file. You can remove the second line or add a '#' at the start of line 2.
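Adding the '#' can also be scripted. A minimal Python sketch, assuming the only offending lines are UCSC-style header lines beginning with "track " or "browser " (like line 2 of the file above); it writes a corrected copy instead of editing in place. The function name and file paths are hypothetical.

```python
def comment_out_track_lines(src, dst):
    """Copy a GTF file, prefixing '#' to 'track'/'browser' header lines; return how many changed."""
    changed = 0
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            if line.startswith(("track ", "browser ")):
                line = "#" + line  # turn the UCSC header line into a GTF comment
                changed += 1
            fout.write(line)
    return changed
```

For example, comment_out_track_lines("lncipedia_mod.gtf", "lncipedia_mod.fixed.gtf") returns the number of lines it commented out, and the fixed copy can then be passed to rename_lncRNA_2.pl.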
@venkan Also, if convenient, you can contact me by email at yukai@sysucc.org.cn. I don't log in to GitHub frequently.
@kodayu I added '#' at the start of the second line of the GTF file. I am running the job again and will email you if there is another error. Thank you.
@venkan you are welcome!
@kodayu I have a problem again. I am sending the file to your email.
I found the error and have already fixed it. A grep command had been rewritten in the wrong format during the previous fix. @kodayu @venkan
It seems the htseq-count step outputs an empty result.