likelet / LncPipe

A Nextflow-based pipeline for comprehensive analyses of long non-coding RNAs from RNA-seq datasets
GNU Lesser General Public License v3.0
83 stars, 43 forks

HTSeq analysis error reported by Venkatesh #41

Closed likelet closed 4 years ago

likelet commented 5 years ago

The htseq-count step seems to output an empty result.

venkan commented 5 years ago

The failing step is:

sambamba view sample.sort.bam > sample.sam  # resolved error caused by bam and htseq version conflicts
htseq-count -t exon -i gene_id -r pos -f sam sample.sam final_all.gtf > sample.htseq.count
rm sample.sam

I don't see any .command.out or .command.err files. I also checked the sample.htseq.count file, and it is not empty: the first column has gene names and the second column has raw counts.

A1BG    112
A1CF    17
A2M 70
A2ML1   6
A2ML1-AS-1  286
A3GALT2 6
A4GALT  2
A4GNT   3
AAAS    0
AACS    2
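A quick way to tell a truly empty counts file from one with real data is to count its gene rows and look at htseq-count's own summary counters. These are the lines starting with "__", such as __no_feature, which htseq-count appends at the end. This is a minimal sketch that assumes the two-column gene/count layout shown above. The name htseq_summary is a hypothetical helper, not part of LncPipe.

```shell
# htseq_summary FILE: report how many genes an htseq-count output lists, how many
# have a non-zero count, and echo htseq-count's "__" summary counters.
# Hypothetical helper (not part of LncPipe); assumes "gene<whitespace>count" lines.
htseq_summary() {
    awk '
        /^__/ { print; next }                  # __no_feature, __ambiguous, ...
        { genes++; if ($2 > 0) expressed++ }   # ordinary gene rows
        END { printf "genes=%d expressed=%d\n", genes, expressed }
    ' "$1"
}
```

Usage: htseq_summary sample.htseq.count. A file reporting genes=0 is genuinely empty. A file like the one above would report a non-zero gene count.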
likelet commented 5 years ago

Yes, this means that sample.htseq.count has been generated.
Please try rerunning the analysis with -resume to see whether the error still exists.

venkan commented 5 years ago

Yes, you are right. But the job stopped because of this error:

ERROR ~ Error executing process > 'Run_htseq_for_quantification (sample)'

Caused by:
  Missing output file(s) `sample.htseq.count ` expected by process `Run_htseq_for_quantification (sample)`
likelet commented 5 years ago

This is weird; there shouldn't be an error here. Could you try the latest Nextflow?

venkan commented 5 years ago

I resumed the job and it is running. I will check this and then use the latest Nextflow.

venkan commented 5 years ago

I got the same error again. Do you think the latest Nextflow will avoid this error? And is https://github.com/nf-core/lncpipe/tree/dev the latest version?

likelet commented 5 years ago

I mean the latest Nextflow, not LncPipe. "https://github.com/nf-core/lncpipe/tree/dev" is not the latest version of LncPipe. It is just an adjusted version for the nf-core community, which provides a better code structure for conventional pipelines. The LncPipe in this repo is still the latest version that has been tested locally. BTW, I am not sure whether changing the Nextflow binary will fix the problem, but the code should be fine in this case. If the problem persists, you can take the count files and combine the results manually using the scripts in the bin folder.
The code to do this is as follows:

perl !{baseDir}/bin/get_map_table.pl --gtf_file=final_all.gtf > map.file
R CMD BATCH !{baseDir}/bin/get_htseq_matrix.R

Here !{baseDir} is the home path of LncPipe, the directory that contains the bin folder.

When this step is done, you can run LncPipeReporter with the following code:

library(LncPipeReporter)
run_reporter(input = "./",
             output = 'reporter.html',
             theme = 'npg',
             cdf.percent = 10,
             max.lncrna.len = 10000,
             min.expressed.sample = 50,
             ask = FALSE)

It will automatically detect the files in the current folder and generate the report.
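The bin scripts above are LncPipe's own way of combining the counts. If get_htseq_matrix.R cannot be used, the merge itself can be sketched in plain shell. This is a hypothetical stand-in, not a description of what get_htseq_matrix.R does internally. It makes three assumptions:

- There is one tab-separated <sample>.htseq.count file per sample.
- htseq-count's "__" summary lines should be dropped.
- A gene missing from one sample should get a count of 0 for that sample.

```shell
# merge_htseq FILE... : print a tab-separated gene x sample count matrix.
# Hypothetical helper (not part of LncPipe). Sample names come from the file
# names with any directory and the ".htseq.count" suffix stripped.
merge_htseq() {
    awk -F'\t' '
        FNR == 1 {                              # first line of a new file
            n++
            name = FILENAME
            sub(/.*\//, "", name)               # drop directory part
            sub(/\.htseq\.count$/, "", name)    # drop suffix
            header = header "\t" name
        }
        /^__/ { next }                          # skip htseq-count counters
        {
            if (!($1 in seen)) { seen[$1] = 1; order[++ngenes] = $1 }
            count[$1, n] = $2
        }
        END {
            print "gene" header
            for (i = 1; i <= ngenes; i++) {
                g = order[i]; line = g
                for (j = 1; j <= n; j++)
                    line = line "\t" (((g, j) in count) ? count[g, j] : 0)
                print line
            }
        }
    ' "$@"
}
```

Usage: merge_htseq *.htseq.count > count_matrix.txt. The output name is arbitrary. An empty input file has no first line, so it contributes no column; check the files before merging.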

likelet commented 5 years ago

I saw your reply on GitHub. So final_all.gtf also contains the novel lncRNAs, am I right? In perl lncPipe/bin/get_map_table.pl final_all.gtf > map.file, is final_all.gtf the one in the folder where STB79.sort.bam is? Sorry, I am a bit confused. I am also interested to know why I see that error. Where can I find the latest Nextflow binary?

Yes, final_all.gtf also contains the novel lncRNAs, and it can also be found in the result folder. I have no idea right now why the errors occur. The latest Nextflow binary can be found at

https://www.nextflow.io/

venkan commented 5 years ago

OK. Anyway, I see that the *.sort.bam files are in Result/hisat_alignment, so I extracted the raw counts with featureCounts using final_all.gtf and the *.sort.bam files. Please check the attachment sample_counts.txt, which has the counts for a few genes from one sample (last column in the file).

In the first column I can see all the protein-coding genes. I can also see some novel lncRNAs, which are named after the closest protein-coding gene.

I also see that the known lncRNAs are named after the closest protein-coding gene. How do I get their original names? As I said before, it would be very helpful if you could provide users with a table mapping new names to old names.

likelet commented 5 years ago

Yes, we have updated the code to generate a map file as you suggested. It can be found in the result folder where final.gtf is placed.

venkan commented 5 years ago

Sorry, I don't have any map.file in the Result folder. I see only the hisat_alignment, Identified_lncRNA and Merged_assemblies folders there.

As I don't see any map.file, I ran the following:

perl bin/get_map_table.pl --gtf_file=Result/Identified_lncRNA/final_all.gtf > map.file

In map.file I see protein-coding genes and their transcript IDs. I also see known and novel lncRNAs, but I don't see any new-name/old-name mapping for the known lncRNAs. Please check the attached file mapfile.txt.

likelet commented 5 years ago

The code should be:

perl !{baseDir}/bin/get_map_table.pl  Result/Identified_lncRNA/final_all.gtf  > map.file

I can't download mapfile.txt because of a bad network connection. Could you please paste the top 100 lines of the map file here?

venkan commented 5 years ago

In the lncpipe folder I have all the directories and files, such as bin, Combined_annotations, docs, lncpipe.image, nextflow, nextflow.config, etc.

From inside the lncpipe folder, I ran the following command to get map.file:

perl bin/get_map_table.pl Result/Identified_lncRNA/final_all.gtf > map.file

When I opened map.file, I saw the following:

Required parameters missing
Usage:  get_map_table.pl --gtf_file=final_all.gtf 

So I ran the command again with --gtf_file:

perl bin/get_map_table.pl --gtf_file=Result/Identified_lncRNA/final_all.gtf > map.file

Now I can see three columns in map.file. I am sending another file with the top 100 lines of map.file: top100_map_file.txt

A small suggestion: you could also use tools like Pfam, RNAfold and Infernal/cmscan to identify novel lncRNAs. These tools filter RNAs based on sequence and secondary structure.

likelet commented 5 years ago

Thanks for your suggestion. We are working on an updated version of LncPipe that adds the features from our to-do list, and your suggestion has been added to that list for future implementation.

venkan commented 5 years ago

And could you please explain map.file?

likelet commented 5 years ago

The first column is the name in final_all.gtf, and the second column is the corresponding name in a database such as Ensembl or LNCipedia. The third column gives the gene type: coding gene, known lncRNA or novel lncRNA.
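With that three-column layout, rows of one type can be pulled out with a one-line awk filter. This is a minimal sketch that assumes whitespace-separated columns. The label "known" is the only type label that appears in this thread, so check the exact labels for coding and novel genes in your own map.file. The name rows_of_type is a hypothetical helper.

```shell
# rows_of_type TYPE MAPFILE : print the map.file rows whose 3rd column equals TYPE.
# Hypothetical helper; assumes whitespace-separated columns
# (name in final_all.gtf, second name, gene type).
rows_of_type() {
    awk -v want="$1" '$3 == want' "$2"
}
```

Usage: rows_of_type known map.file > known_lncRNAs.txt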

venkan commented 5 years ago

Yes, I know that. Sorry, but I have checked both LNCipedia and Ensembl, and I didn't find any known lncRNAs with those names.

For example:

LINC-EFNA5-21 LINC-EFNA5-21:1 known
LINC-FP236240.1-2 LINC-FP236240.1-2:1 known
LINC-SALL1-10 LINC-SALL1-10:1 known

I haven't seen any of these lncRNAs in the LNCipedia or Ensembl tables. Could you please check once more? All the protein-coding genes are fine, but why do the known lncRNAs look like that?

Which version of LNCipedia are you using?

kodayu commented 5 years ago

Hello,

These names are newly defined by LncPipe. We generate a mapping file at Result/Identified_lncRNA/lncRNA.mapping.file:

- The first column is the name assigned after the StringTie merge.
- The second column is the ID newly defined by LncPipe.
- The third and fourth columns are the corresponding names in databases such as Ensembl (GENCODE) and LNCipedia, respectively.

MSTRG.1 LINC-PLCXD1-1 ENSG00000228572.7
MSTRG.10 LINC-PPP2R3B-5 lnc-PLCXD1-3
MSTRG.1002 LINC-ITIH6-1 ENSG00000215197.4
MSTRG.1005 LINC-MAGED2-1 lnc-ITIH6-1
MSTRG.1006 LINC-MAGED2-2 ENSG00000275387.1
MSTRG.1011 LINC-ALAS2-1 ENSG00000278283.1
MSTRG.1012 LINC-PAGE2B-1 ENSG00000234466.1 lnc-ALAS2-1
MSTRG.1015 LINC-PAGE2-1 ENSG00000278319.1
MSTRG.1016 LINC-FAM104B-1 ENSG00000276929.1
MSTRG.1018 LINC-MTRNR2L10-2 ENSG00000186678.7 lnc-PAGE5-2
MSTRG.1019 LINC-MTRNR2L10-1 ENSG00000229760.1 lnc-MTRNR2L10-3

If you use final_all.gtf to generate the mapping file, you won't find the names in databases such as Ensembl or LNCipedia, because LncPipe applies a renaming procedure.
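In lncRNA.mapping.file the third or fourth column can be empty. The Perl code that writes the file, quoted later in this thread, prints empty tab-separated fields for a missing ID, so a lookup should split on tabs only. This is a minimal sketch that assumes the four-column layout kodayu describes. The name lookup_original is a hypothetical helper.

```shell
# lookup_original NAME MAPPING_FILE : print the database IDs recorded for a
# LncPipe-defined name. Assumed tab-separated layout:
#   1 StringTie (MSTRG) ID, 2 LncPipe name, 3 GENCODE/Ensembl ID, 4 LNCipedia ID
# Columns 3 and 4 may be empty, which is why -F'\t' is used instead of whitespace.
lookup_original() {
    awk -F'\t' -v name="$1" '
        $2 == name {
            printf "%s\tensembl=%s\tlncipedia=%s\n", $1, ($3 == "" ? "-" : $3), ($4 == "" ? "-" : $4)
        }
    ' "$2"
}
```

Usage: lookup_original LINC-PAGE2B-1 Result/Identified_lncRNA/lncRNA.mapping.file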

venkan commented 5 years ago

Thanks a lot for the answer. But I don't see any lncRNA.mapping.file in Result/Identified_lncRNA/. I see only the files below inside the Identified_lncRNA folder:

[Screenshot, 2019-07-15: contents of the Identified_lncRNA folder]

How can I get the lncRNA.mapping.file now?

kodayu commented 5 years ago

Have you updated LncPipe? We added a function to rename_lncRNA_2.pl last year that generates this file.

kodayu commented 5 years ago

You can re-run the command at line 1162 of LncRNAanalysisPipe.nf. You first need to change into that command's work directory, which you can find with: find ./ -name lncRNA.final.v2.gtf
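The directory lookup can be wrapped in a small function. This is a minimal sketch that assumes Nextflow's default work folder. The command to re-run is the one at line 1162 of LncRNAanalysisPipe.nf and is not reproduced here. If -resume or repeated runs left several copies of lncRNA.final.v2.gtf, the function returns only the first match, so check which run that directory belongs to. The name find_task_dir is a hypothetical helper.

```shell
# find_task_dir ROOT : print the directory holding the first lncRNA.final.v2.gtf
# found under ROOT; return 1 if there is none. Hypothetical helper.
find_task_dir() {
    f=$(find "$1" -name lncRNA.final.v2.gtf -print | head -n 1)
    [ -n "$f" ] || return 1
    dirname "$f"
}
```

Usage: cd "$(find_task_dir work)". Then re-run the command from line 1162 in that directory.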

venkan commented 5 years ago

The LncPipe I'm using is from last September. Which one is the updated version? Is it https://github.com/nf-core/lncpipe/tree/dev?

kodayu commented 5 years ago

We have updated it, and this is the updated version. The former version has been removed; you can simply download the new one with git.

venkan commented 5 years ago

Yes, I see that. I will update and rerun my analysis, and I will keep you posted. Thanks a lot (y)

venkan commented 5 years ago

Hi Kodayu,

I tried the updated version, and there is an error with the rename_lncRNA_2.pl script:

ERROR ~ Error executing process > 'Summary_renaming_and_classification (1)'

Caused by:
  Process `Summary_renaming_and_classification (1)` terminated with an error exit status (1)

Command exit status:
  1

Command output:
  (empty)

Command error:
  Use of uninitialized value $out_gene in concatenation (.) or string at /path/to/LncPipe/bin/rename_lncRNA_2.pl line 274, <FH> line 3471258.

I also see an error in the HISAT2 alignment. I'm running the pipeline on 6 samples; alignment worked for three of them and failed for the other three.

[d1/eee460] process > fastq_hisat2_alignment_For_discovery [100%] 6 of 6, failed: 3

What could be the problem here? Thank you.

venkan commented 5 years ago

Hi Kodayu and Zhaoqi,

Could you please look at my previous comment? I made a few changes to the Perl script after I got the error, but nothing helped.

likelet commented 5 years ago

I do not know why there are errors in the HISAT2 alignment step. This should not happen if three of the samples completed. Maybe you can check the resource limits for the analysis. I remember that you got through this step before with the current LncPipe version, not the nf-core dev version. You can take the rename_lncRNA_2.pl script from dev and run it manually on the previous results.

Sorry for the inconvenience with the script.
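To see why three of the six HISAT2 tasks failed, the failed Nextflow task directories can be inspected directly. Nextflow keeps .exitcode, .command.err and .command.log in every task directory. These are hidden dot files, so ls -a is needed to see them. This is a minimal sketch that assumes the default work directory layout; a task killed before it could write .exitcode will not show up. The name failed_tasks is a hypothetical helper. An exit status of 137 (128 + signal 9, SIGKILL) often means the job was killed, for example by a memory limit, which would fit the resource-limit explanation.

```shell
# failed_tasks WORKDIR : print "<exit code> <task dir>" for every Nextflow task
# directory whose .exitcode file holds a non-zero status. Hypothetical helper.
failed_tasks() {
    find "$1" -name .exitcode -type f | while read -r f; do
        code=$(cat "$f")
        if [ "$code" != "0" ]; then
            printf '%s %s\n' "$code" "$(dirname "$f")"
        fi
    done
}
```

Usage: failed_tasks work. Then read the .command.err file in each reported directory for the tool's own error message.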

venkan commented 5 years ago

OK, as you said, I replaced the rename_lncRNA_2.pl script with the one from dev, but nothing changed. I get the same error again:

ERROR ~ Error executing process > 'Summary_renaming_and_classification (1)'

Caused by:
  Process `Summary_renaming_and_classification (1)` terminated with an error exit status (1)

Command exit status:
  1

Command output:
  (empty)

Command error:
  Use of uninitialized value $out_gene in concatenation (.) or string at /path/to/LncPipe/bin/rename_lncRNA_2.pl line 274, <FH> line 3471258.
likelet commented 5 years ago

Does the analysis on the test dataset produce any error message? For now I have no idea what causes your errors, but I think the GTF file might be malformed in this case. Maybe a quick solution would be for me to give you remote assistance via AnyDesk?

likelet commented 5 years ago

@venkan I have fixed the script; please try again.

venkan commented 5 years ago

@likelet Thank you. I will give it a try and let you know.

venkan commented 5 years ago

@likelet I see an error from the script again:

ERROR ~ Error executing process > 'Summary_renaming_and_classification (1)'

Caused by:
  Process `Summary_renaming_and_classification (1)` terminated with an error exit status (1)

Command exit status:
  1

Command output:
  (empty)

Command error:
  Use of uninitialized value $field[2] in string ne at /documents/lncpipe/new_LncPipe/bin/rename_lncRNA_2.pl line 48, <FH> line 2883588.
likelet commented 5 years ago

It seems that the input file is not tab-separated.
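A quick check for this, assuming standard 9-column GTF input, is to list every line that is not a "#" comment and does not split into exactly nine tab-separated fields. Such a line leaves later fields such as $field[2] undefined when a script splits it on tabs. The name bad_gtf_lines is a hypothetical helper; run it on each GTF the failing script reads.

```shell
# bad_gtf_lines FILE : print "line_number: text" for every non-comment line
# that does not have exactly 9 tab-separated GTF columns. Hypothetical helper.
bad_gtf_lines() {
    awk -F'\t' '!/^#/ && NF != 9 { print FNR ": " $0 }' "$1"
}
```

Usage: bad_gtf_lines lncipedia_mod.gtf | head. No output means every non-comment line has nine tab-separated columns.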

venkan commented 5 years ago

But which input file? The genome reference? The GENCODE annotation? The LNCipedia gene annotation file? Which one?

likelet commented 5 years ago

@kodayu Which file?

venkan commented 5 years ago

@likelet @kodayu Any help, please? Is the error caused by an input file? If so, which one?

kodayu commented 5 years ago

@venkan I am sorry for not logging in to my account for a long time. I think the problem is that lncipedia_mod.gtf is malformed. This file is the second input of the script rename_lncRNA_2.pl. Please send me the top 20 and bottom 20 lines of this file, and I will look for the cause.

venkan commented 5 years ago

@kodayu Here are the top 20 lines of the script rename_lncRNA_2.pl.

head -20 rename_lncRNA_2.pl

#!/usr/bin/perl -w
use strict;
#die ("usage: <Genecode gtf file> <lncipedia gtf file>") unless @ARGV > 2;
#print "#Query file ".$ARGV[0]." with file_number ".$ARGV[1]."\n";

my %know_lnc;
open FH,"known.lncRNA.bed" or die;
while(<FH>){
    chomp;
    my @field=split "\t";
    if ($field[7] eq "exon"){
        $know_lnc{$field[0].'\t'.$field[1].'\t'.$field[5]} = $field[3];
        $know_lnc{$field[0].'\t'.$field[2].'\t'.$field[5]} = $field[3];
    }
}

my %genecode;my %lncpedia;
if (@ARGV == 2){
open FH,"$ARGV[0]" or die;

And here are the bottom 20 lines:

}
my %all_data;
foreach my $mstr(sort(keys %genecode)){
        $all_data{$mstr} = 1;
}
foreach my $mstr(sort(keys %lncpedia)){
        $all_data{$mstr} = 1;
}
open OUT3,">lncRNA.mapping.file" or die;
foreach my $mstr(sort(keys %all_data)){
        if(defined($MSTRG2genename{$mstr}) && defined($lncpedia{$mstr}) && defined($genecode{$mstr})){
                print OUT3 $mstr."\t".$MSTRG2genename{$mstr}."\t".$genecode{$mstr}."\t".$lncpedia{$mstr}."\n";
        }elsif (defined($MSTRG2genename{$mstr}) && defined($genecode{$mstr})){
                print OUT3 $mstr."\t".$MSTRG2genename{$mstr}."\t".$genecode{$mstr}."\t\n"
        }elsif (defined($MSTRG2genename{$mstr}) && defined($lncpedia{$mstr})){
                print OUT3 $mstr."\t".$MSTRG2genename{$mstr}."\t\t".$lncpedia{$mstr}."\n"
        }else{
                next;
        }
}
kodayu commented 5 years ago

@venkan Sorry for the confusion; I meant lncipedia_mod.gtf, the input of rename_lncRNA_2.pl.

kodayu commented 5 years ago

@venkan The command is perl !{baseDir}/bin/rename_lncRNA_2.pl gencode_annotation_gtf_mod.gtf lncipedia_mod.gtf. I need the top and bottom lines of the last file.

venkan commented 5 years ago

@kodayu Here are the top 20 lines of lncipedia_mod.gtf:

##gtf
track name='LNCipedia' description='LNCipedia 5.0 - www.lncipedia.org' color=73,157,74 db=hg38 url='http://lncipedia.org/db/transcript/$$'
chr1    lncipedia.org   exon    83801516    83803251    .   -   .   gene_id "LINC01725" ; transcript_id "LINC01725:19" ; gene_alias_1 "ENSG00000233008" ; gene_alias_2 "RP11-475O6.1" ; gene_alias_3 "ENSG00000233008.1" ; gene_alias_4 "OTTHUMG00000009930.1" ; gene_alias_5 "ENSG00000233008.5" ; gene_alias_6 "LINC01725" ; gene_alias_7 "LOC101927560" ; transcript_alias_1 "ENST00000457273" ; transcript_alias_2 "ENST00000457273.1" ; transcript_alias_3 "RP11-475O6.1-005" ; transcript_alias_4 "OTTHUMT00000027496.1" ; transcript_alias_5 "NONHSAT004171" ; transcript_alias_6 "NR_119374" ; transcript_alias_7 "ENST00000457273.5" ; transcript_alias_8 "NR_119374.1" ;
chr1    lncipedia.org   exon    83849907    83850022    .   -   .   gene_id "LINC01725" ; transcript_id "LINC01725:19" ; gene_alias_1 "ENSG00000233008" ; gene_alias_2 "RP11-475O6.1" ; gene_alias_3 "ENSG00000233008.1" ; gene_alias_4 "OTTHUMG00000009930.1" ; gene_alias_5 "ENSG00000233008.5" ; gene_alias_6 "LINC01725" ; gene_alias_7 "LOC101927560" ; transcript_alias_1 "ENST00000457273" ; transcript_alias_2 "ENST00000457273.1" ; transcript_alias_3 "RP11-475O6.1-005" ; transcript_alias_4 "OTTHUMT00000027496.1" ; transcript_alias_5 "NONHSAT004171" ; transcript_alias_6 "NR_119374" ; transcript_alias_7 "ENST00000457273.5" ; transcript_alias_8 "NR_119374.1" ;
chr1    lncipedia.org   exon    83860408    83860546    .   -   .   gene_id "LINC01725" ; transcript_id "LINC01725:19" ; gene_alias_1 "ENSG00000233008" ; gene_alias_2 "RP11-475O6.1" ; gene_alias_3 "ENSG00000233008.1" ; gene_alias_4 "OTTHUMG00000009930.1" ; gene_alias_5 "ENSG00000233008.5" ; gene_alias_6 "LINC01725" ; gene_alias_7 "LOC101927560" ; transcript_alias_1 "ENST00000457273" ; transcript_alias_2 "ENST00000457273.1" ; transcript_alias_3 "RP11-475O6.1-005" ; transcript_alias_4 "OTTHUMT00000027496.1" ; transcript_alias_5 "NONHSAT004171" ; transcript_alias_6 "NR_119374" ; transcript_alias_7 "ENST00000457273.5" ; transcript_alias_8 "NR_119374.1" ;
chr16   lncipedia.org   exon    74192392    74192726    .   -   .   gene_id "lnc-CLEC18B-3" ; transcript_id "lnc-CLEC18B-3:5" ; gene_alias_1 "ENSG00000249447" ; gene_alias_2 "XLOC_012007" ; gene_alias_3 "linc-ZFHX3-2" ; gene_alias_4 "ENSG00000261404.1" ; gene_alias_5 "AC009120.4" ; gene_alias_6 "OTTHUMG00000176255.2" ; gene_alias_7 "ENSG00000261404.5" ; gene_alias_8 "ENSG00000261404.6" ; gene_alias_9 "AC138627.1" ; gene_alias_10 "LOC101928035" ; transcript_alias_1 "ENST00000510251" ; transcript_alias_2 "TCONS_00024274" ; transcript_alias_3 "ENST00000568137.1" ; transcript_alias_4 "AC009120.4-001" ; transcript_alias_5 "OTTHUMT00000431686.1" ; transcript_alias_6 "NONHSAT143655" ; transcript_alias_7 "NR_104657" ; transcript_alias_8 "NR_104657.1" ;
chr16   lncipedia.org   exon    74205905    74206165    .   -   .   gene_id "lnc-CLEC18B-3" ; transcript_id "lnc-CLEC18B-3:5" ; gene_alias_1 "ENSG00000249447" ; gene_alias_2 "XLOC_012007" ; gene_alias_3 "linc-ZFHX3-2" ; gene_alias_4 "ENSG00000261404.1" ; gene_alias_5 "AC009120.4" ; gene_alias_6 "OTTHUMG00000176255.2" ; gene_alias_7 "ENSG00000261404.5" ; gene_alias_8 "ENSG00000261404.6" ; gene_alias_9 "AC138627.1" ; gene_alias_10 "LOC101928035" ; transcript_alias_1 "ENST00000510251" ; transcript_alias_2 "TCONS_00024274" ; transcript_alias_3 "ENST00000568137.1" ; transcript_alias_4 "AC009120.4-001" ; transcript_alias_5 "OTTHUMT00000431686.1" ; transcript_alias_6 "NONHSAT143655" ; transcript_alias_7 "NR_104657" ; transcript_alias_8 "NR_104657.1" ;
chr16   lncipedia.org   exon    74210306    74210505    .   -   .   gene_id "lnc-CLEC18B-3" ; transcript_id "lnc-CLEC18B-3:5" ; gene_alias_1 "ENSG00000249447" ; gene_alias_2 "XLOC_012007" ; gene_alias_3 "linc-ZFHX3-2" ; gene_alias_4 "ENSG00000261404.1" ; gene_alias_5 "AC009120.4" ; gene_alias_6 "OTTHUMG00000176255.2" ; gene_alias_7 "ENSG00000261404.5" ; gene_alias_8 "ENSG00000261404.6" ; gene_alias_9 "AC138627.1" ; gene_alias_10 "LOC101928035" ; transcript_alias_1 "ENST00000510251" ; transcript_alias_2 "TCONS_00024274" ; transcript_alias_3 "ENST00000568137.1" ; transcript_alias_4 "AC009120.4-001" ; transcript_alias_5 "OTTHUMT00000431686.1" ; transcript_alias_6 "NONHSAT143655" ; transcript_alias_7 "NR_104657" ; transcript_alias_8 "NR_104657.1" ;
chr16   lncipedia.org   exon    74215352    74215521    .   -   .   gene_id "lnc-CLEC18B-3" ; transcript_id "lnc-CLEC18B-3:5" ; gene_alias_1 "ENSG00000249447" ; gene_alias_2 "XLOC_012007" ; gene_alias_3 "linc-ZFHX3-2" ; gene_alias_4 "ENSG00000261404.1" ; gene_alias_5 "AC009120.4" ; gene_alias_6 "OTTHUMG00000176255.2" ; gene_alias_7 "ENSG00000261404.5" ; gene_alias_8 "ENSG00000261404.6" ; gene_alias_9 "AC138627.1" ; gene_alias_10 "LOC101928035" ; transcript_alias_1 "ENST00000510251" ; transcript_alias_2 "TCONS_00024274" ; transcript_alias_3 "ENST00000568137.1" ; transcript_alias_4 "AC009120.4-001" ; transcript_alias_5 "OTTHUMT00000431686.1" ; transcript_alias_6 "NONHSAT143655" ; transcript_alias_7 "NR_104657" ; transcript_alias_8 "NR_104657.1" ;
chrX    lncipedia.org   exon    130477069   130477249   .   -   .   gene_id "lnc-GPR119-1" ; transcript_id "lnc-GPR119-1:1" ; gene_alias_1 "ENSG00000229702" ; gene_alias_2 "RP1-274L7.1" ; gene_alias_3 "ENSG00000229702.1" ; gene_alias_4 "OTTHUMG00000022398.1" ; transcript_alias_1 "ENST00000458525" ; transcript_alias_2 "ENST00000458525.1" ; transcript_alias_3 "RP1-274L7.1-001" ; transcript_alias_4 "OTTHUMT00000058271.1" ; transcript_alias_5 "NONHSAT138520" ;
chrX    lncipedia.org   exon    130485571   130485631   .   -   .   gene_id "lnc-GPR119-1" ; transcript_id "lnc-GPR119-1:1" ; gene_alias_1 "ENSG00000229702" ; gene_alias_2 "RP1-274L7.1" ; gene_alias_3 "ENSG00000229702.1" ; gene_alias_4 "OTTHUMG00000022398.1" ; transcript_alias_1 "ENST00000458525" ; transcript_alias_2 "ENST00000458525.1" ; transcript_alias_3 "RP1-274L7.1-001" ; transcript_alias_4 "OTTHUMT00000058271.1" ; transcript_alias_5 "NONHSAT138520" ;
chrX    lncipedia.org   exon    130487466   130487557   .   -   .   gene_id "lnc-GPR119-1" ; transcript_id "lnc-GPR119-1:1" ; gene_alias_1 "ENSG00000229702" ; gene_alias_2 "RP1-274L7.1" ; gene_alias_3 "ENSG00000229702.1" ; gene_alias_4 "OTTHUMG00000022398.1" ; transcript_alias_1 "ENST00000458525" ; transcript_alias_2 "ENST00000458525.1" ; transcript_alias_3 "RP1-274L7.1-001" ; transcript_alias_4 "OTTHUMT00000058271.1" ; transcript_alias_5 "NONHSAT138520" ;
chrX    lncipedia.org   exon    130491332   130491454   .   -   .   gene_id "lnc-GPR119-1" ; transcript_id "lnc-GPR119-1:1" ; gene_alias_1 "ENSG00000229702" ; gene_alias_2 "RP1-274L7.1" ; gene_alias_3 "ENSG00000229702.1" ; gene_alias_4 "OTTHUMG00000022398.1" ; transcript_alias_1 "ENST00000458525" ; transcript_alias_2 "ENST00000458525.1" ; transcript_alias_3 "RP1-274L7.1-001" ; transcript_alias_4 "OTTHUMT00000058271.1" ; transcript_alias_5 "NONHSAT138520" ;
chrX    lncipedia.org   exon    130493735   130493830   .   -   .   gene_id "lnc-GPR119-1" ; transcript_id "lnc-GPR119-1:1" ; gene_alias_1 "ENSG00000229702" ; gene_alias_2 "RP1-274L7.1" ; gene_alias_3 "ENSG00000229702.1" ; gene_alias_4 "OTTHUMG00000022398.1" ; transcript_alias_1 "ENST00000458525" ; transcript_alias_2 "ENST00000458525.1" ; transcript_alias_3 "RP1-274L7.1-001" ; transcript_alias_4 "OTTHUMT00000058271.1" ; transcript_alias_5 "NONHSAT138520" ;
chrX    lncipedia.org   exon    130495129   130495206   .   -   .   gene_id "lnc-GPR119-1" ; transcript_id "lnc-GPR119-1:1" ; gene_alias_1 "ENSG00000229702" ; gene_alias_2 "RP1-274L7.1" ; gene_alias_3 "ENSG00000229702.1" ; gene_alias_4 "OTTHUMG00000022398.1" ; transcript_alias_1 "ENST00000458525" ; transcript_alias_2 "ENST00000458525.1" ; transcript_alias_3 "RP1-274L7.1-001" ; transcript_alias_4 "OTTHUMT00000058271.1" ; transcript_alias_5 "NONHSAT138520" ;
chrX    lncipedia.org   exon    130510415   130510841   .   -   .   gene_id "lnc-GPR119-1" ; transcript_id "lnc-GPR119-1:1" ; gene_alias_1 "ENSG00000229702" ; gene_alias_2 "RP1-274L7.1" ; gene_alias_3 "ENSG00000229702.1" ; gene_alias_4 "OTTHUMG00000022398.1" ; transcript_alias_1 "ENST00000458525" ; transcript_alias_2 "ENST00000458525.1" ; transcript_alias_3 "RP1-274L7.1-001" ; transcript_alias_4 "OTTHUMT00000058271.1" ; transcript_alias_5 "NONHSAT138520" ;
chrX    lncipedia.org   exon    130511289   130511388   .   -   .   gene_id "lnc-GPR119-1" ; transcript_id "lnc-GPR119-1:1" ; gene_alias_1 "ENSG00000229702" ; gene_alias_2 "RP1-274L7.1" ; gene_alias_3 "ENSG00000229702.1" ; gene_alias_4 "OTTHUMG00000022398.1" ; transcript_alias_1 "ENST00000458525" ; transcript_alias_2 "ENST00000458525.1" ; transcript_alias_3 "RP1-274L7.1-001" ; transcript_alias_4 "OTTHUMT00000058271.1" ; transcript_alias_5 "NONHSAT138520" ;
chrX    lncipedia.org   exon    130512366   130512479   .   -   .   gene_id "lnc-GPR119-1" ; transcript_id "lnc-GPR119-1:1" ; gene_alias_1 "ENSG00000229702" ; gene_alias_2 "RP1-274L7.1" ; gene_alias_3 "ENSG00000229702.1" ; gene_alias_4 "OTTHUMG00000022398.1" ; transcript_alias_1 "ENST00000458525" ; transcript_alias_2 "ENST00000458525.1" ; transcript_alias_3 "RP1-274L7.1-001" ; transcript_alias_4 "OTTHUMT00000058271.1" ; transcript_alias_5 "NONHSAT138520" ;
chrX    lncipedia.org   exon    130513726   130513802   .   -   .   gene_id "lnc-GPR119-1" ; transcript_id "lnc-GPR119-1:1" ; gene_alias_1 "ENSG00000229702" ; gene_alias_2 "RP1-274L7.1" ; gene_alias_3 "ENSG00000229702.1" ; gene_alias_4 "OTTHUMG00000022398.1" ; transcript_alias_1 "ENST00000458525" ; transcript_alias_2 "ENST00000458525.1" ; transcript_alias_3 "RP1-274L7.1-001" ; transcript_alias_4 "OTTHUMT00000058271.1" ; transcript_alias_5 "NONHSAT138520" ;
chrX    lncipedia.org   exon    130515760   130515849   .   -   .   gene_id "lnc-GPR119-1" ; transcript_id "lnc-GPR119-1:1" ; gene_alias_1 "ENSG00000229702" ; gene_alias_2 "RP1-274L7.1" ; gene_alias_3 "ENSG00000229702.1" ; gene_alias_4 "OTTHUMG00000022398.1" ; transcript_alias_1 "ENST00000458525" ; transcript_alias_2 "ENST00000458525.1" ; transcript_alias_3 "RP1-274L7.1-001" ; transcript_alias_4 "OTTHUMT00000058271.1" ; transcript_alias_5 "NONHSAT138520" ;

And here are the bottom 20 lines:

chr19   lncipedia.org   exon    37566031        37566089        .   +   .   gene_id "ZNF571-AS1" ; transcript_id "ZNF571-AS1:20" ; gene_alias_1 "ZNF571-AS1" ; transcript_alias_1 "NR_038247.1" ;
chr19   lncipedia.org   exon    37569850        37569996        .   +   .   gene_id "ZNF571-AS1" ; transcript_id "ZNF571-AS1:20" ; gene_alias_1 "ZNF571-AS1" ; transcript_alias_1 "NR_038247.1" ;
chr19   lncipedia.org   exon    37583483        37583834        .   +   .   gene_id "ZNF571-AS1" ; transcript_id "ZNF571-AS1:20" ; gene_alias_1 "ZNF571-AS1" ; transcript_alias_1 "NR_038247.1" ;
chr19   lncipedia.org   exon    37585339        37587348        .   +   .   gene_id "ZNF571-AS1" ; transcript_id "ZNF571-AS1:20" ; gene_alias_1 "ZNF571-AS1" ; transcript_alias_1 "NR_038247.1" ;
chr9    lncipedia.org   exon    21994791        21995161        .   +   .   gene_id "CDKN2B-AS1" ; transcript_id "CDKN2B-AS1:74" ; gene_alias_1 "CDKN2B-AS1" ; transcript_alias_1 "NR_120536.1" ;
chr9    lncipedia.org   exon    22046751        22046900        .   +   .   gene_id "CDKN2B-AS1" ; transcript_id "CDKN2B-AS1:74" ; gene_alias_1 "CDKN2B-AS1" ; transcript_alias_1 "NR_120536.1" ;
chr9    lncipedia.org   exon    22049106        22049228        .   +   .   gene_id "CDKN2B-AS1" ; transcript_id "CDKN2B-AS1:74" ; gene_alias_1 "CDKN2B-AS1" ; transcript_alias_1 "NR_120536.1" ;
chr9    lncipedia.org   exon    22120504        22121097        .   +   .   gene_id "CDKN2B-AS1" ; transcript_id "CDKN2B-AS1:74" ; gene_alias_1 "CDKN2B-AS1" ; transcript_alias_1 "NR_120536.1" ;
chr7    lncipedia.org   exon    134324  134688  .   +   .   gene_id "lnc-FAM20C-1" ; transcript_id "lnc-FAM20C-1:6" ; gene_alias_1 "LOC105375115" ; transcript_alias_1 "NR_134324.1" ;
chr7    lncipedia.org   exon    135068  135176  .   +   .   gene_id "lnc-FAM20C-1" ; transcript_id "lnc-FAM20C-1:6" ; gene_alias_1 "LOC105375115" ; transcript_alias_1 "NR_134324.1" ;
chr7    lncipedia.org   exon    135324  135417  .   +   .   gene_id "lnc-FAM20C-1" ; transcript_id "lnc-FAM20C-1:6" ; gene_alias_1 "LOC105375115" ; transcript_alias_1 "NR_134324.1" ;
chr7    lncipedia.org   exon    167241  167605  .   +   .   gene_id "lnc-FAM20C-1" ; transcript_id "lnc-FAM20C-1:6" ; gene_alias_1 "LOC105375115" ; transcript_alias_1 "NR_134324.1" ;
chr7    lncipedia.org   exon    167985  168093  .   +   .   gene_id "lnc-FAM20C-1" ; transcript_id "lnc-FAM20C-1:6" ; gene_alias_1 "LOC105375115" ; transcript_alias_1 "NR_134324.1" ;
chr7    lncipedia.org   exon    168241  168334  .   +   .   gene_id "lnc-FAM20C-1" ; transcript_id "lnc-FAM20C-1:6" ; gene_alias_1 "LOC105375115" ; transcript_alias_1 "NR_134324.1" ;
chr7    lncipedia.org   exon    174920  175284  .   +   .   gene_id "lnc-FAM20C-1" ; transcript_id "lnc-FAM20C-1:6" ; gene_alias_1 "LOC105375115" ; transcript_alias_1 "NR_134324.1" ;
chr7    lncipedia.org   exon    175664  175772  .   +   .   gene_id "lnc-FAM20C-1" ; transcript_id "lnc-FAM20C-1:6" ; gene_alias_1 "LOC105375115" ; transcript_alias_1 "NR_134324.1" ;
chr7    lncipedia.org   exon    175920  176013  .   +   .   gene_id "lnc-FAM20C-1" ; transcript_id "lnc-FAM20C-1:6" ; gene_alias_1 "LOC105375115" ; transcript_alias_1 "NR_134324.1" ;
chr3    lncipedia.org   exon    150096443   150096597   .   +   .   gene_id "lnc-RNF13-6" ; transcript_id "lnc-RNF13-6:6" ; gene_alias_1 "LOC105374313" ; transcript_alias_1 "NR_136187.1" ;
chr3    lncipedia.org   exon    150124938   150125030   .   +   .   gene_id "lnc-RNF13-6" ; transcript_id "lnc-RNF13-6:6" ; gene_alias_1 "LOC105374313" ; transcript_alias_1 "NR_136187.1" ;
chr3    lncipedia.org   exon    150150511   150151001   .   +   .   gene_id "lnc-RNF13-6" ; transcript_id "lnc-RNF13-6:6" ; gene_alias_1 "LOC105374313" ; transcript_alias_1 "NR_136187.1" ;
kodayu commented 5 years ago

@venkan The second line is not valid GTF. If it is a comment, it should start with '#'. Otherwise the script can't tell whether it is a comment line or a normal record line, so errors will occur.

kodayu commented 5 years ago

@venkan In short, it is not a well-formed GTF file. You can remove the second line or add a '#' at the start of line 2.
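The second fix can be scripted. The sketch below extends it to any line that begins with "track ", which is an assumption; kodayu only mentions line 2. It writes a corrected copy rather than editing the file in place, because sed -i behaves differently on GNU and BSD systems. The name comment_track_lines is a hypothetical helper.

```shell
# comment_track_lines IN OUT : copy a GTF file, prefixing '#' to any UCSC-style
# "track ..." header line so that GTF parsers treat it as a comment.
# Hypothetical helper; the original file is left untouched.
comment_track_lines() {
    sed 's/^track /#track /' "$1" > "$2"
}
```

Usage: comment_track_lines lncipedia_mod.gtf lncipedia_mod.fixed.gtf. Then use the fixed copy as the input.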

kodayu commented 5 years ago

@venkan Also, if convenient, you can contact me by email at yukai@sysucc.org.cn. I don't log in to GitHub frequently.

venkan commented 5 years ago

@kodayu I added # at the start of the second line of the GTF file. I am running the job again and will email you if there is another error. Thank you.

kodayu commented 5 years ago

@venkan you are welcome!

venkan commented 5 years ago

@kodayu I have a problem again. I am sending the file to your email.

likelet commented 5 years ago

I found the error and have already fixed it: a grep command was rewritten in the wrong format during the previous fix. @kodayu @venkan