tderrien / FEELnc

FEELnc : FlExible Extraction of LncRNA
GNU General Public License v3.0

lncRNA gene ID Conversion #39

Closed Tichaboni closed 3 years ago

Tichaboni commented 4 years ago

@tderrien @flegeai @vwucher, I have successfully used your program to identify lncRNAs from two sets of experiments. For downstream processing, I need the identities of these lncRNAs so that I can BLAST them against known lncRNAs and see whether any of them are novel. The classes text file and the candidatelncRNA.gtf.lncRNA.gtf file contain TCONS and XLOC IDs. Do you have a way of identifying the corresponding transcript or gene IDs?

isBest  lncRNA_gene  lncRNA_transcript  partnerRNA_gene  partnerRNA_transcript  direction  type        distance  subtype      location
1       XLOC_004434  TCONS_00004781     ABRACL           NM_001302176           sense      genic       0         containing   exonic
1       XLOC_000968  TCONS_00001039     OAZ2             NM_001142862           sense      genic       0         containing   exonic
0       XLOC_000968  TCONS_00001039     ZNF609           NM_001293220           antisense  intergenic  1457      convergent   downstream
1       XLOC_002571  TCONS_00002801     CMTM7            NM_001007894           sense      genic       0         containing   exonic
0       XLOC_002571  TCONS_00002801     CMTM8            NM_001199703           sense      intergenic  4154      same_strand  downstream

Any help provided will be highly appreciated. Thanks

Tichaboni commented 4 years ago

@tderrien @flegeai @vwucher

Hello, has anyone found a way to deal with the above yet?

tderrien commented 4 years ago

Hello @Tichaboni,

Thank you for your message, and sorry for the late response.

If the known lncRNAs are mapped onto a reference genome, one possibility would be to use bedtools intersect to compare the genomic coordinates of the FEELnc lncRNAs with those of the known lncRNAs (given a specific overlap threshold for the intersection).

If the known lncRNAs are not mapped (e.g. you don't have a reference genome), I'd go for traditional sequence alignment (BLAST or minimap2) of the novel versus known lncRNA sequences, as you suggested.

Hope this helps.
Thomas
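
For readers who want to see what the coordinate-based comparison amounts to, here is a minimal pure-Python sketch. It is not bedtools and not part of FEELnc. It collapses each transcript's exons into a single span and reports novel/known pairs on the same strand whose overlap covers at least a given fraction of both spans, which is roughly the idea behind running bedtools intersect with `-s -f 0.5 -r`. The function names, the 0.5 threshold, and the example coordinates and KNOWN_* identifiers are all hypothetical, chosen only for illustration. Comparing whole spans (introns included) is a simplification; an exon-level comparison would be stricter.

```python
from collections import defaultdict


def parse_gtf_transcripts(lines):
    """Collapse GTF exon records into one (chrom, start, end, strand) span per transcript_id."""
    spans = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        chrom, start, end, strand = fields[0], int(fields[3]), int(fields[4]), fields[6]
        # Parse the attribute column: key "value"; key "value"; ...
        attrs = {}
        for item in fields[8].split(";"):
            item = item.strip()
            if item:
                key, _, value = item.partition(" ")
                attrs[key] = value.strip('"')
        tid = attrs.get("transcript_id")
        if tid is None:
            continue
        if tid in spans:
            _, s, e, _ = spans[tid]
            spans[tid] = (chrom, min(s, start), max(e, end), strand)
        else:
            spans[tid] = (chrom, start, end, strand)
    return spans


def reciprocal_overlaps(novel, known, min_frac=0.5):
    """Return (novel_id, known_id, overlap_bp) for same-strand pairs whose
    overlap covers at least min_frac of BOTH transcript spans."""
    by_chrom_strand = defaultdict(list)
    for kid, (chrom, s, e, strand) in known.items():
        by_chrom_strand[(chrom, strand)].append((s, e, kid))
    hits = []
    for nid, (chrom, s, e, strand) in sorted(novel.items()):
        for ks, ke, kid in by_chrom_strand[(chrom, strand)]:
            overlap = min(e, ke) - max(s, ks) + 1  # GTF is 1-based, inclusive
            if overlap <= 0:
                continue
            if overlap >= min_frac * (e - s + 1) and overlap >= min_frac * (ke - ks + 1):
                hits.append((nid, kid, overlap))
    return hits


# Made-up records standing in for the FEELnc output GTF and a known-lncRNA GTF.
novel_gtf = [
    'chr1\tCufflinks\texon\t100\t200\t.\t+\t.\tgene_id "XLOC_000001"; transcript_id "TCONS_00000001";',
    'chr1\tCufflinks\texon\t300\t400\t.\t+\t.\tgene_id "XLOC_000001"; transcript_id "TCONS_00000001";',
    'chr2\tCufflinks\texon\t1000\t1500\t.\t-\t.\tgene_id "XLOC_000002"; transcript_id "TCONS_00000002";',
]
known_gtf = [
    'chr1\tref\texon\t150\t420\t.\t+\t.\tgene_id "KNOWN_G1"; transcript_id "KNOWN_T1";',
    'chr2\tref\texon\t1400\t3000\t.\t-\t.\tgene_id "KNOWN_G2"; transcript_id "KNOWN_T2";',
]

novel = parse_gtf_transcripts(novel_gtf)
known = parse_gtf_transcripts(known_gtf)
hits = reciprocal_overlaps(novel, known, min_frac=0.5)

# Transcripts without any qualifying known partner are candidate novel lncRNAs.
matched = {nid for nid, _, _ in hits}
candidate_novel = sorted(tid for tid in novel if tid not in matched)

for nid, kid, bp in hits:
    print(f"{nid} overlaps {kid} by {bp} bp")
print("candidate novel:", candidate_novel)
```

With real data, the two lists would be replaced by the lines of the FEELnc lncRNA GTF and of the known-lncRNA annotation. The TCONS IDs that end up paired with a known transcript can then be renamed accordingly, and the unpaired ones carried forward as putatively novel.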