shizhuoxing / BGI-Full-Length-RNA-Analysis-Pipeline

Full-Length RNA Analysis pipeline developed by the BGI RD group.
https://github.com/shizhuoxing/BGI-Full-Length-RNA-Analysis-Pipeline

output format for classify_by_primer? #3

Closed: lhui2010 closed this issue 3 years ago

lhui2010 commented 3 years ago

Hi Zhuoxing,

Question

After classify_by_primer finished, three files seemed to be generated:

  1. isoseq_flnc.polyAtail.xls
  2. isoseq_nfl.fasta
  3. isoseq_flnc.fasta

The two fasta files were easy to guess from their names, but what is the format of isoseq_flnc.polyAtail.xls? Is there a header or description of each column for it?

Also, could you give a brief description of how classify_by_primer works?

Data

My Iso-Seq data was generated from a BGI-patented library that places multiple transcripts in one ZMW.

Many thanks, Hui

shizhuoxing commented 3 years ago

Hi Hui,

Sorry, this repo is rather rough and not documented in enough detail. The "isoseq_flnc.polyAtail.xls" file does have a header, but classify_by_primer does not print it. The columns are: "SeqName\tUMI\tSeqLength\tPolyALength\tPolyAtail".

I plan to publish the algorithm for Iso-Seq data processing, isoform and polyA tail detection, and quantification, which is why there is not much detailed description in this repo yet. In the meantime, you can refer to the Methods section of our single-cell Iso-Seq preprint, which also uses the BGI-patented RNA linking library preparation method (https://www.biorxiv.org/content/10.1101/2020.07.27.222349v1).
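Because classify_by_primer does not write the header, downstream scripts have to supply the column names themselves. A minimal Python sketch is below. It assumes the file is plain tab-separated text despite its .xls extension, that each line holds exactly the five columns listed above, and that SeqLength and PolyALength are integers. The function names `read_polya_table` and `add_header` are hypothetical and are not part of the pipeline.

```python
import csv
import io
from typing import Dict, List, TextIO

# Column names as given in this thread; classify_by_primer does not write them.
POLYA_COLUMNS = ["SeqName", "UMI", "SeqLength", "PolyALength", "PolyAtail"]


def read_polya_table(handle: TextIO) -> List[Dict[str, object]]:
    """Parse a headerless, tab-separated polyA table (hypothetical helper).

    Assumes one record per line with the five columns above; the two
    length columns are converted to int, which is an assumption about
    their content.
    """
    records = []
    # QUOTE_NONE: treat the file as raw TSV, so quote characters are kept as-is.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:  # skip blank lines
            continue
        if len(row) != len(POLYA_COLUMNS):
            raise ValueError(
                f"expected {len(POLYA_COLUMNS)} columns, got {len(row)}: {row}"
            )
        rec: Dict[str, object] = dict(zip(POLYA_COLUMNS, row))
        rec["SeqLength"] = int(row[2])
        rec["PolyALength"] = int(row[3])
        records.append(rec)
    return records


def add_header(src: TextIO, dst: TextIO) -> None:
    """Copy the table from src to dst with the header line prepended."""
    dst.write("\t".join(POLYA_COLUMNS) + "\n")
    for line in src:
        dst.write(line)


if __name__ == "__main__":
    # Hypothetical row for illustration only; not real classify_by_primer output.
    demo = io.StringIO("read_1\tACGTACGT\t1500\t25\t" + "A" * 25 + "\n")
    for rec in read_polya_table(demo):
        print(rec["SeqName"], rec["PolyALength"])
```

To use it on the real output, open the file with `open("isoseq_flnc.polyAtail.xls", newline="")` and pass the handle to either function. The strict column-count check is deliberate: if the actual layout differs from the assumed one, it raises an error instead of silently misaligning fields.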

lhui2010 commented 3 years ago

Hi Zhuoxing,

Thanks for the header and the paper link! It's quite clear now. Looking forward to your paper! 👍

Best regards, Hui