Closed yuxinghai closed 5 years ago
Hi yuxinghai, many thanks for your interest in PAQR.
You can have a look at this Ensembl page, which lists all covered species. If your organism of interest is included, there is a fair chance you will also find an annotation file for it (a GTF file).
Hopefully, this helps, cheers,
Ralf
Sorry, you may have misunderstood what I meant. I want to create a custom poly(A) annotation file like PAQR/data/annotation/clusters.hg38.canonical_chr.tandem.noOverlap_strand_specific.bed. I have a poly(A) site file with the columns chrom, start, end, tag_number, strand, but I don't know what each column in clusters.hg38.canonical_chr.tandem.noOverlap_strand_specific.bed means. How can I build a file like this?
Hi, now I understand better what you are looking for. Here is a description of what the individual columns mean:
1.-6. columns: normal BED guidelines:
1. chromosome
2. start
3. end
4. ID
5. score (number of protocols that support a site; can be anything if you don't plan to filter your sites based on this score)
6. strand information
7.-10. columns: individual entries:
7. and 8. Since we want to look at alternative polyadenylation, we only consider "tandem poly(A) sites", meaning poly(A) sites that are located on the same annotated exon. The 7th column contains the consecutive number of the site within its set of tandem poly(A) sites; the 8th column is the total number of sites in this set.
9. An identifier for the exon, on which the tandem poly(A) sites are located. The id is composed of:
If you have your set of poly(A) sites, you have to intersect this set with an annotation to infer the above information.
Best Ralf
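The intersection step described above can be sketched in Python. This is only an illustration under stated assumptions, not PAQR's own procedure: the function name `annotate_tandem_sites`, the tuple layouts, and the plain `exon_id` string are hypothetical, and the numbering order (5' to 3' with respect to the strand) is an assumption. Because the thread does not spell out how the exon identifier is composed or what the 10th column holds, the sketch only produces columns 1-9.

```python
from collections import defaultdict


def annotate_tandem_sites(sites, exons):
    """Attach tandem-site information (columns 7-9) to poly(A) sites.

    sites: list of (chrom, start, end, site_id, score, strand)
    exons: list of (chrom, start, end, exon_id, strand)
    Both layouts are assumptions made for this sketch.
    """
    by_exon = defaultdict(list)
    for site in sites:
        chrom, start, end, _site_id, _score, strand = site
        for e_chrom, e_start, e_end, exon_id, e_strand in exons:
            same_strand = chrom == e_chrom and strand == e_strand
            if same_strand and e_start <= start and end <= e_end:
                by_exon[exon_id].append(site)
                break  # assign each site to the first exon that contains it

    rows = []
    for exon_id, members in by_exon.items():
        if len(members) < 2:
            continue  # a single site on an exon is not a tandem set
        # Assumption: number sites in transcription direction (5' to 3').
        on_minus = members[0][5] == "-"
        members.sort(key=lambda s: s[1], reverse=on_minus)
        total = len(members)
        for rank, site in enumerate(members, start=1):
            # columns 1-6 as given, then 7: rank, 8: set size, 9: exon id
            rows.append(site + (rank, total, exon_id))
    return rows
```

A poly(A) site file with only chrom, start, end, tag_number, strand would first need an ID column of your own choosing (column 4) before it fits the six-field layout used here. For genome-scale inputs, an interval tool such as bedtools is likely a better fit than this nested loop.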
Thank you for the very detailed explanation. Now I can create this file.
Hello, I want to use PAQR to find APA in other species, and I have some poly(A) site information. How can I get an annotation file like the ones for mm10 and hg38?