srbehera11 / stag-cns

MIT License

Could you please explain the last four fields of the first part in the sample input data in more detail? Thanks. #2

Open hw449 opened 6 years ago

hw449 commented 6 years ago

In the sample input data, the first line is:

Sobic.001G106200 Chr01 + 8181013 8181152 8181296 8186236

I just couldn't understand the meaning of the last four coordinates. Could you please explain them in more detail? Thanks!

srbehera11 commented 6 years ago

The last 4 coordinates correspond to the upstream, genomic and downstream regions. For example, (p1, p2, p3, p4) indicates that the genomic region starts at p2 and ends at p3. p1-p2 is the upstream region and p3-p4 is the downstream region.
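A minimal sketch of how such a header line could be split into the three regions. This is not code from stag-cns: the names `GeneHeader`, `parse_header` and `region_lengths` are hypothetical, and it assumes whitespace-separated fields, an optional leading `>`, and region lengths taken as plain differences between neighbouring coordinates.

```python
from typing import Dict, NamedTuple


class GeneHeader(NamedTuple):
    """One header line: gene id, chromosome, strand, then p1..p4 (hypothetical type)."""
    gene_id: str
    chrom: str
    strand: str
    p1: int  # start of the upstream region
    p2: int  # start of the genomic region
    p3: int  # end of the genomic region
    p4: int  # end of the downstream region


def parse_header(line: str) -> GeneHeader:
    # Assumption: fields are whitespace-separated; a leading '>' is tolerated.
    fields = line.strip().lstrip(">").split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    gene_id, chrom, strand = fields[:3]
    p1, p2, p3, p4 = (int(x) for x in fields[3:])
    if not p1 <= p2 <= p3 <= p4:
        raise ValueError("coordinates must satisfy p1 <= p2 <= p3 <= p4")
    return GeneHeader(gene_id, chrom, strand, p1, p2, p3, p4)


def region_lengths(h: GeneHeader) -> Dict[str, int]:
    # Assumption: lengths are simple differences (no +1 for inclusive ends).
    return {
        "upstream": h.p2 - h.p1,
        "genomic": h.p3 - h.p2,
        "downstream": h.p4 - h.p3,
    }


header = parse_header("Sobic.001G106200 Chr01 + 8181013 8181152 8181296 8186236")
print(region_lengths(header))
```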

hw449 commented 6 years ago

Thanks. But in the example, the sequence is over 4000 bp long, while p3-p2 is only 144 bp. Another question: why does the program need p1 and p4, given that it was only given the sequence between p2 and p3?

srbehera11 commented 6 years ago

We tried to find the CNSs in the 10 kbp upstream and downstream regions of genes. I will check why it is only 144 bp in the example. p1 and p4 are the boundaries of the upstream and downstream regions.

hw449 commented 6 years ago

Sorry to bother you again. In the sample, the sequence is 4095 bp long. I just don't know how to get 4095 from the four numbers (8181013, 8181152, 8181296, and 8186236).

srbehera11 commented 6 years ago

The total length of the sequence is 8186236-8181013=5223. I will take a look at the sample input file to see why it says 4095.
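A small sketch of that arithmetic, for checking a header against the sequence that follows it. The function names are hypothetical and not part of stag-cns, and it uses the same plain subtraction as above; whether the coordinates are inclusive (which would add 1) is an open assumption.

```python
from typing import Tuple

Coords = Tuple[int, int, int, int]  # (p1, p2, p3, p4)


def expected_length(coords: Coords) -> int:
    # Total span covered by upstream + genomic + downstream regions: p4 - p1.
    p1, _, _, p4 = coords
    return p4 - p1


def length_mismatch(coords: Coords, seq_len: int) -> int:
    # Positive: the sequence is longer than the header implies; negative: shorter.
    return seq_len - expected_length(coords)


sample = (8181013, 8181152, 8181296, 8186236)
print(expected_length(sample))        # 5223
print(length_mismatch(sample, 4095))  # -1128
```

A non-zero mismatch, as with the 4095 bp sample sequence, means the header and the sequence disagree.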

If you would like to discuss any other issues, my email is sbehera@cse.unl.edu

hw449 commented 6 years ago

Thanks. In my case, I already have three promoter sequences from three species in a single fasta file, like this:

species1 ATCGAAA......(1000bp)
species2 GGATTTT....(1000bp)
species3 ATTATTAGG......(1000bp)

Can I just add something like this to the fasta file if I want the program to analyze only the first 500 bp?

species1 chr + 1 500 1000 1000
species2 chr + 1 500 1000 1000
species3 chr + 1 500 1000 1000

Dummy chromosomes are added because I don't care which chromosome they come from.