Open hw449 opened 6 years ago
The last 4 coordinates correspond to the upstream, genomic and downstream regions. For example, (p1, p2, p3, p4) indicates that the genomic region starts at p2 and ends at p3. p1-p2 is the upstream region and p3-p4 is the downstream region.
Thanks. But in the example, the sequence is over 4000 bp long, while p3-p2 is only 144 bp. Another question: why does the program need p1 and p4, given that it only receives the sequence between p2 and p3?
We tried to find the CNSs in the 10 kbp upstream and downstream regions of genes. I will check why the genomic region in the example is only 144 bp. p1 and p4 are the outer boundaries of the upstream and downstream regions.
Sorry to bother you again. In the sample, the sequence is 4095 bp long. I just don't know how to get 4095 from the four numbers (8181013, 8181152, 8181296, and 8186236).
The total length of the sequence is 8186236-8181013=5223. I will take a look at the sample input file to see why it says 4095.
If you would like to discuss any other issues, my email is sbehera@cse.unl.edu
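To make the arithmetic above easy to check, here is a minimal Python sketch that splits the sample header into its fields and computes the length of each region. It assumes the header has exactly seven whitespace-separated fields (gene, chromosome, strand, p1, p2, p3, p4) and that lengths are plain differences of the coordinates, as in this thread. The names `parse_header` and `region_lengths` are hypothetical helpers for illustration and are not part of the program.

```python
from typing import Dict, NamedTuple


class Header(NamedTuple):
    gene: str
    chrom: str
    strand: str
    p1: int  # start of the upstream region
    p2: int  # start of the genomic region
    p3: int  # end of the genomic region
    p4: int  # end of the downstream region


def parse_header(line: str) -> Header:
    """Parse a header line such as 'gene chrom strand p1 p2 p3 p4'.

    A leading '>' is stripped if present (assumption: FASTA-style header).
    """
    fields = line.strip().lstrip(">").split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    gene, chrom, strand = fields[:3]
    p1, p2, p3, p4 = (int(x) for x in fields[3:])
    return Header(gene, chrom, strand, p1, p2, p3, p4)


def region_lengths(h: Header) -> Dict[str, int]:
    """Region lengths as plain coordinate differences (assumption)."""
    return {
        "upstream": h.p2 - h.p1,
        "genomic": h.p3 - h.p2,
        "downstream": h.p4 - h.p3,
        "total": h.p4 - h.p1,
    }


header = parse_header("Sobic.001G106200 Chr01 + 8181013 8181152 8181296 8186236")
lengths = region_lengths(header)
print(lengths)
```

For the sample header this gives 139 bp upstream, 144 bp genomic, 4940 bp downstream and 5223 bp in total, so none of the regions, alone or combined, accounts for the 4095 bp in the sample file.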
Thanks. In my case, I already have three promoter sequences from three species in a single fasta file, like this:
species1
ATCGAAA......(1000bp)
species2
GGATTTT....(1000bp)
species3
ATTATTAGG......(1000bp)
Can I just add something like this to the headers in the fasta file, if I want the program to analyze only the first 500 bp?
species1 chr + 1 500 1000 1000
species2 chr + 1 500 1000 1000
species3 chr + 1 500 1000 1000
I added dummy chromosome names because I don't care which chromosome the sequences come from.
In the sample input data, the first line is:
Sobic.001G106200 Chr01 + 8181013 8181152 8181296 8186236
I just couldn't understand the meaning of the last four coordinates. Could you please explain that in more detail? Thanks!