uchicago-bio / 2016-Autumn-Forum


HW #2 Prob. 2-- clarification about output #9

Closed anuvedverma closed 8 years ago

anuvedverma commented 8 years ago

For problem 2, we're asked to read, validate, and output the FASTA sequences in the following format:

Does 'description' refer to the whole line including the identifier, or just the description part after the identifier? For example:

Option 1 (whole line including ID): gi|117320674|gb|AC188433.3| Pan troglodytes BAC clone CH251-677J9 from chromosome 7, complete sequence

versus...

Option 2: Pan troglodytes BAC clone CH251-677J9 from chromosome 7, complete sequence

Thanks for the clarification!

tabinks commented 8 years ago

Use the entire header line, including the identifier, for the description (your Option 1).

In practice this is a little overkill, since the only important unique information is the "gi" number (which we will talk about on Monday). The gi number provides a unique ID for a given sequence.
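For anyone who wants to see the difference concretely, here is a minimal sketch in Python. It assumes the header uses the pipe-delimited layout from the example in the question (`gi|<number>|<db>|<accession>|`); the function name `parse_header` is made up for illustration and is not part of the assignment. It keeps the whole header line as the description and pulls out the gi number separately when one is present.

```python
def parse_header(header_line):
    """Return (description, gi_number) for one FASTA header line.

    Hypothetical helper: 'description' is the entire header line minus the
    leading '>', and 'gi_number' is None when the header does not start
    with a 'gi|' field.
    """
    text = header_line.strip()
    # Drop the '>' marker that starts every FASTA header line.
    description = text[1:] if text.startswith(">") else text
    # Assumed layout: gi|<number>|<db>|<accession>| free text
    fields = description.split("|")
    gi_number = fields[1] if len(fields) > 1 and fields[0] == "gi" else None
    return description, gi_number


header = (">gi|117320674|gb|AC188433.3| Pan troglodytes BAC clone "
          "CH251-677J9 from chromosome 7, complete sequence")
description, gi_number = parse_header(header)
print(description)  # the whole header line, identifier included
print(gi_number)    # just the gi number, as a string
```

Headers that do not begin with `gi|` simply come back with `None` for the gi number, so the description output is unaffected either way.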