mrmckain / Fast-Plast

Automated de novo assembly of whole chloroplast genomes.
MIT License

Question: orientation of the chloroplast regions #22

PfaffS closed this issue 6 years ago

PfaffS commented 6 years ago

Hi, I am currently using Fast-Plast for my master's thesis and it works pretty well. I just have one question. You say:

"The pipeline then identifies regions from the quadripartite structure of the chloroplast genome, assigns identity, and orders them according to standard convention."

According to other tools, the exact relative orientation of the SSC and the LSC cannot be determined, because paired short reads often do not completely span the IR region.

So, what is the standard convention for chloroplasts, and how do you obtain it?

mrmckain commented 6 years ago

Hi Simon,

LSC and SSC identity are assigned based on relative size of the SC regions.
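As a rough illustration of the size rule (a minimal Python sketch, not the actual Fast-Plast code; the function name `assign_single_copy_regions` and its inputs are hypothetical), the longer of the two single-copy sequences is labelled LSC and the shorter SSC:

```python
def assign_single_copy_regions(region_a: str, region_b: str) -> dict:
    """Label the longer single-copy sequence LSC and the shorter one SSC.

    Assumption: both arguments are plain nucleotide strings for the two
    single-copy regions that flank the inverted repeat.
    """
    # Sort by length, longest first; the longer region is taken to be the LSC.
    longer, shorter = sorted((region_a, region_b), key=len, reverse=True)
    return {"LSC": longer, "SSC": shorter}
```

For example, `assign_single_copy_regions(short_seq, long_seq)["LSC"]` returns `long_seq` regardless of argument order.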

Orientation is determined by looking at the relative orientation of the rpl and rps genes in the LSC, all genes in the SSC, and the rrn rRNAs in the IR. The code orientates the LSC so there are more "-" strand rps and rpl genes than "+", more "-" strand than "+" strand genes in the SSC, and with rrn genes on the "-" strand for the IRA. This works for most lineages (that we know of) in angiosperms. In reality, the SSC is probably in both directions across copies of the plastome in a plant. This is more for convention than anything else.

Things look like this:

<-------------LSC-------------><-------IRB-------><--------SSC--------><-------IRA------->
<psbA(-)--------------rpl22(-)><------rrn23(+)---><ndhF(-)---ndhD(-)--><------rrn23(-)--->
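
The orientation rule described above could be sketched roughly as follows. This is only an illustration in Python, not the Fast-Plast implementation: the function names, the `gene_strands` dictionary (gene name mapped to "+" or "-", as might come from an annotation step), and the simple majority vote with ties left unchanged are all assumptions.

```python
# Translation table for complementing nucleotides (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide string."""
    return seq.translate(COMPLEMENT)[::-1]


def orient_region(seq: str, gene_strands: dict, region: str) -> str:
    """Flip a region if its informative genes are mostly on the '+' strand.

    Informative genes per region (following the description above):
      LSC: rpl and rps genes
      SSC: all genes
      IRA: rrn rRNA genes
    Assumption: ties are left in their current orientation.
    """
    if region == "LSC":
        strands = [s for g, s in gene_strands.items()
                   if g.startswith(("rpl", "rps"))]
    elif region == "SSC":
        strands = list(gene_strands.values())
    elif region == "IRA":
        strands = [s for g, s in gene_strands.items() if g.startswith("rrn")]
    else:
        raise ValueError(f"unsupported region: {region}")

    # The convention wants more '-' than '+' genes, so flip when '+' wins.
    if strands.count("+") > strands.count("-"):
        return reverse_complement(seq)
    return seq
```

Calling `orient_region(lsc_seq, {"rpl22": "+", "rps3": "+", "psbA": "-"}, "LSC")` would return the reverse complement of `lsc_seq`, because the two rpl/rps genes sit on the "+" strand and psbA is ignored for the LSC vote.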

This method doesn't work with plastomes that deviate from this common orientation. If there are major rearrangements, I do more of this step-wise using the scripts found in the bin directory of the Fast-Plast repository.

--Michael

PfaffS commented 6 years ago

Hi Michael, thank you very much for your reply and your explanation. This helps me a lot.

Best wishes, Simon