japonicusdb / japonicus-config

Configuration for JaponicusDB

japonicus transposons and LTRs #28

Closed: ValWood closed this issue 3 years ago.

ValWood commented 3 years ago

This is the list from the publication, but I don't think these are annotated in the sequence:

(Deleted table. See the new table below with strand information.)

kimrutherford commented 3 years ago

Do these need to be added to the contig files?

ValWood commented 3 years ago

Yes, please. At least then we will be able to see the features in the browser and discuss them further. I also need to contact Henry Levin when this is done to see if we can improve them (it's on my to-do list).

kimrutherford commented 3 years ago

Some of the features in the list (like "Tj10-1") aren't marked as LTR or retrotransposon. Which SO term should I use for those?

ValWood commented 3 years ago

The S. japonicus genome harbors 10 families of gypsy-type retrotransposons (4) (Figure S5 and Table S6).

japonicus_transposons.txt (has strand information)

Tj1-10 etc. appear to refer to the full-length transposons, so we can assign all transposons (complete or degraded) to http://www.sequenceontology.org/browser/current_svn/term/SO:0000186 (LTR retrotransposon)

and any lone LTRs to http://www.sequenceontology.org/browser/current_svn/term/SO:0000286 (long terminal repeat).

Does that make sense?
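A minimal sketch of the rule proposed above, assuming each feature in the list carries a text label and that lone LTRs are the ones whose label ends in "LTR" (e.g. "Tj1-type LTR"); every other label, such as "Tj10-1", is treated as a complete or degraded retrotransposon. The label convention and the function name `so_term_for` are hypothetical, not taken from the actual list.

```python
# SO term IDs quoted in this thread.
LTR_RETROTRANSPOSON = "SO:0000186"   # complete or degraded transposons
LONG_TERMINAL_REPEAT = "SO:0000286"  # lone LTRs


def so_term_for(label: str) -> str:
    """Return the SO term for a feature label (hypothetical helper).

    Assumption: a label ending in "LTR" marks a lone LTR; anything else
    is assigned to the LTR retrotransposon term.
    """
    if label.strip().upper().endswith("LTR"):
        return LONG_TERMINAL_REPEAT
    return LTR_RETROTRANSPOSON


for label in ["Tj10-1", "Tj1-type LTR", "Tj6 partial retrotransposon"]:
    print(label, "->", so_term_for(label))
```

Keying on the label keeps the rule to a single string test; if the source table has an explicit type column, that column would be a safer input.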

ValWood commented 3 years ago

I don't think we need Henry for anything, but I will let him know anyway.

ValWood commented 3 years ago

My response to Henry:

Hi Henry,

We do not yet have feature pages for non-gene features. We have an open ticket about this: https://github.com/pombase/website/issues/37#issuecomment-876590279

The pombe full-length transposons are visible only because they are annotated as CDS (i.e. a protein-coding gene feature). I could do the same for the japonicus full-length transposons, but only if they contain a single open reading frame. I am not sure that this is the case; at least, I could not see a large ORF in the ones I looked at. I will know better once these features are flagged in the contigs.

Also, if we want to be able to search on the degraded transposons, we will need to include non-gene features. I think for the first stage we will just make them visible in JBrowse (they will be searchable from there). We can explain this in the japonicus FAQ once that is established. We can then aim to get "feature" pages for these later (likely not for the first release).
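Whether a full-length transposon could be annotated as a CDS depends on it containing one large open reading frame. A minimal sketch of such a check scans all six frames for ATG-to-stop ORFs and reports the longest. The ATG start codon, the standard stop codons and the 100-codon default threshold are assumptions for illustration, not the criteria PomBase uses.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def orfs(seq: str, min_codons: int = 100):
    """Yield (strand, start, end, length_nt) for each ATG..stop ORF.

    Coordinates are 0-based, half-open, and relative to the strand that
    was scanned (for "-", positions refer to the reverse complement).
    min_codons counts the stop codon too.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens the ORF; later ATGs are ignored
                elif start is not None and codon in STOPS:
                    if (i + 3 - start) // 3 >= min_codons:
                        yield strand, start, i + 3, i + 3 - start
                    start = None


def longest_orf(seq: str, min_codons: int = 100):
    """Return the longest ORF tuple, or None if no ORF passes the threshold."""
    return max(orfs(seq, min_codons), key=lambda orf: orf[3], default=None)


print(longest_orf("ATGAAATAG", min_codons=1))
```

Running this over each full-length Tj element would show whether a single ORF dominates; a transposon without any ORF above the threshold would be a poor candidate for CDS annotation.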

ValWood commented 3 years ago

Note to self: tell Frank.

kimrutherford commented 3 years ago

Hi Val. You've hidden two comments about adding these features to the contig files. Has there been a change of plan?

ValWood commented 3 years ago

No, sorry. I was confused.

The source is japonicus_transposons.txt above. They do need to be added to the contigs.

kimrutherford commented 3 years ago

I'll do this at the same time as japonicusdb/japonicus-config#39

ValWood commented 3 years ago

Although we don't need this for the release, it will be useful for me to be able to see the transposons while predicting and refining gene structures. It will also be required for publication, so please keep it near the top of the list.

kimrutherford commented 3 years ago

What IDs should I use for the new features?

ValWood commented 3 years ago

SJATN_00001 onwards (i.e. swapping the G (for gene) for TN (for transposon)). @snezhkaoliferenko @mah11, does that sound OK?
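A minimal sketch of the proposed numbering, assuming five-digit zero padding as in SJATN_00001 and that IDs are simply assigned in the order the features are processed. The function name `transposon_ids` is hypothetical.

```python
def transposon_ids(count: int, prefix: str = "SJATN_", width: int = 5):
    """Return sequential systematic IDs: SJATN_00001, SJATN_00002, ...

    Assumption: a fixed five-digit width is enough, so at most 99,999 IDs.
    """
    if count > 10 ** width - 1:
        raise ValueError("too many features for the ID width")
    return [f"{prefix}{n:0{width}d}" for n in range(1, count + 1)]


print(transposon_ids(3))  # ['SJATN_00001', 'SJATN_00002', 'SJATN_00003']
```

Zero padding keeps the IDs sorting correctly as plain strings, which is convenient when they are listed or compared outside the database.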

kimrutherford commented 3 years ago

Hi Val. I'm working on this now. I wanted to double-check which EMBL feature types to use. For pombe we use misc_feature for SO:0000186 and LTR for SO:0000286:

FT   misc_feature    complement(777729..782654)
FT                   /SO="SO:0000186"
FT                   /note="LTR-retrotransposon"
FT   LTR             complement(777734..778082)
FT                   /systematic_id="SPLTRC.26"
FT                   /SO="SO:0000286"

Is that the plan for japonicus too?

kimrutherford commented 3 years ago

Here's a sample of the output of my dodgy script:

FT   LTR             complement(1824730..1824873)
FT                   /systematic_id="SJATN_00025"
FT                   /note="Tj1-type LTR"
FT                   /SO="SO:0000286"
FT   misc_feature    1825097..1825578
FT                   /systematic_id="SJATN_00026"
FT                   /note="Tj6 partial retrotransposon"
FT                   /SO="SO:0000186"
FT   LTR             1825579..1825789
FT                   /systematic_id="SJATN_00027"
FT                   /note="Tj7-type LTR"
FT                   /SO="SO:0000286"

Does that look OK?

ValWood commented 3 years ago

> I wanted to double check which EMBL feature types to use. For pombe we use misc_feature for SO:0000186 and LTR for SO:0000286:

I think so. I assume SO:0000186 is the transposon term. I tried to look it up, but the MISO browser has disappeared (http://www.sequenceontology.org/browser/current_svn/term/SO:0000685). I presume this also means that our links to SO are broken!

EMBL only has a ~qualifier~ for LTR but not one for transposon (unless it was added recently...will check)

ValWood commented 3 years ago

> Here's a sample of the output of my dodgy script:

Sample output looks perfect!

ValWood commented 3 years ago

> EMBL only has a qualifier for LTR but not one for transposon (unless it was added recently...will check)

Actually, EMBL no longer seems to have a key for LTR (at least I can't see it...), but we will keep ours; we can map it to misc_feature for submission.

kimrutherford commented 3 years ago

From the Zoom call:

For retrotransposons use: mobile_element with /mobile_element_type="retrotransposon"

For LTRs use: repeat_region with /rpt_type="long_terminal_repeat"
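A minimal sketch of how the Zoom-call mapping could be applied when writing the contig files. It is a hypothetical re-creation, not the actual script used here: it assumes each feature arrives as an SO term, 1-based inclusive start and end, a strand of "+" or "-", a systematic ID and a note. Long qualifier values are not wrapped at 80 columns as the EMBL flat-file format requires.

```python
# Feature key plus the extra qualifier for each SO term, as agreed on the call.
EMBL_MAPPING = {
    "SO:0000186": ("mobile_element", ("mobile_element_type", "retrotransposon")),
    "SO:0000286": ("repeat_region", ("rpt_type", "long_terminal_repeat")),
}


def embl_location(start: int, end: int, strand: str) -> str:
    """Format 1-based inclusive coordinates; "-" strand uses complement()."""
    span = f"{start}..{end}"
    return f"complement({span})" if strand == "-" else span


def embl_feature(so_term, start, end, strand, systematic_id, note):
    key, (qual, value) = EMBL_MAPPING[so_term]
    # Key is left-justified in columns 6-21, so the location starts at column 22.
    lines = [f"FT   {key:<16}{embl_location(start, end, strand)}"]
    qualifiers = [
        ("systematic_id", systematic_id),
        ("note", note),
        ("SO", so_term),
        (qual, value),
    ]
    for name, val in qualifiers:
        # Qualifier lines also start at column 22.
        lines.append(f'FT{" " * 19}/{name}="{val}"')
    return "\n".join(lines)


print(embl_feature("SO:0000186", 1841270, 1841781, "-",
                   "SJATN_00037", "Tj6 partial retrotransposon"))
```

With these inputs the printed lines should have the same layout as the SJATN_00037 entry in the updated output below. Keeping the SO-to-EMBL mapping in one table means a later change of feature key touches a single line.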

kimrutherford commented 3 years ago

Here's the updated output from my dodgy script:

FT   mobile_element  complement(1841270..1841781)
FT                   /systematic_id="SJATN_00037"
FT                   /note="Tj6 partial retrotransposon"
FT                   /SO="SO:0000186"
FT                   /mobile_element_type="retrotransposon"
FT   repeat_region   1841974..1842117
FT                   /systematic_id="SJATN_00038"
FT                   /note="Tj1-type LTR"
FT                   /SO="SO:0000286"
FT                   /rpt_type="long_terminal_repeat"

Should I go ahead and include these new features in the contig files?

ValWood commented 3 years ago

Yes please!

kimrutherford commented 3 years ago

Done! Is there anything else to do here?

kimrutherford commented 3 years ago

Done!

I've kicked off a new japonicus load to check that these get into Chado correctly.

ValWood commented 3 years ago

I see them, so we are done here. Some of the regions where I expected to see transposons are empty, but I will look more closely at those regions later (or somebody else will!). This can be closed.